For SQL sources and targets, SAP BusinessObjects Data Services creates database-specific SQL
statements based on the data flow diagrams in a job. The software generates SQL SELECT statements
to retrieve the data from source databases. To optimize performance, the software pushes down as
many SELECT operations as possible to the source database and combines as many operations as
possible into one request to the database. It can push down SELECT operations such as joins, Group
By, and common functions such as decode and string functions.
Data flow design influences the number of operations that the software can push to the database. Before
running a job, you can view the SQL that is generated and adjust your design to maximize the SQL
that is pushed down to improve performance.
You can use database links and the Data_Transfer transform to push down more operations.
Push-down operations
By pushing down operations to the source database, Data Services reduces the number of rows and
operations that the engine must retrieve and process, which improves performance. When determining
which operations to push to the database, Data Services examines the database and its environment.
Full push-down operations
The Optimizer always first tries to do a full push-down operation. A full push-down operation is when
all transform operations can be pushed down to the databases and the data streams directly from the
source database to the target database. SAP BusinessObjects Data Services sends SQL INSERT INTO
SELECT statements to the target database where SELECT retrieves data from the source.
The software does a full push-down operation to the source and target databases when the following
conditions are met:
- All of the operations between the source table and target table can be pushed down.
- The source and target tables are from the same datastore or they are in datastores that have a database link defined between them.
To enable a full push-down from the source to the target, you can also use the following features:
- Data_Transfer transform
- Database links
For database targets that support the Allow Merge option, when all other operations in the data flow
can be pushed down to the source database, the auto-correct loading operation may also be pushed
down for a full push-down operation to the target. The software sends an SQL MERGE INTO target
statement that implements the Ignore columns with value and Ignore columns with null options.
Partial push-down operations
When a full push-down operation is not possible, SAP BusinessObjects Data Services still pushes down
the SELECT statement to the source database. Operations within the SELECT statement that the
software can push to the database include:
- Aggregations — Aggregate functions, typically used with a Group by statement, always produce a data set smaller than or the same size as the original data set.
- Distinct rows — When you select Distinct rows from the Select tab in the query editor, the software will only output unique rows.
- Filtering — Filtering can produce a data set smaller than or equal to the original data set.
- Joins — Joins typically produce a data set smaller than or similar in size to the original tables. The software can push down joins when either of the following conditions exist:
- The source tables are in the same datastore
- The source tables are in datastores that have a database link defined between them
- Ordering — Ordering does not affect data-set size. The software can efficiently sort data sets that fit in memory. It is recommended that you push down the Order By for very large data sets.
- Projection — Projection is the subset of columns that you map on the Mapping tab in the query editor. Projection normally produces a smaller data set because it only returns columns needed by subsequent operations in a data flow.
- Functions — Most functions that have equivalents in the underlying database are appropriately translated. These functions include decode, aggregation, and string functions.
Operations that cannot be pushed down
SAP BusinessObjects Data Services cannot push some transform operations to the database. For
example:
- Expressions that include functions that do not have database correspondents
- Load operations that contain triggers
- Transforms other than Query
- Joins between sources that are on different database servers that do not have database links defined between them.
Similarly, the software cannot always combine operations into single requests. For example, when a
stored procedure contains a COMMIT statement or does not return a value, the software cannot combine
the stored procedure SQL with the SQL for other operations in a query.
The software can only push operations supported by the DBMS down to that DBMS. Therefore, for
best performance, try not to intersperse SAP BusinessObjects Data Services transforms among
operations that can be pushed down to the database.
Collapsing transforms to push down operations example
When determining how to push operations to the database, SAP BusinessObjects Data Services first
collapses all the transforms into the minimum set of transformations expressed in terms of the source
table columns. Next, the software pushes all possible operations on tables of the same database down
to that DBMS.
For example, the following data flow extracts rows from a single source table.
![1.PNG]()
The first query selects only the rows in the source where column A contains a value greater than 100.
The second query refines the extraction further, reducing the number of columns returned and further
reducing the qualifying rows.
The software collapses the two queries into a single command for the DBMS to execute.
The following command uses AND to combine the WHERE clauses from the two queries:
SELECT A, MAX(B), C
FROM source
WHERE A > 100 AND B = C
GROUP BY A, C
The software can push down all the operations in this SELECT statement to the source DBMS.
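The effect of collapsing the two queries can be illustrated with a small, self-contained sketch. The following Python example uses SQLite (standing in for the source DBMS) with the table and column names from the text; the sample rows are hypothetical. It shows that the single collapsed statement returns the same result as executing the two queries in sequence, but in one round trip.

```python
import sqlite3

# Hypothetical source table with columns A, B, C (names taken from the example).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (A INTEGER, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)",
                [(50, 1, 1), (150, 2, 2), (150, 5, 2), (200, 3, 4)])

# The collapsed statement: both WHERE clauses combined with AND.
collapsed = con.execute("""
    SELECT A, MAX(B), C
    FROM source
    WHERE A > 100 AND B = C
    GROUP BY A, C
""").fetchall()

# The staged equivalent: first query filters on A, second query refines it.
staged_rows = con.execute("SELECT A, B, C FROM source WHERE A > 100").fetchall()
con.execute("CREATE TEMP TABLE query1 (A INTEGER, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO query1 VALUES (?, ?, ?)", staged_rows)
staged = con.execute(
    "SELECT A, MAX(B), C FROM query1 WHERE B = C GROUP BY A, C").fetchall()

assert staged == collapsed  # same result, one round trip instead of two
print(collapsed)  # [(150, 2, 2)]
```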
Full push down from the source to the target example
If the source and target are in the same datastore, the software can do a full push-down operation
where the INSERT into the target uses a SELECT from the source. In the sample data flow shown in
the previous example, a full push-down passes the following statement to the database:
INSERT INTO target (A, B, C)
SELECT A, MAX(B), C
FROM source
WHERE A > 100 AND B = C
GROUP BY A, C
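A minimal sketch of this full push-down, again using SQLite with hypothetical rows: the source and target tables live in the same database (standing in for the same datastore), so a single INSERT INTO ... SELECT moves the data without any rows passing through the engine.

```python
import sqlite3

# Source and target in the same database, standing in for "same datastore".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (A INTEGER, B INTEGER, C INTEGER)")
con.execute("CREATE TABLE target (A INTEGER, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)",
                [(150, 2, 2), (150, 5, 2), (200, 7, 7), (50, 1, 1)])

# Full push-down: one INSERT INTO ... SELECT executed entirely in the database.
con.execute("""
    INSERT INTO target (A, B, C)
    SELECT A, MAX(B), C
    FROM source
    WHERE A > 100 AND B = C
    GROUP BY A, C
""")
print(con.execute("SELECT * FROM target ORDER BY A").fetchall())
# [(150, 2, 2), (200, 7, 7)]
```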
If the source and target are not in the same datastore, the software can also do a full push-down
operation if you use one of the following features:
- Add a Data_Transfer transform before the target.
- Define a database link between the two datastores.
Full push down for auto correct load to the target example
For supported databases, if you enable the Auto correct load and Allow Merge options, the Optimizer
may be able to do a full push-down operation where the SQL statement is a MERGE into the target
with a SELECT from the source.
In order for the Allow Mergeoption to generate a MERGE statement, the primary key of the source
table must be a subset of the primary key of the target table and the source row must be unique on the
target key. In other words, there cannot be duplicate rows in the source data. If this condition is not
met, the Optimizer pushes down the operation using a database-specific method to identify, update,
and insert rows into the target table.
For example, suppose you have a data flow where the source and target tables are in the same datastore
and the Auto correct load and Allow Merge options are set to Yes.
The push-down operation passes the following statement to an Oracle database:
MERGE INTO "ODS"."TARGET" s
USING
(SELECT "SOURCE"."A" A , "SOURCE"."B" B , "SOURCE"."C" C
FROM "ODS"."SOURCE" "SOURCE"
) n
ON ((s.A = n.A))
WHEN MATCHED THEN
UPDATE SET s."B" = n.B, s."C" = n.C
WHEN NOT MATCHED THEN
INSERT /*+ APPEND */ (s."A", s."B", s."C" )
VALUES (n.A , n.B , n.C)
Similar statements are used for other supported databases.
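The matched/not-matched behavior of such a MERGE can be sketched with SQLite's upsert syntax, which is an analogy, not the statement Data Services generates. The example assumes a SQLite version with UPSERT support (3.24+); table names and rows are hypothetical. Matched rows (same key A) are updated and unmatched rows are inserted, which is the net effect of the MERGE INTO statement above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (A INTEGER, B INTEGER, C INTEGER)")
con.execute("CREATE TABLE target (A INTEGER PRIMARY KEY, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)", [(1, 10, 11), (2, 20, 21)])
con.execute("INSERT INTO target VALUES (1, 99, 99)")  # pre-existing target row

# MERGE-like upsert: the row with A=1 is updated, the row with A=2 is inserted.
# (The WHERE true clause resolves a parsing ambiguity in SQLite's upsert grammar.)
con.execute("""
    INSERT INTO target (A, B, C)
    SELECT A, B, C FROM source WHERE true
    ON CONFLICT(A) DO UPDATE SET B = excluded.B, C = excluded.C
""")
print(con.execute("SELECT * FROM target ORDER BY A").fetchall())
# [(1, 10, 11), (2, 20, 21)]
```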
Partial push down to the source example
If the data flow contains operations that cannot be passed to the DBMS, the software optimizes the
transformation differently than the previous two scenarios. For example, if Query1 called func(A) >
100, where func is a SAP BusinessObjects Data Services custom function, then the software generates
two commands:
- The first query becomes the following command which the source DBMS executes:
SELECT A, B, C
FROM source
WHERE B = C
- The second query becomes the following command which SAP BusinessObjects Data Services
executes because func cannot be pushed to the database:
SELECT A, MAX(B), C
FROM Query1
WHERE func(A) > 100
GROUP BY A, C
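The division of labor in this partial push-down can be sketched as follows. The SQLite database executes only the filter it understands (B = C); `func` stands in for a hypothetical Data Services custom function with no SQL equivalent, so the engine (plain Python here) applies that filter and the GROUP BY itself.

```python
import sqlite3
from collections import defaultdict

def func(a):
    # Stand-in for a custom function that cannot be pushed to the database.
    return a * 2

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (A INTEGER, B INTEGER, C INTEGER)")
con.executemany("INSERT INTO source VALUES (?, ?, ?)",
                [(60, 2, 2), (60, 5, 5), (40, 3, 3), (40, 1, 2)])

# Pushed down: only the filter the DBMS understands.
rows = con.execute("SELECT A, B, C FROM source WHERE B = C").fetchall()

# Executed by the engine: the custom-function filter and the GROUP BY.
groups = defaultdict(list)
for a, b, c in rows:
    if func(a) > 100:          # func(A) > 100 cannot be pushed down
        groups[(a, c)].append(b)
result = sorted((a, max(bs), c) for (a, c), bs in groups.items())
print(result)  # [(60, 2, 2), (60, 5, 5)]
```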
Push-down of SQL join example
If the tables to be joined in a query meet the requirements for a push-down operation, then the entire
query is pushed down to the DBMS.
To confirm that the query will be pushed down, look at the Optimized SQL. If the query shows a single
SELECT statement, then it will be pushed down.
For example, in the data flow shown below, the Department and Employee tables are joined with an
inner join, and the result of that join is joined with a left outer join to the Bonus table.
![2.PNG]()
The resulting Optimized SQL contains a single select statement and the entire query is pushed down
to the DBMS:
SELECT DEPARTMENT.DEPTID, DEPARTMENT.DEPARTMENT, EMPLOYEE.LASTNAME, BONUS.BONUS
FROM (DEPARTMENT INNER JOIN EMPLOYEE
ON DEPARTMENT.DEPTID = EMPLOYEE.DEPTID)
LEFT OUTER JOIN BONUS
ON (EMPLOYEE.EMPID = BONUS.EMPID)
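That single SELECT can be exercised directly. The sketch below runs the same join shape in SQLite with hypothetical rows; the left outer join keeps employees that have no bonus row, whose BONUS column comes back as NULL (None in Python).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE DEPARTMENT (DEPTID INTEGER, DEPARTMENT TEXT);
    CREATE TABLE EMPLOYEE (EMPID INTEGER, DEPTID INTEGER, LASTNAME TEXT);
    CREATE TABLE BONUS (EMPID INTEGER, BONUS REAL);
    INSERT INTO DEPARTMENT VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO EMPLOYEE VALUES (10, 1, 'Smith'), (11, 2, 'Jones');
    INSERT INTO BONUS VALUES (10, 500.0);
""")

# The single SELECT from the text: inner join, then left outer join.
rows = con.execute("""
    SELECT DEPARTMENT.DEPTID, DEPARTMENT.DEPARTMENT, EMPLOYEE.LASTNAME, BONUS.BONUS
    FROM (DEPARTMENT INNER JOIN EMPLOYEE
          ON DEPARTMENT.DEPTID = EMPLOYEE.DEPTID)
    LEFT OUTER JOIN BONUS
    ON EMPLOYEE.EMPID = BONUS.EMPID
    ORDER BY DEPARTMENT.DEPTID
""").fetchall()
print(rows)
# [(1, 'Sales', 'Smith', 500.0), (2, 'IT', 'Jones', None)]
```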
To view SQL
Before running a job, you can view the SQL code that SAP BusinessObjects Data Services generates
for table sources in data flows. By examining the SQL code, you can verify that the software generates
the commands you expect. If necessary, you can alter your design to improve the data flow.
1. Validate and save data flows.
2. Open a data flow in the workspace.
3. Select Display Optimized SQL from the Validation menu.
Alternately, you can right-click a data flow in the object library and select Display Optimized SQL.
The "Optimized SQL" window opens and shows a list of datastores and the optimized SQL code for
the selected datastore. By default, the "Optimized SQL" window selects the first datastore.
The software only shows the SELECT generated for table sources and INSERT INTO... SELECT
for targets. It does not show the SQL generated for SQL sources that are not table sources, such
as:
- Lookup function
- Key_generation function
- Key_Generation transform
- Table_Comparison transform
4. Select a name from the list of datastores on the left to view the SQL that this data flow applies against
the corresponding database or application.
The following example shows the optimized SQL for the second datastore which illustrates a full
push-down operation (INSERT INTO... SELECT). This data flow uses a Data_Transfer transform
to create a transfer table that the software loads directly into the target.
INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY","SHIPREGION", "SALES_AGG")
SELECT "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION" ,sum("TS_Query_Lookup"."SALES")
FROM "DBO"."TRANS2" "TS_Query_Lookup"
GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION"
In the "Optimized SQL" window you can:
- Use the Find button to perform a search on the SQL displayed.
- Use the Save As button to save the text as a .sql file.
If you try to use the Display Optimized SQL command when there are no SQL sources in your data
flow, the software alerts you. Examples of non-SQL sources include:
• Message sources
• File sources
• IDoc sources
If a data flow is not valid when you click the Display Optimized SQL option, the software alerts you.
Data_Transfer transform for push-down operations
Use the Data_Transfer transform to move data from a source or from another transform into the target
datastore and enable a full push-down operation (INSERT INTO... SELECT) to the target. You can use
the Data_Transfer transform to push down resource-intensive operations that occur anywhere within
a data flow to the database. Resource-intensive operations include joins, GROUP BY, ORDER BY,
and DISTINCT.
Push down an operation after a blocking operation example
You can place a Data_Transfer transform after a blocking operation to enable Data Services to push
down a subsequent operation. A blocking operation is an operation that the software cannot push down
to the database, and prevents ("blocks") operations after it from being pushed down.
For example, you might have a data flow that groups sales order records by country and region, and
sums the sales amounts to find which regions are generating the most revenue. The following diagram
shows that the data flow contains a Pivot transform to obtain orders by Customer ID, a Query transform
that contains a lookup_ext function to obtain sales subtotals, and another Query transform to group the
results by country and region.
![3.PNG]()
Because the Pivot transform and the lookup_ext function are before the query with the GROUP BY
clause, the software cannot push down the GROUP BY operation. Here is how the "Optimized SQL"
window would show the SELECT statement that the software pushes down to the source database:
SELECT "ORDERID", "CUSTOMERID", "EMPLOYEEID", "ORDERDATE", "REQUIREDDATE", "SHIPPEDDATE", "SHIPVIA",
"FREIGHT", "SHIPNAME", "SHIPADDRESS", "SHIPCITY", "SHIPREGION", "SHIPPOSTALCODE", "SHIPCOUNTRY"
FROM "DBO"."ORDERS"
However, if you add a Data_Transfer transform before the second Query transform and specify a transfer
table in the same datastore as the target table, the software can push down the GROUP BY operation.
![4.PNG]()
The Data_Transfer Editor window shows that the transfer type is Table and the transfer table is in the
same datastore as the target table (Northwind_DS.DBO.TRANS2).
Here's how the "Optimized SQL" window would show that the software pushed down the GROUP BY
to the transfer table TRANS2.
INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY", "SHIPREGION", "SALES_AGG")
SELECT "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION" , sum("TS_Query_Lookup"."SALES")
FROM "DBO"."TRANS2" "TS_Query_Lookup"
GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION"
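The role of the transfer table can be sketched end to end. In the SQLite example below (hypothetical rows, table names borrowed from the text), the rows the engine produced upstream of the blocking operations are loaded into the transfer table TRANS2, after which the GROUP BY runs entirely inside the database as one INSERT INTO ... SELECT.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stands in for the target datastore
con.execute("CREATE TABLE TRANS2 (SHIPCOUNTRY TEXT, SHIPREGION TEXT, SALES REAL)")
con.execute("CREATE TABLE ORDER_AGG (SHIPCOUNTRY TEXT, SHIPREGION TEXT, SALES_AGG REAL)")

# Rows the engine produced upstream (after the Pivot and lookup_ext steps)
# are loaded into the transfer table...
con.executemany("INSERT INTO TRANS2 VALUES (?, ?, ?)",
                [("USA", "WA", 100.0), ("USA", "WA", 50.0), ("UK", None, 75.0)])

# ...so the GROUP BY can run inside the database as one INSERT INTO ... SELECT.
con.execute("""
    INSERT INTO ORDER_AGG (SHIPCOUNTRY, SHIPREGION, SALES_AGG)
    SELECT SHIPCOUNTRY, SHIPREGION, SUM(SALES)
    FROM TRANS2
    GROUP BY SHIPCOUNTRY, SHIPREGION
""")
print(con.execute("SELECT * FROM ORDER_AGG ORDER BY SHIPCOUNTRY").fetchall())
# [('UK', None, 75.0), ('USA', 'WA', 150.0)]
```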
Using Data_Transfer tables to speed up auto correct loads example
Auto correct loading ensures that the same row is not duplicated in a target table, which is useful for
data recovery operations. However, an auto correct load prevents a full push-down operation from the
source to the target when the source and target are in different datastores.
For large loads using database targets that support the Allow Merge option for auto correct load, you
can add a Data_Transfer transform before the target to enable a full push-down from the source to the
target. In order for the Allow Merge option to generate a MERGE statement:
- the primary key of the source table must be a subset of the primary key of the target table
- the source row must be unique on the target key
In other words, there cannot be duplicate rows in the source data. If this condition is not met, the
Optimizer pushes down the operation using a database-specific method to identify, update, and insert
rows into the target table.
If the MERGE statement can be used, SAP BusinessObjects Data Services generates an SQL MERGE
INTO target statement that implements the Ignore columns with value setting (if a value is specified
in the target transform editor) and the Ignore columns with null Yes/No setting.
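The Ignore columns with null semantics, which the generated Oracle MERGE expresses as nvl(n.col, s.col), can be sketched with SQLite's upsert and COALESCE. This is an analogy under assumed table names, not generated output: a NULL in the source leaves the existing target value untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (ID INTEGER PRIMARY KEY, NAME TEXT)")
con.execute("INSERT INTO target VALUES (1, 'existing')")
con.execute("CREATE TABLE source (ID INTEGER, NAME TEXT)")
con.execute("INSERT INTO source VALUES (1, NULL)")

# Ignore columns with null: a NULL source value must not overwrite the target.
# Oracle's nvl(n.NAME, s.NAME) is written as COALESCE here (SQLite analogy).
con.execute("""
    INSERT INTO target (ID, NAME)
    SELECT ID, NAME FROM source WHERE true
    ON CONFLICT(ID) DO UPDATE SET NAME = COALESCE(excluded.NAME, target.NAME)
""")
print(con.execute("SELECT * FROM target").fetchall())  # [(1, 'existing')]
```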
For example, suppose you create a data flow that loads sales orders into an Oracle target table which
is in a different datastore from the source.
For this data flow, the Auto correct load option is active and set to Yes, and the Ignore columns with
null and Allow merge options are also active.
The SELECT statement that the software pushes down to the source database would look like the
following (as it would appear in the "Optimized SQL" window).
SELECT "ODS_SALESORDER"."SALES_ORDER_NUMBER" , "ODS_SALESORDER"."ORDER_DATE" , "ODS_SALESORDER"."CUST_ID"
FROM "ODS"."ODS_SALESORDER" "ODS_SALESORDER"
When you add a Data_Transfer transform before the target and specify a transfer table in the same
datastore as the target, the software can push down the auto correct load operation.
The following MERGE statement is what the software pushes down to the Oracle target (as it appears
in the "Optimized SQL" window).
MERGE INTO "TARGET"."AUTO_CORRECT_LOAD2_TARGET" s
USING
(SELECT "AUTOLOADTRANSFER"."SALES_ORDER_NUMBER" SALES_ORDER_NUMBER,
"AUTOLOADTRANSFER"."ORDER_DATE" ORDER_DATE, "AUTOLOADTRANSFER"."CUST_ID" CUST_ID
FROM "TARGET"."AUTOLOADTRANSFER" "AUTOLOADTRANSFER") n
ON ((s.SALES_ORDER_NUMBER = n.SALES_ORDER_NUMBER))
WHEN MATCHED THEN
UPDATE SET s."ORDER_DATE"=nvl(n.ORDER_DATE,s."ORDER_DATE"), s."CUST_ID"=nvl(n.CUST_ID,s."CUST_ID")
WHEN NOT MATCHED THEN
INSERT(s."SALES_ORDER_NUMBER",s."ORDER_DATE",s."CUST_ID")
VALUES(n.SALES_ORDER_NUMBER,n.ORDER_DATE,n.CUST_ID)
Database link support for push-down operations across datastores
Various database vendors support one-way communication paths from one database server to another.
SAP BusinessObjects Data Services refers to communication paths between databases as database
links. The datastores in a database link relationship are called linked datastores.
The software uses linked datastores to enhance its performance by pushing down operations to a target
database using a target datastore. Pushing down operations to a database not only reduces the amount
of information that needs to be transferred between the databases and SAP BusinessObjects Data
Services but also allows the software to take advantage of the various DBMS capabilities, such as
various join algorithms.
With support for database links, the software can push processing down across different datastores,
which can refer to the same or different database types. Linked datastores allow a one-way path for data.
For example, if you import a database link from target database B and link datastore B to datastore A,
the software pushes the load operation down to database B, not to database A.
Software support
SAP BusinessObjects Data Services supports push-down operations using linked datastores on all
Windows and Unix platforms. It supports DB2, Oracle, and MS SQL Server databases.
To take advantage of linked datastores
1. Create a database link on a database server that you intend to use as a target in a job.
The following database software is required. See the Supported Platforms document for specific version numbers.
- For DB2, use the DB2 Information Services (previously known as Relational Connect) software and make sure that the database user has privileges to create and drop a nickname.
To end users and client applications, data sources appear as a single collective database in DB2. Users and applications interface with the database managed by the information server. Therefore, configure an information server and then add the external data sources. DB2 uses nicknames to identify remote tables and views.
- For Oracle, use the Transparent Gateway for DB2 and MS SQL Server.
See the Oracle database manuals for more information about how to create database links for Oracle and non-Oracle servers.
- For MS SQL Server, no special software is required.
Microsoft SQL Server supports access to distributed data stored in multiple instances of SQL Server and heterogeneous data stored in various relational and non-relational data sources using an OLE database provider. SQL Server supports access to distributed or heterogeneous database sources in Transact-SQL statements by qualifying the data sources with the names of the linked server where the data sources exist.
2. Create a database datastore connection to your target database.
Generated SQL statements
To see how SAP BusinessObjects Data Services optimizes SQL statements, use Display Optimized
SQL from the Validation menu when a data flow is open in the workspace.
- For DB2, it uses nicknames to refer to remote table references in the SQL display.
- For Oracle, it uses the following syntax to refer to remote table references: <remote_table>@<dblink_name>.
- For SQL Server, it uses the following syntax to refer to remote table references: <linked_server>.<remote_database>.<remote_user>.<remote_table>.
Tuning performance at the data flow or Job Server level
You might want to turn off linked-datastore push downs in cases where you do not notice performance
improvements.
For example, the underlying database might not process operations from different data sources well.
Data Services pushes down Oracle stored procedures and external functions, so including them in a
job that uses database links does not reduce the expected performance gains. However, Data Services
does not push down functions imported from other databases (such as DB2). In that case, even though
you are using database links, Data Services cannot push the processing down.
Test your assumptions about individual job designs before committing to a large development effort
using database links.
For a data flow
On the data flow properties dialog, this product enables the Use database links option by default to
allow push-down operations using linked datastores. If you do not want to use linked datastores in a
data flow to push down processing, deselect the check box.
This product can perform push-downs using datastore links if the tables involved share the same
database type and database connection name, or datasource name, even if the tables have different
schema names. However, problems with enabling this feature could arise, for example, if the user of
one datastore does not have access privileges to the tables of another datastore, causing a data access
problem. In such a case, you can disable this feature.
For a Job Server
You can also disable linked datastores at the Job Server level. However, the Use database links
option, at the data flow level, takes precedence.