
SAP DS Load data into Google Big Query


Intro

 

In my post SAP Data Services 4.2 SP4 New Features I highlighted some of the new functionality introduced in Data Services 4.2 SP04. One of the new features is the ability for Data Services to load data into Google BigQuery. This blog focuses on how you can do that.

 

 

 

Google Big Query Setup

 

 

To get started with Google BigQuery you need to sign in to the Google Developers Console and open (or create) a project.

bigQuery.jpg

 

 

 

  • You will then be taken to the project overview page, where you will see the option "Try BigQuery". Click on it.

BigQueryProj.png

 

 

  • You will be taken to a page where you can compose a query. There are sample tables that you can query. As seen below, a query has been created against one of the sample tables; once run, it shows the elapsed time and the amount of data (MB) processed. A representative query is shown after the screenshot.

GoogleBigQuerySample.jpg
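For reference, a representative query against one of Google's public sample tables looks like the one below (legacy BigQuery SQL; the dataset and column names are Google's public samples and may have changed since this was written):

SELECT word, SUM(word_count) AS total_occurrences
FROM [publicdata:samples.shakespeare]
GROUP BY word
ORDER BY total_occurrences DESC
LIMIT 10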

 

 

  • We now also need to set up authentication. Navigate to the screen shown below.

BigQueryAuth.jpg

 

 

 

  • Select the following options. When you click "Create Client ID" an authentication file is created. Save the file; it is in JSON format, but it is not the file we need. If you are using Chrome it is saved to the Downloads folder automatically.

 

BigQueryCreateClient.jpg

 

 

  • Also on the credentials screen you will see more information, as shown below. Click "Generate new P12 key" and save this file, as we need it when creating the datastore in DS. When you generate the key, a popup is shown containing a pass phrase. Write this pass phrase down; we also need it to create the datastore in DS.

 

GoogleBigQueryCert.jpg

 

 

 

  • You will also need to set up the trial; you cannot create a table without it. Data Services can only load into BigQuery, so we need a table to exist as the target.

BigQueryTrial.jpg

 

 

 

You now have everything you need to create a datastore in Data Services.

 

 

 

 

Data Services Google Big Query Datastore

 

 

You can now create a datastore in Data Services Designer. The only parts you need to fill in are the ones highlighted in red; all the other values are populated by default. The values you need to enter come from the Google setup steps above.

BigQueryDataStore.jpg

 

You can now load data to Google Big Query.

 

 

Hope this helps.


Mask Functionality in DS 4.2 SP 04


Intro

 

In my post SAP Data Services 4.2 SP4 New Features I highlighted some of the new functionality introduced in Data Services 4.2 SP04. One of the new features is the ability for Data Services to mask data. This blog focuses on how you can mask data.

 

The Data Mask transform enables you to protect personally identifiable information in your data. Personal information includes data such as credit card numbers, salary information, birth dates, personal identification numbers, or bank account numbers. You may want to use data masking to support security and privacy policies, and to protect your customer or employee information from possible theft or exploitation.

 

 

Data Mask Example

 

Here is an example of a very basic data flow with the mask being used.

DataMask.jpg




On the Input tab we then indicate the fields to mask.

Mask_Input.jpg




Then fill in the Options tab. For every field you want to mask you must duplicate the mask section, and then assign a field to each mask output section.

Mask_Options.jpg



Then, on the Output tab, choose the fields you want to output.

Mask_output.jpg




Here is a sample of the masking, obtained using the View Design-Time Data button.

Mask_Sample.jpg







Conclusion

 

The masking option is a nice, very welcome addition. It does, however, feel a bit incomplete. In many places where we use masking here in South Africa, such as credit card numbers, we only mask the middle part of the number. Currently this transform only allows you to specify where to start, and it then masks all the remaining characters. I would therefore like to see an option to stipulate both where to start and how many characters to mask. (A possible workaround using standard string functions is sketched below.)
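In the meantime, a plain Query transform mapping with standard string functions can mask only the middle of a value. This is just a sketch; the column name CREDIT_CARD_NBR and the assumption of a fixed 16-digit card number are illustrative:

substr(CREDIT_CARD_NBR, 1, 6) || '******' || substr(CREDIT_CARD_NBR, 13, 4)

This keeps the first six and last four digits visible and replaces the middle six characters with asterisks.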


I also noticed that when the field names contain spaces you get a few errors saying the mapping is not complete. Once I removed the spaces from the field names I was error free.


Hope the above helps.


Thanks.



SAP Data Services Maximizing Push-Down Operations


For SQL sources and targets, SAP BusinessObjects Data Services creates database-specific SQL

statements based on the data flow diagrams in a job. The software generates SQL SELECT statements

to retrieve the data from source databases. To optimize performance, the software pushes down as

many SELECT operations as possible to the source database and combines as many operations as

possible into one request to the database. It can push down SELECT operations such as joins, Group

By, and common functions such as decode and string functions.

Data flow design influences the number of operations that the software can push to the database. Before

running a job, you can view the SQL that is generated and adjust your design to maximize the SQL

that is pushed down to improve performance.

You can use database links and the Data_Transfer transform to push down more operations.

 

Push-down operations

By pushing down operations to the source database, Data Services reduces the number of rows and

operations that the engine must retrieve and process, which improves performance. When determining

which operations to push to the database, Data Services examines the database and its environment.

 

Full push-down operations

The Optimizer always first tries to do a full push-down operation. A full push-down operation is when

all transform operations can be pushed down to the databases and the data streams directly from the

source database to the target database. SAP BusinessObjects Data Services sends SQL INSERT INTO

SELECT statements to the target database where SELECT retrieves data from the source.

The software does a full push-down operation to the source and target databases when the following

conditions are met:

  • All of the operations between the source table and target table can be pushed down.
  • The source and target tables are from the same datastore or they are in datastores that have a database link defined between them.

 

To enable a full push-down from the source to the target, you can also use the following features:

  • Data_Transfer transform
  • Database links

For database targets that support the Allow Merge option, when all other operations in the data flow

can be pushed down to the source database, the auto-correct loading operation may also be pushed

down for a full push-down operation to the target. The software sends an SQL MERGE INTO target

statement that implements the Ignore columns with value and Ignore columns with null options.


Partial push-down operations

When a full push-down operation is not possible, SAP BusinessObjects Data Services still pushes down

the SELECT statement to the source database. Operations within the SELECT statement that the

software can push to the database include:

  • Aggregations — Aggregate functions, typically used with a Group by statement, always produce a data set smaller than or the same size as the original data set.
  • Distinct rows — When you select Distinct rows from the Select tab in the query editor, the software will only output unique rows.
  • Filtering — Filtering can produce a data set smaller than or equal to the original data set.
  • Joins — Joins typically produce a data set smaller than or similar in size to the original tables. The software can push down joins when either of the following conditions exist:
    • The source tables are in the same datastore
    • The source tables are in datastores that have a database link defined between them
  • Ordering — Ordering does not affect data-set size. The software can efficiently sort data sets that fit in memory. It is recommended that you push down the Order By for very large data sets.
  • Projection — Projection is the subset of columns that you map on the Mapping tab in the query editor. Projection normally produces a smaller data set because it only returns columns needed by subsequent operations in a data flow.
  • Functions — Most functions that have equivalents in the underlying database are appropriately translated. These functions include decode, aggregation, and string functions.


Operations that cannot be pushed down

SAP BusinessObjects Data Services cannot push some transform operations to the database. For

example:

  • Expressions that include functions that do not have database correspondents
  • Load operations that contain triggers
  • Transforms other than Query
  • Joins between sources that are on different database servers that do not have database links defined between them.

 

Similarly, the software cannot always combine operations into single requests. For example, when a

stored procedure contains a COMMIT statement or does not return a value, the software cannot combine

the stored procedure SQL with the SQL for other operations in a query.

 

The software can only push operations supported by the DBMS down to that DBMS. Therefore, for

best performance, try not to intersperse SAP BusinessObjects Data Services transforms among

operations that can be pushed down to the database.


Collapsing transforms to push down operations example

When determining how to push operations to the database, SAP BusinessObjects Data Services first

collapses all the transforms into the minimum set of transformations expressed in terms of the source

table columns. Next, the software pushes all possible operations on tables of the same database down

to that DBMS.

 

For example, the following data flow extracts rows from a single source table.

1.PNG

 

The first query selects only the rows in the source where column A contains a value greater than 100.

The second query refines the extraction further, reducing the number of columns returned and further

reducing the qualifying rows.

The software collapses the two queries into a single command for the DBMS to execute.

The following command uses AND to combine the WHERE clauses from the two queries:

SELECT A, MAX(B), C

FROM source

WHERE A > 100 AND B = C

GROUP BY A, C

The software can push down all the operations in this SELECT statement to the source DBMS.

 

Full push down from the source to the target example

If the source and target are in the same datastore, the software can do a full push-down operation

where the INSERT into the target uses a SELECT from the source. In the sample data flow in scenario

1. a full push down passes the following statement to the database:

INSERT INTO target (A, B, C)

SELECT A, MAX(B), C

FROM source

WHERE A > 100 AND B = C

GROUP BY A, C

If the source and target are not in the same datastore, the software can also do a full push-down

operation if you use one of the following features:

  • Add a Data_Transfer transform before the target.
  • Define a database link between the two datastores.

 

Full push down for auto correct load to the target example

For supported databases, if you enable the Auto correct load and Allow Merge options, the Optimizer

may be able to do a full push-down operation where the SQL statement is a MERGE into the target

with a SELECT from the source.

 

In order for the Allow Merge option to generate a MERGE statement, the primary key of the source

table must be a subset of the primary key of the target table and the source row must be unique on the

 

target key. In other words, there cannot be duplicate rows in the source data. If this condition is not

met, the Optimizer pushes down the operation using a database-specific method to identify, update,

and insert rows into the target table.

 

For example, suppose you have a data flow where the source and target tables are in the same datastore

and the Auto correct load and Allow Merge options are set to Yes.

 

The push-down operation passes the following statement to an Oracle database:

MERGE INTO "ODS"."TARGET" s

USING

(SELECT "SOURCE"."A" A , "SOURCE"."B" B , "SOURCE"."C" C

FROM "ODS"."SOURCE" "SOURCE"

) n

ON ((s.A = n.A))

WHEN MATCHED THEN

UPDATE SET s."B" = n.B, s."C" = n.C

WHEN NOT MATCHED THEN

INSERT /*+ APPEND */ (s."A", s."B", s."C" )

VALUES (n.A , n.B , n.C)

Similar statements are used for other supported databases.

 

Partial push down to the source example

If the data flow contains operations that cannot be passed to the DBMS, the software optimizes the

transformation differently than the previous two scenarios. For example, if Query1 called func(A) >

100, where func is a SAP BusinessObjects Data Services custom function, then the software generates

two commands:

  • The first query becomes the following command which the source DBMS executes:

SELECT A, B, C

FROM source

WHERE B = C

  • The second query becomes the following command which SAP BusinessObjects Data Services

executes because func cannot be pushed to the database:

SELECT A, MAX(B), C

FROM Query1

WHERE func(A) > 100

GROUP BY A, C

 

Push-down of SQL join example

If the tables to be joined in a query meet the requirements for a push-down operation, then the entire

query is pushed down to the DBMS.

 

To confirm that the query will be pushed down, look at the Optimized SQL. If the query shows a single

SELECT statement, then it will be pushed down.

 

For example, in the data flow shown below, the Department and Employee tables are joined with an

inner join, and the result of that join is then joined with a left outer join to the Bonus table.

2.PNG


The resulting Optimized SQL contains a single select statement and the entire query is pushed down

to the DBMS:

SELECT DEPARTMENT.DEPTID, DEPARTMENT.DEPARTMENT, EMPLOYEE.LASTNAME, BONUS.BONUS

FROM (DEPARTMENT INNER JOIN EMPLOYEE

ON (DEPARTMENT.DEPTID = EMPLOYEE.DEPTID))

LEFT OUTER JOIN BONUS

ON (EMPLOYEE.EMPID = BONUS.EMPID)

 

To view SQL

Before running a job, you can view the SQL code that SAP BusinessObjects Data Services generates

for table sources in data flows. By examining the SQL code, you can verify that the software generates

the commands you expect. If necessary, you can alter your design to improve the data flow.

1. Validate and save data flows.

2. Open a data flow in the workspace.

3. Select Display Optimized SQL from the Validation menu.

Alternately, you can right-click a data flow in the object library and select Display Optimized SQL.

The "Optimized SQL" window opens and shows a list of datastores and the optimized SQL code for

the selected datastore. By default, the "Optimized SQL" window selects the first datastore.

The software only shows the SELECT generated for table sources and INSERT INTO... SELECT

for targets. It does not show the SQL generated for SQL sources that are not table sources, such

as:

  • Lookup function
  • Key_generation function
  • Key_Generation transform
  • Table_Comparison transform

4. Select a name from the list of datastores on the left to view the SQL that this data flow applies against

the corresponding database or application.

The following example shows the optimized SQL for the second datastore which illustrates a full

push-down operation (INSERT INTO... SELECT). This data flow uses a Data_Transfer transform

to create a transfer table that the software loads directly into the target.

INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY","SHIPREGION", "SALES_AGG")

SELECT "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION" ,sum("TS_Query_Lookup"."SALES")

FROM "DBO"."TRANS2" "TS_Query_Lookup"

GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION"

In the "Optimized SQL" window you can:

  • Use the Find button to perform a search on the SQL displayed.
  • Use the Save As button to save the text as a .sql file.

If you try to use the Display Optimized SQL command when there are no SQL sources in your data

flow, the software alerts you. Examples of non-SQL sources include:

• Message sources

• File sources

• IDoc sources

If a data flow is not valid when you click the Display Optimized SQL option, the software alerts you.

 

Data_Transfer transform for push-down operations

Use the Data_Transfer transform to move data from a source or from another transform into the target

datastore and enable a full push-down operation (INSERT INTO... SELECT) to the target. You can use

the Data_Transfer transform to push-down resource-intensive operations that occur anywhere within

a data flow to the database. Resource-intensive operations include joins, GROUP BY, ORDER BY,

and DISTINCT.

 

Push down an operation after a blocking operation example

You can place a Data_Transfer transform after a blocking operation to enable Data Services to push

down a subsequent operation. A blocking operation is an operation that the software cannot push down

to the database, and prevents ("blocks") operations after it from being pushed down.

For example, you might have a data flow that groups sales order records by country and region, and

sums the sales amounts to find which regions are generating the most revenue. The following diagram

shows that the data flow contains a Pivot transform to obtain orders by Customer ID, a Query transform

that contains a lookup_ext function to obtain sales subtotals, and another Query transform to group the

results by country and region.

3.PNG

Because the Pivot transform and the lookup_ext function are before the query with the GROUP BY

clause, the software cannot push down the GROUP BY operation. Here is how the "Optimized SQL"

window would show the SELECT statement that the software pushes down to the source database:

SELECT "ORDERID", "CUSTOMERID", "EMPLOYEEID", "ORDERDATE", "REQUIREDDATE", "SHIPPEDDATE", "SHIPVIA",

"FREIGHT", "SHIPNAME", "SHIPADDRESS", "SHIPCITY", "SHIPREGION", "SHIPPOSTALCODE", "SHIPCOUNTRY"

FROM "DBO"."ORDERS"

     

However, if you add a Data_Transfer transform before the second Query transform and specify a transfer

table in the same datastore as the target table, the software can push down the GROUP BY operation.

4.PNG

 

The Data_Transfer Editor window shows that the transfer type is Table and the transfer table is in the

same datastore as the target table (Northwind_DS.DBO.TRANS2).

Here's how the "Optimized SQL" window would show that the software pushed down the GROUP BY

to the transfer table TRANS2.

INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY", "SHIPREGION", "SALES_AGG")

SELECT "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION" , sum("TS_Query_Lookup"."SALES")

FROM "DBO"."TRANS2" "TS_Query_Lookup"

GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY" , "TS_Query_Lookup"."SHIPREGION"

 

Using Data_Transfer tables to speed up auto correct loads example

Auto correct loading ensures that the same row is not duplicated in a target table, which is useful for

data recovery operations. However, an auto correct load prevents a full push-down operation from the

source to the target when the source and target are in different datastores.

For large loads using database targets that support the Allow Merge option for auto correct load, you

can add a Data_Transfer transform before the target to enable a full push-down from the source to the

target. In order for the Allow Merge option to generate a MERGE statement:
    • the primary key of the source table must be a subset of the primary key of the target table
    • the source row must be unique on the target key

In other words, there cannot be duplicate rows in the source data. If this condition is not met, the

Optimizer pushes down the operation using a database-specific method to identify, update, and insert

rows into the target table.

If the MERGE statement can be used, SAP BusinessObjects Data Services generates an SQL MERGE

INTO target statement that implements the Ignore columns with value value (if a value is specified

in the target transform editor) and the Ignore columns with null Yes/No setting.

For example, suppose you create a data flow that loads sales orders into an Oracle target table which

is in a different datastore from the source.

For this data flow, the Auto correct load option is active and set to Yes, and the Ignore columns with

null and Allow merge options are also active.

The SELECT statement that the software pushes down to the source database would look like the

following (as it would appear in the "Optimized SQL" window).

SELECT "ODS_SALESORDER"."SALES_ORDER_NUMBER" , "ODS_SALESORDER"."ORDER_DATE" , "ODS_SALESORDER"."CUST_ID"

FROM "ODS"."ODS_SALESORDER" "ODS_SALESORDER"

 

When you add a Data_Transfer transform before the target and specify a transfer table in the same

datastore as the target, the software can push down the auto correct load operation.

 

The following MERGE statement is what the software pushes down to the Oracle target (as it appears

in the "Optimized SQL" window).

MERGE INTO "TARGET"."AUTO_CORRECT_LOAD2_TARGET" s

USING

(SELECT "AUTOLOADTRANSFER"."SALES_ORDER_NUMBER" SALES_ORDER_NUMBER,

"AUTOLOADTRANSFER"."ORDER_DATE" ORDER_DATE, "AUTOLOADTRANSFER"."CUST_ID" CUST_ID

FROM "TARGET"."AUTOLOADTRANSFER" "AUTOLOADTRANSFER") n

ON ((s.SALES_ORDER_NUMBER = n.SALES_ORDER_NUMBER))

WHEN MATCHED THEN

UPDATE SET s."ORDER_DATE" = nvl(n.ORDER_DATE, s."ORDER_DATE"), s."CUST_ID" = nvl(n.CUST_ID, s."CUST_ID")

WHEN NOT MATCHED THEN

INSERT (s."SALES_ORDER_NUMBER", s."ORDER_DATE", s."CUST_ID")

VALUES (n.SALES_ORDER_NUMBER, n.ORDER_DATE, n.CUST_ID)


Database link support for push-down operations across datastores

Various database vendors support one-way communication paths from one database server to another.

SAP BusinessObjects Data Services refers to communication paths between databases as database

links. The datastores in a database link relationship are called linked datastores.

The software uses linked datastores to enhance its performance by pushing down operations to a target

database using a target datastore. Pushing down operations to a database not only reduces the amount

of information that needs to be transferred between the databases and SAP BusinessObjects Data

Services but also allows the software to take advantage of the various DBMS capabilities, such as

various join algorithms.

With support for database links, the software pushes processing down from different datastores, which

can also refer to the same or different database type. Linked datastores allow a one-way path for data.

For example, if you import a database link from target database B and link datastore B to datastore A,

the software pushes the load operation down to database B, not to database A.

 

Software support

SAP BusinessObjects Data Services supports push-down operations using linked datastores on all

Windows and Unix platforms. It supports DB2, Oracle, and MS SQL Server databases.

 

To take advantage of linked datastores

1. Create a database link on a database server that you intend to use as a target in a job.

The following database software is required. See the Supported Platforms document for specific version numbers.

  • For DB2, use the DB2 Information Services (previously known as Relational Connect) software and make sure that the database user has privileges to create and drop a nickname.

To end users and client applications, data sources appear as a single collective database in DB2. Users and applications interface with the database managed by the information server. Therefore, configure an information server and then add the external data sources. DB2 uses nicknames to identify remote tables and views.

  • For Oracle, use the Transparent Gateway for DB2 and MS SQL Server.

See the Oracle database manuals for more information about how to create database links for Oracle and non-Oracle servers; a sketch of the Oracle syntax is shown after this list.

  • For MS SQL Server, no special software is required.

Microsoft SQL Server supports access to distributed data stored in multiple instances of SQL Server and heterogeneous data stored in various relational and non-relational data sources using an OLE database provider. SQL Server supports access to distributed or heterogeneous database sources in Transact-SQL statements by qualifying the data sources with the names of the linked server where the data sources exist.
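As a rough sketch of step 1 for an Oracle target database, the link could be created as follows. The link name, remote credentials and TNS alias are placeholders; for a non-Oracle remote database the alias would point at the Transparent Gateway configuration mentioned above:

CREATE DATABASE LINK target_db_link
CONNECT TO remote_user IDENTIFIED BY remote_password
USING 'remote_tns_alias';

Once the link exists on the target database server and both databases are defined as datastores, the two datastores can then be linked in Designer so that the Optimizer can use the link for push-down.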

2. Create a database datastore connection to your target database.

 

Generated SQL statements

To see how SAP BusinessObjects Data Services optimizes SQL statements, use Display Optimized

SQL from the Validation menu when a data flow is open in the workspace.

  • For DB2, it uses nicknames to refer to remote table references in the SQL display.
  • For Oracle, it uses the following syntax to refer to remote table references: <remote_table>@<dblink_name>.
  • For SQL Server, it uses the following syntax to refer to remote table references: <linked_server>.<remote_database>.<remote_user>.<remote_table>.

 

Tuning performance at the data flow or Job Server level

You might want to turn off linked-datastore push downs in cases where you do not notice performance

improvements.

For example, the underlying database might not process operations from different data sources well.

Data Services pushes down Oracle stored procedures and external functions. If these are in a job that

uses database links, it will not impact expected performance gains. However, Data Services does not

push down functions imported from other databases (such as DB2). In this case, although you may be

using database links, Data Services cannot push the processing down.

Test your assumptions about individual job designs before committing to a large development effort

using database links.

 

For a data flow

On the data flow properties dialog, this product enables the Use database links option by default to

allow push-down operations using linked datastores. If you do not want to use linked datastores in a

data flow to push down processing, deselect the check box.

This product can perform push downs using datastore links if the tables involved share the same

database type and database connection name, or datasource name, even if the tables have different

schema names. However, problems with enabling this feature could arise, for example, if the user of

one datastore does not have access privileges to the tables of another datastore, causing a data access

problem. In such a case, you can disable this feature.

 

For a Job Server

You can also disable linked datastores at the Job Server level. However, the Use database links

option, at the data flow level, takes precedence.

SAP Data Services Using Caches


Caching data

You can improve the performance of data transformations that occur in memory by caching as much

data as possible. By caching data, you limit the number of times the system must access the database.

SAP BusinessObjects Data Services provides the following types of caches that your data flow can use

for all of the operations it contains:

  • In-memory

Use in-memory cache when your data flow processes a small amount of data that fits in memory.

  • Pageable cache

Use pageable cache when your data flow processes a very large amount of data that does not fit in memory. When memory-intensive operations (such as Group By and Order By) exceed available memory, the software uses pageable cache to complete the operation.

Pageable cache is the default cache type. To change the cache type, use the Cache type option on the data flow Properties window.


Caching sources

By default, the Cache option is set to Yes in a source table or file editor to specify that data from the

source is cached using memory on the Job Server computer. When sources are joined using the Query

transform, the cache setting in the Query transform takes precedence over the setting in the source.

The default value of the Cache type option for data flows is Pageable.

It is recommended that you cache small tables in memory. Estimate the approximate size of a table (row count times average row width) to determine whether you should use a cache type of Pageable or In-memory; a rough worked example follows the list below.

Compute row count and table size on a regular basis, especially when:

  • You are aware that a table has significantly changed in size.
  • You experience decreased system performance.

If the table fits in memory, change the value of the Cache type option to In-memory in the Properties

window of the data flow.
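As a rough worked example (the row count, column count and the 20 bytes average column width are assumptions; substitute your own figures):

1,000,000 rows x 15 columns x 20 bytes per column = approximately 300 MB, plus roughly 30% processing overhead, gives an estimate of about 390 MB.

If the Job Server has comfortably more free memory than the estimate, In-memory is a reasonable choice; otherwise keep the default Pageable cache.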

 

Caching joins

The Cache setting indicates whether the software should read the required data from the source and

load it into memory or page able cache.

When sources are joined using the Query transform, the cache setting in the Query transform takes

precedence over the setting in the source. In the Query editor, the cache setting is set to Automatic by

default. The automatic setting carries forward the setting from the source table. The following table

shows the relationship between cache settings in the source, Query editor, and whether the software

will load the data in the source table into cache.

 

Cache Setting in Source    Cache Setting in Query Editor    Effective Cache Setting

Yes                        Automatic                        Yes
No                         Automatic                        No
Yes                        Yes                              Yes
No                         Yes                              Yes
Yes                        No                               No
No                         No                               No

 

Changing cache type for a data flow

You can improve the performance of data transformations that occur in memory by caching as much

data as possible. By caching data, you limit the number of times the system must access the database.

To change the cache type for a data flow:

1. In the object library, select the data flow name.

2. Right-click and choose Properties.

3. On the General tab of the Properties window, select the desired cache type in the drop-down list for the Cache type option.

 

Caching lookups

You can also improve performance by caching data when looking up individual values from tables and files.

 

Using a Lookup function in a query

SAP BusinessObjects Data Services has three Lookup functions: lookup, lookup_seq, and

lookup_ext. The lookup and lookup_ext functions have cache options. Caching lookup sources

improves performance because the software avoids the expensive task of creating a database query

or full file scan on each row.

You can set cache options when you specify a lookup function. There are three caching options:

  • NO_CACHE — Does not cache any values.
  • PRE_LOAD_CACHE — Preloads the result column and compare column into memory (it loads the values before executing the lookup).
  • DEMAND_LOAD_CACHE — Loads the result column and compare column into memory as the function executes.

Use this option when looking up highly repetitive values that are a small subset of the data and when

missing values are unlikely.

Demand-load caching of lookup values is helpful when the lookup result is the same value multiple

times. Each time the software cannot find the value in the cache, it must make a new request to the

database for that value. Even if the value is invalid, the software has no way of knowing if it is missing

or just has not been cached yet. When there are many values and some values might be missing, demand-load caching is significantly less efficient than caching the entire source.

 

Using a source table and setting it as the outer join

Although you can use lookup functions inside SAP BusinessObjects Data Services queries, an alternative

is to expose the translate (lookup) table as a source table in the data flow diagram, and use an outer

join (if necessary) in the query to look up the required data; a sketch of the resulting SQL follows the lists below. This technique has some advantages:

  • You can graphically see the table the job will search on the diagram, making the data flow easier to maintain
  • The software can push the execution of the join down to the underlying RDBMS (even if you need an outer join)

This technique also has some disadvantages:

  • You cannot specify default values in an outer join (default is always null), but you can specify a default value in lookup_ext.
  • If an outer join returns multiple rows, you cannot specify what to return, (you can specify MIN or MAX in lookup_ext).
  • The workspace can become cluttered if there are too many objects in the data flow.
  • There is no option to use DEMAND_LOAD caching, which is useful when looking up only a few repetitive values in a very large table.
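For comparison, with the translate table exposed as a source, the lookup becomes a join that the software can push down. A minimal sketch of the kind of SQL that could be generated, with hypothetical table and column names:

SELECT o.ORDER_ID, o.CUST_ID, c.CUST_NAME
FROM ORDERS o LEFT OUTER JOIN CUSTOMER c
ON (o.CUST_ID = c.CUST_ID)

Defaults for unmatched rows (the first disadvantage above) would then have to be applied in a downstream mapping, for example with nvl(c.CUST_NAME, 'UNKNOWN').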

 

Caching table comparisons

You can improve the performance of a Table_Comparison transform by caching the comparison table.

There are three modes of comparisons:

  • Row-by-row select
  • Cached comparison table
  • Sorted input

Row-by-row select will likely be the slowest and Sorted input the fastest.

 

Specifying a pageable cache directory

If the memory-consuming operations in your data flow exceed the available memory, SAP Business

Objects Data Services uses pageable cache to complete the operation. Memory-intensive operations

include the following operations:

  • Distinct
  • Functions such as count_distinct and lookup_ext
  • Group By
  • Hierarchy_Flattening
  • Order By

 

Using persistent cache

Persistent cache datastores provide the following benefits for data flows that process large volumes of data.

  • You can store a large amount of data in persistent cache which SAP BusinessObjects Data Services quickly pages into memory each time the job executes. For example, you can access a lookup table or comparison table locally (instead of reading from a remote database).
  • You can create cache tables that multiple data flows can share (unlike a memory table which cannot be shared between different real-time jobs). For example, if a large lookup table used in a lookup_ext function rarely changes, you can create a cache once and subsequent jobs can use this cache instead of creating it each time.

Persistent cache tables can cache data from relational database tables and files.

 

Using persistent cache tables as sources

After you create a persistent cache table as a target in one data flow, you can use the persistent cache

table as a source in any data flow. You can also use it as a lookup table or comparison table.

 

Using statistics for cache self-tuning

SAP BusinessObjects Data Services uses cache statistics collected from previous job runs to

automatically determine which cache type to use for a data flow. Cache statistics include the number

of rows processed.

The default cache type is pageable. The software can switch to in-memory cache when it determines

that your data flow processes a small amount of data that fits in memory.

 

To automatically choose the cache type

1. Run your job with options Collect statistics for optimization.

2. Run your job again with option Use collected statistics (this option is selected by default).

 

To monitor and tune in-memory and pageable caches

You can also monitor and choose the cache type to use for the data flow.

  1. Test run your job with the options Collect statistics for optimization and Collect statistics for monitoring.

Note: the option Collect statistics for monitoring is very costly to run because it determines the cache size for each row processed.

  2. Run your job again with the option Use collected statistics (this option is selected by default).

  3. Look in the Trace Log to determine which cache type was used.

  4. Look in the Administrator Performance Monitor to view data flow statistics and see the cache size.

  5. If the value in Cache Size is approaching the physical memory limit on the Job Server, consider changing the Cache type of the data flow from In-memory to Pageable.

SAP Data Services Components


Components

We call Components all WorkFlows that can or should be reused, e.g. a Workflow loading an entire dimension table. In other words, we never reuse DataFlows or other objects directly; we always embed them in a Workflow called C_...... A Component consists of the actual DataFlow (possibly inside a Conditional, in case the initial and delta loads differ), but before it, all other Components that are required to be loaded successfully are listed. In the attached example, the C_CostCenter_SAP Component requires the Controlling Area table to be loaded first.
3.PNG

As Components might be called in several different places - e.g. Dimension Tables belonging to different Star Schemas and therefore part of several Sections - each such Component WorkFlow should have its property Execute Only Once set.

Component Checklist

 

Common DI Design considerations

10 - What other Components are required to be loaded upfront? Add those as the first objects in this Workflow and connect them to ensure they are loaded prior to the execution of our DataFlow.

20 - Is the flag "Execute Only Once" set for this Workflow (Component)? Otherwise it would be executed again whenever it is a dependent object of another flow.

30 - Does the Component have a Description documenting what it does?

Data Considerations

10 - In Initial Load mode, is the table being truncated?

20 - Does the target table have a primary key, and do you make sure it cannot be violated regardless of the source data?

30 - What happens if this Component fails in the middle of a load and is restarted? It has to be ensured that it deals with that situation automatically, e.g. truncate the table prior to the load, load with auto-correct load/table comparison, or run a sql() script that deletes the data about to be loaded (see the sketch after this checklist). Consider the same problem for all supporting tasks as well, such as dropping and creating an index.

40 - For all columns used as foreign keys or in filter conditions in end-user queries, make sure the column value is never NULL.

50 - Which columns are marked as NOT NULL in the target table? These must be guaranteed to never contain a NULL value.

60 - If joins between source tables are required, join them in the first query if possible, so that the join can be executed in the source database.

70 - Use lookup_ext function calls rather than joins if: there is no unique constraint (primary key) on the lookup condition; the cache setting has to be controlled directly; the lookup table is small (fewer than 100,000 rows); or the columns returned by the lookups are not key to understanding the dataflow (a lookup is hidden inside a query, whereas a join is obvious when looking at the dataflow).

80 - Inside nvl() functions and as default values of lookups, use the global variables $G_DEFAULT_DATE, $G_DEFAULT_NUMBER, $G_DEFAULT_TEXT instead of hard-coding a value, unless a specific value is really required (see the sketch after this checklist).
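The following sketch illustrates items 30 and 80 of the Data Considerations. The datastore, table and column names, and the use of $G_SDATE as the start of the period being (re)loaded, are placeholders for illustration only:

# Restart-safe pre-load delete (item 30), placed in a script before the DataFlow:
sql('DWH_DS', 'delete from CO_TRANS_FACT where LOAD_DATE >= {$G_SDATE}');

# Default values instead of NULLs (item 80), used as a Query transform mapping
# rather than in a script:
nvl(SOURCE_TABLE.COST_CENTER_DESC, $G_DEFAULT_TEXT)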

SAP Data Services Sections


Sections

Similar to a Data Warehouse and its multiple Star Schemas, the Job loading the entire system should have one Workflow for each Star Schema; we call it a Section. As a Section is by definition a reusable object - a Component, so to say - we name those objects C_...._Section. A Section also requires some other objects to be loaded upfront - at least all the dimension tables for this star schema - so a WorkFlow containing all the Dimension table Components is the first object. After that, in almost all cases, you will have a Conditional to distinguish between the Initial and Delta load of the Fact Tables.

3.PNG
Dimension Tables

To make sure all foreign keys are valid, we add one additional row to the dimension table, to which all fact rows refer in case they do not have a proper foreign key value. For better readability, we use separate Embedded DataFlows for each such source and call those e.g. EDF_Oracle_.... or EDF_DefaultRow_.... depending on what the source actually is. Both sources are merged together using the Merge Transform and finally loaded into the target table. In case it is absolutely impossible - for sure - guaranteed - that a fact row arrives without a valid key, the EDF_DefaultRow can be omitted. But still, for consistency, keep the source inside Embedded DataFlows.
3.PNG

The EDF_DefaultRow is built to add the new artificial row(s). In many cases the Primary Key of this table will be just one column, so we need to add just one row, using the Row_Generation Transform as the source. Depending on the datatype, the value of this key is assigned from the corresponding $G_DEFAULT_xxx Global Variable.

3.PNG

However, if the Primary Key is a combined one, e.g. COMPANYID, COSTCENTERID, the DefaultRow DataFlow for the CostCenter will have the Dimension Table with all COMPANYIDs as its source, so that for each COMPANYID an unknown COSTCENTER exists. This also means that this dimension table has to be loaded first and, as described under Components, is therefore a requirement and the first object in the C_CostCenter Component.

Fact tables

When loading Fact Tables, some additional "tender loving care" is needed. On fact tables there are typically many indexes and aggregate tables that can slow down loading performance or need maintenance. For this reason, a stored procedure is called before and after each fact table load. These stored procedures should deal with all database maintenance tasks prior to and after loading the fact table, and they receive parameters such as $G_LOAD_TYPE (is it an initial or delta load?), the name of the fact table to be dealt with, and more.
3.PNG

PreLoad Script:

print('Calling pre-processing of CO_TRANS_FACT transaction data...');

DWH_DS.CA_DWH.PREPROCESSING_FACT_TABLE($G_LOAD_TYPE, 'Y', 'CO_TRANS_FACT');

print('Calling pre-processing of CO_TRANS_FACT transaction data...done');

PostLoad Script:

print('Calling post-processing of PROFIT_CENTER_FACT...');

DWH_DS.CA_DWH.POSTPROCESSING_FACT_TABLE($G_LOAD_TYPE, $G_REBUILD_INDEXES, 'PROFIT_CENTER_FACT', $P_FAILURES);

if ($P_FAILURES = 0)

  print('Calling post-processing of PROFIT_CENTER_FACT...done');

else

  print('Calling post-processing of PROFIT_CENTER_FACT...done but with {$P_FAILURES} errors');

SAP Data Services PreLoad Stored Procedure


PreLoad Stored Procedure

This database stored procedure deals with all the DB maintenance before loading the fact table. For an initial load it disables all objects that might slow down the load; for a delta load it optionally disables indexes, based on the $G_REBUILD_INDEXES global variable.

PreLoad Stored Procedure for Oracle

For Oracle, fact tables have Bitmap Indexes and Materialized Views together with Materialized View Logs. For an initial load, all the Bitmap Indexes should be disabled - but since there is no way to do that without drawbacks, we drop them and remember the index definitions in a table called LOOKUP_INDEXES - and the Materialized View Logs are dropped.

Preprocessing_fact_table source code:

create or replace procedure preprocessing_fact_table(

          pLoadType in varchar2,

          pDropCreateIndex in varchar2,

          pFactTable in varchar2) is

  cursor cMViews(pTableName varchar2) is

    select mview_name

from USER_MVIEW_DETAIL_RELATIONS

      where DETAILOBJ_NAME = pTableName and DETAILOBJ_TYPE = 'TABLE';

begin

  if pLoadType = 'FIRST' then

    drop_mview_logs(pFactTable);

  end if;

 

  if pDropCreateIndex = 'Y' or pLoadType = 'FIRST' then

drop_indexes(pFactTable);

    for rMViews in cMViews(pFactTable) loop

      drop_indexes(rMViews.mview_name);

    end loop;

  end if;

end preprocessing_fact_table;

/

 

 

create or replace procedure Drop_Mview_Logs(

          pFactTable IN VARCHAR2) IS

  CURSOR cMViews IS

    SELECT mview_name

    FROM user_mviews

    WHERE compile_state = 'NEEDS_COMPILE';

  CURSOR cMViewTables(pTableName VARCHAR2) IS

    SELECT DISTINCT a.DETAILOBJ_NAME table_name

FROM USER_MVIEW_DETAIL_RELATIONS a, USER_MVIEW_DETAIL_RELATIONS b, user_snapshot_logs c

WHERE a.mview_name = b.mview_name AND

            a.DETAILOBJ_NAME = c.master AND

            b.DETAILOBJ_NAME = pTableName AND

            b.DETAILOBJ_TYPE = 'TABLE';

  vCursor INTEGER;

  vReturn INTEGER;

BEGIN

  FOR rMViews IN cMViews LOOP

    EXECUTE IMMEDIATE 'alter materialized view ' || rMViews.mview_name || ' compile';

  END LOOP;

  FOR rMViewTables IN cMViewTables(pFactTable) LOOP

EXECUTE IMMEDIATE 'DROP materialized VIEW LOG ON ' || rMViewTables.table_name;

  END LOOP;

END Drop_Mview_Logs;

/

 

 

create or replace procedure drop_indexes(

          pFactTable in varchar2) is

  cursor cDropIndex(pTableName in varchar2) is

    select distinct i.index_name index_name,

       'tablespace ' || nvl(i.tablespace_name,

       p.tablespace_name) ||

       decode(i.partitioned, 'YES', ' LOCAL ', ' ') ||

       decode(i.degree, 'DEFAULT', 'PARALLEL ', '') additional_text

    from user_indexes i, user_ind_partitions p

    where i.index_type = 'BITMAP' and

          i.table_name = pTableName and

          i.index_name = p.index_name(+);

  cursor cIndexColumns(pIndexName in varchar2) is

    select column_name, column_position

    from user_ind_columns

    where index_name = pIndexName

    order by column_position;

  vCursor integer;

  vReturn integer;

  vColumn_Name1 varchar2(32);

  vColumn_Name2 varchar2(32);

  vColumn_Name3 varchar2(32);

begin

  for rDropIndex in cDropIndex(pFactTable) loop

    vColumn_Name1 := null;

    vColumn_Name2 := null;

    vColumn_Name3 := null;

    for rIndexColumns in cIndexColumns(rDropIndex.index_name) loop

      if rIndexColumns.column_position = 1 then

        vColumn_Name1 := rIndexColumns.column_name;

      elsif rIndexColumns.column_position = 2 then

        vColumn_Name2 := rIndexColumns.column_name;

      elsif rIndexColumns.column_position = 3 then

        vColumn_Name3 := rIndexColumns.column_name;

      end if;

    end loop;

    delete lookup_indexes where index_name = rDropIndex.index_name;

    insert into lookup_indexes(index_name, table_name,

      column_name1, column_name2, column_name3, additional_text)

    values (rDropIndex.index_name, pFactTable,

      vColumn_Name1, vColumn_Name2, vColumn_Name3, rDropIndex.additional_text);

    execute immediate 'drop index ' || rDropIndex.index_name;

    commit;

  end loop;

end drop_indexes;

/

 

 

CREATE TABLE LOOKUP_INDEXES (

  INDEX_NAME       VARCHAR2 (64),

  TABLE_NAME       VARCHAR2 (64),

  COLUMN_NAME1     VARCHAR2 (64),

  COLUMN_NAME2     VARCHAR2 (64),

  COLUMN_NAME3     VARCHAR2 (64),

  ADDITIONAL_TEXT  VARCHAR2 (64) );

SAP Data Services PostLoad Stored Procedure


PostLoad Stored Procedure

This stored procedure should recreate all objects required for the data warehouse to perform efficiently: create the indexes that were dropped before, refresh aggregate tables, rebuild the indexes on the aggregate tables if they were disabled before, etc. The procedure returns an integer value with the number of problems found, but it never raises an error, so that the load can continue.

PostLoad Stored Procedure for Oracle

Here, the opposite of the PreLoad procedure is done: create all Bitmap Indexes that have not been created yet; on all Fact Tables that have Materialized Views based on them, create the Materialized View Logs; and finally refresh the Materialized Views so they contain the most current information. It is important to write this procedure as error-safe as possible, e.g. it might execute only partially and then be executed again. Therefore, if a task has been performed already, it should be skipped.

Postprocessing_fact_table source code:

create or replace procedure postprocessing_fact_table(

          pLoadType in varchar2,

          pDropCreateIndex in varchar2,

          pFactTable in varchar2,

          pMView_Failures out binary_integer) is

  cursor cMViews(pTableName varchar2) is

    select mview_name

from USER_MVIEW_DETAIL_RELATIONS

      where DETAILOBJ_NAME = pTableName and

            DETAILOBJ_TYPE = 'TABLE';

  owner varchar2(30);

begin

  create_indexes(pFactTable);

  create_mview_logs(pFactTable);

  dbms_mview.refresh_dependent(pMView_Failures, pFactTable, '?', NULL, true, true);

  for rMViews in cMViews(pFactTable) loop

    create_indexes(rMViews.mview_name);

  end loop;

end postprocessing_fact_table;

/

 

 

create or replace procedure create_indexes(

          pFactTable in varchar2) is

  cursor cCreateIndex(pTableName in varchar2) is

    select index_name, table_name,

           column_name1, column_name2, column_name3,

           additional_text

    from lookup_indexes

    where table_name = pTableName and

          index_name not in (select index_name from user_indexes);

  vCursor integer;

  vReturn integer;

  vColumnList varchar2(4000);

begin

  for rCreateIndex in cCreateIndex(pFactTable) loop

    if (rCreateIndex.column_name1 is not null) then

      vColumnList := rCreateIndex.column_name1;

      if (rCreateIndex.column_name2 is not null) then

        vColumnList := vColumnList || ', ' ||

                       rCreateIndex.column_name2;

        if (rCreateIndex.column_name3 is not null) then

          vColumnList := vColumnList || ', ' ||

                         rCreateIndex.column_name3;

        end if;

      end if;

      execute immediate 'create bitmap index ' ||

         rCreateIndex.index_name || ' on ' || pFactTable ||

         '(' || vColumnList || ') ' ||

         rCreateIndex.additional_text;

    end if;

  end loop;

end create_indexes;

/

 

 

create or replace procedure Create_Mview_Logs(

          pFactTable IN VARCHAR2) IS

  CURSOR cMViews IS

    SELECT mview_name

    FROM user_mviews

    WHERE compile_state = 'NEEDS_COMPILE';

  CURSOR cMViewTables(pTableName VARCHAR2) IS

    SELECT DISTINCT a.DETAILOBJ_NAME table_name

FROM USER_MVIEW_DETAIL_RELATIONS a, USER_MVIEW_DETAIL_RELATIONS b, user_mviews c

WHERE a.mview_name = b.mview_name AND

            a.mview_name = c.mview_name AND

            b.DETAILOBJ_NAME = pTableName AND

            b.DETAILOBJ_TYPE = 'TABLE' AND

            c.FAST_REFRESHABLE IN ('DML', 'DIRLOAD_DML') AND

            refresh_method = 'FORCE'

MINUS

SELECT master FROM user_snapshot_logs;

  CURSOR cColumns(pTableName VARCHAR2) IS

    SELECT column_name

    FROM user_tab_columns

    WHERE table_name = pTableName;

  vCursor INTEGER;

  vReturn INTEGER;

  vColumnList VARCHAR2(4000);

BEGIN

  FOR rMViews IN cMViews LOOP

    EXECUTE IMMEDIATE 'alter materialized view ' ||

           rMViews.mview_name || ' compile';

  END LOOP;

  FOR rMViewTables IN cMViewTables(pFactTable) LOOP

    vColumnList := NULL;

FOR rColumns IN cColumns(rMViewTables.table_name) LOOP

   IF vColumnList IS NULL THEN

     vColumnList := '"' || rColumns.column_name || '"';

   ELSE

     vColumnList := vColumnList || ', "' || rColumns.column_name || '"';

   END IF;

END LOOP;

EXECUTE IMMEDIATE 'CREATE materialized VIEW LOG ON ' ||

        rMViewTables.table_name || ' WITH SEQUENCE, ROWID (' ||

        vColumnList || ') INCLUDING NEW VALUES';

  END LOOP;

END Create_Mview_Logs;

/


SAP Data Services Initial vs. Delta Load


Initial vs. Delta Load

As described in the SAP Data Services Sections blog, the fact table component/section consists of a first workflow named WF_xxxxDims with all the Components for the dimension tables of this star schema, and a Conditional for performing either the initial or the delta load of the fact table. But what is the actual difference between those two? Very often the initial load can be built much like the delta load - the only difference is the range of days that should be loaded. In other words, $G_SDATE will have a value of a year ago or so for the initial load, whereas for a delta load it will be e.g. yesterday. So the embedded dataflow extracting the data can be the same for both. At the DataFlow level you will find a difference: the initial load truncates the table; the delta load uses either the Auto Correct Load option or a Table Comparison transform before the table loader.

As a result, it is beneficial to use Embedded DataFlows at least here, so changes have to be applied just once, reducing the maintenance overhead and the chance of inconsistencies between the two loads. A minimal sketch of such a shared date filter follows.
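As an illustration only (the table name, column name and date arithmetic are placeholders), the shared extraction dataflow filters on the global variable in its Query transform WHERE clause, and only the value assigned to $G_SDATE differs between the two branches of the Conditional:

# Script before the Conditional branches (values are examples):
# Initial load:  $G_SDATE = sysdate() - 365;
# Delta load:    $G_SDATE = sysdate() - 1;

# WHERE clause of the shared Query transform:
CO_TRANS.POSTING_DATE >= $G_SDATE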

 

Restartability for Initial Loads

In case the Initial Load fails or has to be run a second time, the End User simply has to restart it. Just imagine a new fact attribute gets added and therefore everything has to be reloaded - it is better to think about this when building the flows rather than ending up with a long list of manual tasks that have to be followed, or your load will fail or, worse, load some data twice.

Tables already loaded get truncated automatically via either a script or using the "delete data before load" option.

And the PreLoad Stored Procedure takes care of all the indexes or other database objects that might have been created already and should be disabled first.


Restartability for Delta Loads

The same applies to Delta Loads. In case a load fails during the night, you do not want to waste time in the morning with tasks that have to be completed before a new load can be performed. You will be in a hurry; just execute the job and everything should be taken care of automatically.

For simplicity, you will very often use "delete data before load" for the small to mid-sized dimension tables and use Table Comparison when performing a delta load into a table. This way you can guarantee that no data is lost - it may be read twice, but not lost - and nothing occurs twice, as the Table Comparison (sorted input) transform will update already existing rows.


Supports the Recovery Feature

If a Job is executed in recovery mode, a list of all already executed elements is kept in the repository. So in case the load fails, you have the option to restart the flow that caused the error and continue from there, rather than starting everything again just to find out that the last flow still fails because of e.g. not enough disk space.

But again, this feature restarts an entire object, e.g. a DataFlow; it does not capture the status inside a flow - that would be impossible (how would you guarantee that the same data is read from the database in the same order?) or would at least require a lot of overhead to write the current buffer to disk. So all your objects should be designed with the awareness that they might be started a second time.

Actually this is not much different from the rules above under Restartability for Delta Loads, as we use e.g. the Table Comparison Transform anyway. But there is one important issue: when you use a script to truncate/delete rows prior to loading and the job is restarted at the point of failure, the script was already executed successfully and would therefore be skipped. To deal with that issue, those WorkFlows must have the property Recover as a Unit turned on, so they are treated as successfully executed only once everything inside them has executed without a problem; otherwise they start again from the first object inside the WorkFlow.

How to use Validate Transform


Introduction:-

 

 

The Validation transform is used to filter or replace the source dataset based on criteria or validation rules, to produce the desired output dataset.

It enables you to create validation rules on the input dataset and to generate output based on whether records have passed or failed the validation
condition.


In this scenario we are validating data from a database table against the correct format of the zip code.

If the zip code has fewer than 5 digits, we will filter those records out & pass them to another table.

 

The Validation transform can generate three output datasets: Pass, Fail, and RuleViolation.


  1. The Pass output schema is identical to the input schema.
  2. The Fail output schema has three additional columns: DI_ERRORACTION, DI_ERRORCOLUMNS, and DI_ROWID.
  3. The RuleViolation output schema has three columns: DI_ROWID, DI_RULENAME, and DI_COLUMNNAME.


Steps:-


1) Create project, job, workflow, dataflow as usual.


2) Drag in the source table and the Validation transform & provide details.


1.png


  • Double click on the Validation transform to provide details. You can see the 3 types of output dataset described above.

 

2.png

 

  • Add a validation rule.

 

3.png

 

  • Click Add & fill the details about the rule as follows.

 

4.png

 

Action on Fail:-

                1) Send to Fail:- on failure of the rule, the record is sent to the separate target holding the "Fail" records.

                2) Send to Pass:- even on failure, the record is passed to the normal target.

                3) Send to Both:- the record is sent to both targets.

 

Column Validation:-

                Select the column to be validated, then decide the condition.

 

                We have selected "Match Pattern" as the condition, with the pattern '99999'.

                So it will check whether the Zip code consists of exactly 5 digits (see the note below).
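For reference, the same check can be expressed with the match_pattern function, where '9' in the pattern matches a single digit; the qualified column name below is illustrative. A record passes the rule when:

match_pattern(CUST.ZIP, '99999') = 1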

 

  • Press OK. Then you can see the entry get added as follows.

 

5.png

 

3) Add a Target table to the dataflow & link the Validate Transform to it.

 

1.png

 

  • Choose the validate condition as "Pass"

 

2.png

 

  • Similarly do the connection for "Fail" & "Rule Violation" condition.

 

3.png

 

4) Validate the job & execute it.

 

5) Check the input & output.

 

  • Input:-

 

4.png

 

  • You can see in the input shown in the figure above that the last row has a zip code with fewer than 5 digits. Now view the output.

 

  • Output for Pass condition:-

 

5.png

 

  • Output for Fail condition

 

6.png

 

     You can see that the invalid record from the input has been transferred to the "CUST_Fail" table, as shown above.

     The three additional columns "DI_ERRORACTION", "DI_ERRORCOLUMNS" and "DI_ROWID" can also be seen.

 

  • Output of the "RuleViolation" condition.

7.png

 

Summary:-

 

In this way, the Validation transform is useful for validating records based on rules & routing the bad records to a different target where they can be analysed later.

 

 

Thanks & Regards,

 

Rahul More

 

(Project Lead)

1.jpg

Replicate tables from PostgreSQL Database


Hi,

 

     I am using the BODS tool to replicate tables into HANA, and I want to replicate tables from PostgreSQL.

     The PostgreSQL database type is not available in the datastore dropdown,

     so could anyone please help me create a datastore for PostgreSQL?

 

Thanks,

Ranjith.

SAP Data Services and JDBC


SAP Data Services and JDBC


SAP Data Services, as of 4.2 SP2, now supports the use of JDBC drivers as a data source. This capability has been implemented through a new adapter type, JDBC.

 

To add a JDBC adapter you use the Data Services Administration Console, browse to adapter instances, choose your job server and then select Adapter Configuration.

Adapter List.png

 

Select JDBCAdapter. I’m using SAP ASE as a source so I have added the JDBC library files for ASE (<install Directory>jConnect-7_0\classes\jconn4.jar) to the classpath.

Config.png

 

Then you add the class name, driver URL, username and password. Class name and URL should be contained in the documentation from the JDBC driver provider. The URL for ASE includes servername, port and database name.

Connection.png

The next section is used to configure what Data Services pushes down to the database. This will vary by database type and JDBC driver provider.

Pushdown.pngPushdown2.png


Once you have applied the configuration settings you can start the adapter. This is now ready for use in Data Services Designer.

start.png

 

In Designer create a new datastore and change the type to Adapter. Select the adapter you have created in the management console and select OK.

datastore.png

 

You can now browse the metadata as you would in any other datastore type and import the tables you require.

metadata.pngimport.png

In this example I’ve imported 2 tables and joined them together in a data flow. Within the query transform I’ve also applied functions to 3 of the fields to test what gets pushed down. You can see from the Optimised SQL that the Upper, Year and Month functions have been pushed down.

dataflow.png
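
As a rough sketch of what such a pushed-down statement can look like (the table and column names below are invented for illustration; the real Optimised SQL depends on your schema, join conditions and driver):

SELECT UPPER(c.CUST_NAME),
       YEAR(o.ORDER_DATE),
       MONTH(o.ORDER_DATE),
       o.ORDER_TOTAL
FROM CUSTOMERS c
INNER JOIN ORDERS o ON o.CUST_ID = c.CUST_ID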

 

There are a few limitations with the JDBC adapter, including no support for View Data, stored procedures, lookups or Table Comparison. Hopefully, as the use of this adapter grows, these will be addressed in future releases.

BODS->SAP - Sales order creation - IDOC method - “ FIELD RV45A-VBAP_SELKZ(2) is not an input field”


Hello Everyone ,

 

 

Just thought of sharing an issue I came across while loading legacy sales orders into SAP. We were using an IDoc (ORDERS05) to load data into SAP from BODS, and we could see that some of the IDocs failed with the error message "FIELD RV45A-VBAP_SELKZ(2) is not an input field".

 

sample.png

 

 

We thought our issue was solved when we found the solution on SDN shown in the screenshot above, but we already had note 1264003 in place.

 

Further to that, we validated all the data and everything seemed to be fine. We took a sample IDoc from the failed list and tried to process it through BD87 -> Restrict and Process -> Foreground Processing. The sample IDoc we took had 2 line items; we could see only one line item being processed successfully, and we were able to save the sales order successfully.

 

 

This seemed weird, and when we validated the data again, nothing appeared to be an issue. We were populating the E1EDP01-POSEX field with the legacy sales order item number, but we were not populating the line items in sorted order (where legacy line item numbers are in place). We decided to try sorting the item data by the legacy item number (obviously the header data was also sorted so that it matched the respective item data), re-triggered the IDoc through WE19, and yes, it worked like a charm: we were able to process the IDocs successfully.
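
As a rough illustration of the fix (the staging table and column names below are hypothetical), the rows feeding the E1EDP01 item segments just need to be sorted by the legacy order and item numbers before the IDoc is generated, for example with an ORDER BY in the source query or an Order By in the Query transform:

SELECT LEGACY_ORDER_NO, LEGACY_ITEM_NO, MATERIAL, ORDER_QTY
FROM LEGACY_SALES_ITEMS
ORDER BY LEGACY_ORDER_NO, LEGACY_ITEM_NO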

 

Thanks,

AJ.

 

BODS – SAP table extraction – Running for a long time or ABAP Run time error or Core dump


Hello Everyone,

 

 

This scenario and these fixes are applicable only if you are using just a normal Query transform to pull SAP table records into BODS staging tables.

 

 

In a normal scenario, we pull the SAP table from the datastore, use it as a source, and add a Query transform to transfer the required fields to the target table. If you double-click the SAP table that you have pulled into the designer area, you can see a check box "Execute in background (batch)", which is unchecked by default. – Thanks to Dirk Venken for pointing this out.

 

 

When the table extraction job is executed in BODS, on the SAP side it runs in a work process of one of the application servers (if you have many). Execute transaction SM51 in SAP, double-click the application server, and check the active work processes, using the BODS system user ID as a filter, to see what operation is being carried out.

 

 

A program under /BODS/* is executed to extract the data from SAP, but the main point to note here is that SAP runs this table extraction through a dialog work process. A dialog work process is fine when the data volume is small and for relatively small tables, but it is always advisable to go with a background process rather than a dialog one when the operation is resource-intensive and involves high data volumes.

 

 

Because of this, extractions of large tables like BSEG, EDIDC, EDIDS, etc. will run for a long time or be terminated with an ABAP dump or a core dump in BODS (unless you have logically separated the extraction of the same table into several query transforms with different filters and finally merged them into a single table – which can be a pain, because the BODS job has to be changed every time the filters need to be modified).
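
For illustration only, the kind of filter split mentioned above could look like the following; the split column and year values are invented examples, and each SELECT would be a separate extraction that is merged into one staging table afterwards:

-- Extraction 1: fiscal year 2013 only
SELECT * FROM BSEG WHERE GJAHR = '2013'

-- Extraction 2: fiscal year 2014 only
SELECT * FROM BSEG WHERE GJAHR = '2014'

-- Both result sets are then merged into the same staging table.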

 

 

In order to make the extraction run in a background process, we need to mark the check box "Execute in background (batch)". Once you check this and re-execute the job, you may see this error in the BODS job log: "RFC_ABAP_RUNTIME_FAILURE - Exception_Key: READ_REPORT_LINE_TOO_LONG".

 

 

 

The reason for the dump is that the SAP system tries to read the program /BODS/JOB_RUN into a structure whose line (PROGTAB-LINE) has a length of 72, but the program /BODS/JOB_RUN contains lines wider than 72 characters.

 

 

The following note is available on the SAP Support Portal; if the pilot note suggested by SAP is not applicable to your version, open an incident to request that Product Support release the note for your organization.

 

 


1.png

 

Thanks,

AJ.

Data_Transfer


I have a doubt about the Data_Transfer transform. The transfer type can be Automatic, File, or Database; I selected File and provided the required directory and file name. Unfortunately, the file I provided was my source file, and after the job executed my source file had disappeared. Could you please help me get my file back?


Error Invalid string or Buffer length while transfering data from Microsoft SQL Server 2008 to SAP HANA


Hello all ,

 

Below are the datastore and the data flow for my scenario, where data is to be transferred from Microsoft SQL Server 2008 to SAP HANA using a SQL transform.

 

err_41.png

err_1.PNG

dav_1.png

 

While executing the above job, I'm getting the following error:-

 

err_3.PNG

 

Please help me with this issue

 

Thanks in advance,

DAVID KING J


Error handling and job scheduling without using Control tables on database


In BODS, we sometimes face requirements such as: Job1 is executed, and upon its successful completion Job2 should be triggered, and so on for Job3.

(In data warehousing, we may have some jobs that load dimension tables and others that load fact tables. We need to schedule the jobs so that all dimension jobs complete successfully before the fact-loading jobs are started.)

In order to fulfil such requirements, we have 2 approaches.


6.1 Using control tables on database side:


We can create control tables that maintain the status of running jobs and that are updated by the BODS jobs themselves.

 

6.2  Using BODS internal tables:

 

BODS internally maintains metadata tables in which we can find the job status: whether it is running, failed, or successfully completed.

AL_HISTORY is the internal table maintained by BODS that holds information about job executions.

The table structure is as follows:


  

Column Name       Description
OBJECT_KEY        Internal ID of the job within the repository
INST_MACHINE      Computer on which the job was executed
TYPE              Batch job or real-time job
SERVICE           Name of the job
START_TIME        Time when the job execution started
END_TIME          Time when the job execution completed
EXECUTION_TIME    Difference between start and end time
STATUS            Job status: S while the job is running, D when completed, E if the job ended with an error
HAS_ERROR         0 if there is no error, 4 if there is an error


This BODS metadata table contains the status of every job, whether it is running, completed, or terminated because of an error.


I have created a test job and will show how to get its status from the AL_HISTORY table.


The job name is: J_ERROR_HANDLING.

When I run the job and trigger a query such as

SELECT * FROM al_history k
WHERE upper(k.service) = 'J_ERROR_HANDLING';

my result set is as follows:


E1.jpg

 

If I wish to get the status of the latest execution instance, the following query needs to be triggered:


SELECT * FROM al_history i
WHERE i.object_key IN (
      SELECT MAX(k.object_key) FROM al_history k
      WHERE upper(k.service) = 'J_ERROR_HANDLING');


Result:

E2.jpg

 

So by using the AL_HISTORY table, we can easily get the latest execution instance of any job and, based on its status, schedule other dependent jobs.
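
As a minimal sketch (the dimension job names are hypothetical), a script at the start of the fact-load job could check that the latest run of every dimension job completed without error before continuing:

SELECT COUNT(*)
FROM al_history h
WHERE h.object_key IN (
      SELECT MAX(k.object_key)
      FROM al_history k
      WHERE upper(k.service) IN ('J_LOAD_DIM_CUSTOMER', 'J_LOAD_DIM_PRODUCT')
      GROUP BY upper(k.service))
  AND h.status = 'D'
  AND h.has_error = 0;
-- If the count equals the number of dimension jobs, all of them finished
-- successfully and the fact-load job can proceed.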

How to setup hadoop configuration?


This post is about setting up the Hadoop configuration and editing the deployment configuration files for MapReduce and HDFS.

Here are the steps to set up the configuration files:

 

     1. You must edit and source the files present in the companion files, including the script files and configuration files.

          Alternatively, you can copy the content to ~/.bash_profile and set up the environment variables in your environment.

 

     2. Extract the files from the downloaded companion scripts, from the configuration_files/core_hadoop directory, to a temporary directory.

 

     3. Make modifications to configuration files.

 

In the temporary directory, locate the following files and modify the properties according to your environment. Look for TODO markers in each file to find the properties that need to be replaced.

 

    A. Edit the core-site.xml file and modify the listed properties:



     B. Now edit the hdfs-site.xml file and modify the following properties:


 


 

     C. Now edit the mapred-site.xml file and modify the following properties:


 

     D. Edit the taskcontroller.cfg file and modify the required properties.

 

 

     4. You may now copy the configuration files

 

          a. Replace your installed Hadoop configuration with the modified core_hadoop configuration files:


rm -rf $HADOOP_CONF_DIR
mkdir -p $HADOOP_CONF_DIR

          b. Copy your modified configuration files to $HADOOP_CONF_DIR on all nodes.


          c. To set appropriate permissions, use the following commands:


chmod a+x $HADOOP_CONF_DIR/
chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
chmod -R 755 $HADOOP_CONF_DIR/../

 

In this way you can set up the big data Hadoop configuration. To know more about Hadoop development, contact the Aegis experts, as they offer Hadoop-related services at the best rates.

