Quantcast
Channel: Data Services and Data Quality
Viewing all articles
Browse latest Browse all 236

SCD Type 1 Full Load With Error Handle - For Beginners

$
0
0

This example may help us to understand the usage of SCD Type 1 and with how to handle the error messages.


Brief about Slowly Changing Dimensions: Slowly Changing Dimensions are dimensions that have data that changes over time.

There are three methods of handling Slowly Changing Dimensions are available: Here we are concentrating only on SCD Type 1.


Type 1-  No history preservation - Natural consequence of normalization.

 

For a SCD Type 1 change, you find and update the appropriate attributes on a specific dimensional record. For example, to update a record in the

SALES_PERSON_DIMENSION table to show a change to an individual’s SALES_PERSON_NAME field, you simply update one record in the

SALES_PERSON_DIMENSION table. This action would update or correct that record for all fact records across time. In a dimensional model, facts have no meaning until you link them with their dimensions. If you change a dimensional attribute without appropriately accounting for the time dimension, the change becomes global across all fact records.

 

This is the data before the change:

 

SALES_PERSON_

KEY

SALES_PERSON_

ID

NAMESALES_TEAM
1500120Doe, John BAtlanta

 

This is the same table after the salesperson’s name has been changed:


SALES_PERSON_

KEY

SALES_PERSON_

ID

NAMESALES_TEAM
1500120Smith, John BAtlanta


However, suppose a salesperson transfers to a new sales team. Updating the salesperson’s dimensional record would update all previous facts so that the

salesperson would appear to have always belonged to the new sales team. This may cause issues in terms of reporting sales numbers for both teams. If you want to preserve an accurate history of who was on which sales team, Type 1 is not appropriate.


Below is the step by Step Batch Job creation using SCD Type 1 using error Handling.


Create new job

 

 

Add Try and "Script" controls from the pallet and drag to the work area

 

Create a Global variable for SYSDATE

 

6.png

 

Add below script in the script section.

 

# SET TODAYS DATE

$SYSDATE = cast( sysdate( ), 'date');

print( 'Today\'s date:' || cast( $SYSDATE, 'varchar(10)' ) ); 

 

 

Add DataFlow.

 

Now double click on DF and add Source Table.

 

Add Query Transformation

 

Add LOAD_DATE new column in Query_Extract

Map created global variable $SYSDATE. If we mention sysdate() this functional call every time which may hit the performance.

 

14.png

 

Add another query transform for lookup table

 

  Create new Function Call for Lookup table.

 

 

20.png

 

21.png

 

Required column added successfully via Lookup Table.

 

Add another Query Transform. This query will decide whether source record will insert and update.

 

Now remove primary key to the target fileds.

 

 

 

Create new column to set FLAG to update or Insert.

 

Now write if then else function if the LKP_PROD_ID is null update FLAG with INS if not with UPD.

 

ifthenelse(Query_LOOKUP_PRODUCT_TIM.LKP_PROD_KEY is null, 'INS', 'UP')

 

27.png

 

Now Create case Transform.

 

30.png

 

 

Create two rules to FLAG filed to set “INS” or ”UPD”

Create Insert and Update Query to align the fields

Change LKP_PROD_KEY to PROD_KEY and PROD_ID to SOURCE_PROD_ID for better understanding in the target table.

Now create  Key Generation transform to generate Surrogate key

Select Target Dimension table with Surrogate key (PROD_KEY)

Set Target instance

 

 

43.png

Add a Key_Generation transformation for the Quary_Insert to add count for the new column.

 

And for Query _Update we need Surrogate key and other attributes. Use the Map Operation transform to update records.

 

By default Normal mode as Normal. We want to update records in normal mode.

 

53.png

 

Update Surrogate key, Product key and other attributes.

 

Go back to insert target table -->  Options --> Update Error Handling as below:

55.png

 

Go back to Job screen and create catch block

 

57.png

 

Select required exception you want to catch. and Create script to display error messages

 

60.png

 

Compose your message to print errors in the script_ErrorLogs as below.

 

print( 'Error Handling');

print( error_message() || ' at ' || cast( error_timestamp(), 'varchar(24)'));

raise_exception( 'Job Failed');

 

now Validate script before proceed further.

 

Now these messages will catch errors with job completion status.

Now create a script to print error message if there is any database rejections:

 

65.png

 

# print ( ' DB Error Handling');

if( get_file_attribute( '[$$LOG_DIR]/ VENKYBODS_TRG_dbo_Product_dim.txt ', 'SIZE') > 0 )

raise_exception( 'Job Failed Check Rejection File');

 

note: VENKYBODS_TRG_dbo_Product_dim.txt is the file name which we mentioned in the target table error handling section.

 

Before Execute, Source and Target table data of Last_updated_Date.

 

73.png

 

Now Execute the job and we can see the Last_Updated_Dates.

72.png

 

Now try to generate any error to see the error log captured our error Handling.

 

try to implement the same and let me know if you need any further explanation on this.

 

Thanks

Venky


Viewing all articles
Browse latest Browse all 236

Trending Articles