Wednesday, November 16, 2011

WORKING WITH LINKS



  • Use links to connect each workflow task.
  • We can specify conditions with links to create branches in the workflow.
  • The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.
Valid workflow and an example of a loop: (figures not shown)
Specifying Link Conditions:
  • Once we create links between tasks, we can specify conditions for each link to determine the order of execution in the workflow.
  • If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default.
  • Use predefined or user-defined workflow variables in the link condition.
Steps:
  1. In the Workflow Designer workspace, double-click the link for which you want to specify a condition.
  2. The Expression Editor appears.
  3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.
  4. Validate the expression using the Validate button.
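For example, a link condition can test the predefined Status variable of the previous session task so that the next task runs only when that session succeeds (the session name s_m_load_emp here is just an illustration):
$s_m_load_emp.Status = SUCCEEDED
A user-defined workflow variable can be used the same way, for example $$Record_Count > 0.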
Using the Expression Editor:
The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following:
  • Link conditions
  • Decision task
  • Assignment task

PARTITIONING

  • A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
  • When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
 PARTITIONING ATTRIBUTES
1. Partition points
  • By default, the Integration Service sets partition points at various transformations in the pipeline.
  • Partition points mark thread boundaries and divide the pipeline into stages.
  • A stage is a section of a pipeline between any two partition points.

2. Number of Partitions
  • We can define up to 64 partitions at any partition point in a pipeline.
  • When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
  • Increasing the number of partitions or partition points increases the number of threads.
  • The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection is used.
3. Partition types
  • The Integration Service creates a default partition type at each partition point.
  • If we have the Partitioning option (purchased separately), we can change the partition type.
  • The partition type controls how the Integration Service distributes data among partitions at partition points.
PARTITIONING TYPES
1. Round Robin Partition Type
  • In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
  • Each partition processes approximately the same number of rows.
  • Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
2. Pass-Through Partition Type
  • In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
  • All rows in a single partition stay in that partition after crossing a pass-through partition point.
  • Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
3. Database Partitioning Partition Type
  • Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
  • Use any number of pipeline partitions and any number of database partitions.
  • We can improve performance when the number of pipeline partitions equals the number of database partitions.
Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions equally among the session partitions.
For example, when a session has three partitions and the database has five partitions, the first and second session partitions each receive data from two database partitions (four database partitions in total), and the third session partition receives data from the remaining database partition.
Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.
4. Hash Auto-Keys Partition Type
  • The Integration Service uses all grouped or sorted ports as a compound partition key.
  • Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
5. Hash User-Keys Partition Type
  • The Integration Service uses a hash function to group rows of data among partitions.
  • We define the number of ports to generate the partition key.
  • We choose the ports that define the partition key.
6. Key Range Partition Type
  • We specify one or more ports to form a compound partition key.
  • The Integration Service passes data to each partition depending on the ranges we specify for each port.
  • Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.
  • Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.

MAPPLETS

  • A mapplet is a reusable object that we create in the Mapplet Designer.
  • It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.
  • Created in Mapplet Designer in Designer Tool.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building these 5 transformations in each of the 10 mappings, we create a mapplet containing them and reuse that mapplet in all 10 mappings. Example: to create a surrogate key in a target, we create a mapplet that uses a stored procedure to generate the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
  • Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.
  • Accept data from sources in a mapping
  • Include multiple transformations: As many transformations as we need.
  • Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.
  • Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet.
  • We use the Mapplet Input transformation to give input to the mapplet.
  • Use of the Mapplet Input transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
  • We must use the Mapplet Output transformation to pass mapplet output to the mapping.
  • A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.
Example 1: We will join the EMP and DEPT tables, then calculate the total salary and give the output to the Mapplet Out transformation.
· EMP and DEPT will be the source tables.
· Output will be given to the Mapplet_Out transformation.
Steps:
  1. Open folder where we want to create the mapping.
  2. Click Tools -> Mapplet Designer.
  3. Click Mapplets-> Create-> Give name. Ex: mplt_example1
  4. Drag EMP and DEPT table.
  5. Use Joiner transformation as described earlier to join them.
  6. Transformation -> Create -> Select Expression from list -> Create -> Done
  7. Pass all ports from the Joiner to the Expression and then calculate the total salary as described in the Expression transformation (a sketch of the expression follows these steps).
  8. Now Transformation -> Create -> Select Mapplet Out from list -> Create -> Give name and then Done.
  9. Pass all ports from expression to Mapplet output.
  10. Mapplet -> Validate
  11. Repository -> Save
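As a sketch of the total-salary calculation in step 7, the expression for a new output port in the Expression transformation might look like this; the port name TOTAL_SAL and the null handling for COMM are just one reasonable choice:
TOTAL_SAL = SAL + IIF(ISNULL(COMM), 0, COMM)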
Use of mapplet in mapping:
  • We can use a mapplet in a mapping by simply dragging the mapplet from the mapplet folder in the left pane, just as we drag source and target tables.
  • When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.
  • Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a Filter
transformation to pass only records whose total salary is >= 1500.
· mplt_example1 will be the source.
· Create a target table with the same structure as the Mapplet_Out transformation.
Creating Mapping
  1. Open folder where we want to create the mapping.
  2. Click Tools -> Mapping Designer.
  3. Click Mapping-> Create-> Give name. Ex: m_mplt_example1
  4. Drag mplt_Example1 and target table.
  5. Transformation -> Create -> Select Filter from list -> Create -> Done.
  6. Drag all ports from mplt_example1 to the Filter and give the filter condition (see the condition sketch after these steps).
  7. Connect all ports from filter to target. We can add more transformations after filter if needed.
  8. Validate mapping and Save it.
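The filter condition in step 6 simply compares the total-salary port coming from the mapplet with 1500; the port name here follows the mapplet output and is illustrative:
TOTAL_SAL >= 1500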

  • Make session and workflow.
  • Give connection information for mapplet source tables.
  • Give connection information for target table.
  • Run workflow and see result.

MAPPING PARAMETERS & VARIABLES

Mapping parameters and variables represent values in mappings and mapplets.
When we use a mapping parameter or variable in a mapping, first we declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session.
MAPPING PARAMETERS
  • A mapping parameter represents a constant value that we can define before running a session.
  • A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during the ETL process, we will create a mapping parameter of date/time data type and use it in the query to compare it with the timestamp field in the SQL override (see the sketch after this list).
  • After we create a parameter, it appears in the Expression Editor.
  • We can then use the parameter in any expression in the mapplet or mapping.
  • We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.
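As a sketch, a SQL override or source filter in the Source Qualifier could use a date/time mapping parameter to pick up one month of data; the table, column, and parameter names below are illustrative, and TO_DATE assumes an Oracle source:
SELECT * FROM TRANSACTIONS
WHERE TRANS_DATE >= TO_DATE('$$Month_Start', 'DD-MM-YYYY')
AND TRANS_DATE < TO_DATE('$$Month_End', 'DD-MM-YYYY')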
MAPPING VARIABLES
  • Unlike mapping parameters, mapping variables are values that can change between sessions.
  • The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.
  • We can override a saved value with the parameter file.
  • We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For example, we have a source table containing time stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
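A minimal sketch of the variable function that moves $$IncludeDateTime forward by one day: place it in an output port of an Expression transformation in the mapping (the port name is up to us):
SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))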
Mapping variables can be used in the following transformations:
  • Expression
  • Filter
  • Router
  • Update Strategy
Initial and Default Value:
When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable.
Data Type -> Default Value
Numeric -> 0
String -> Empty string
Datetime -> 1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:
  1. Value in parameter file
  2. Value saved in the repository
  3. Initial value
  4. Default value
Current Value:
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
  •  Count: Only integer and small integer data types are valid.
  •  Max: All transformation data types except binary are valid.
  •  Min: All transformation data types except binary are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository.
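As a quick sketch, these functions are typically called from output ports of an Expression transformation; the variable names below are illustrative:
SETMAXVARIABLE($$Max_Sal, SAL)
SETMINVARIABLE($$Min_Sal, SAL)
SETCOUNTVARIABLE($$Row_Count)
SETVARIABLE($$Load_Date, SESSSTARTTIME)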
Creating Mapping Parameters and Variables
  1. Open the folder where we want to create parameter or variable.
  2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.
  3. Click the add button.
  4. Enter name. Do not remove $$ from name.
  5. Select Type and Data type. Select Aggregation type for mapping variables.
  6. Give Initial Value. Click ok.
Example: Use of Mapping Parameters and Variables
  • EMP will be source table.
  • Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
  • TOTAL_SAL = SAL + COMM + $$BONUS ($$Bonus is a mapping parameter that changes every month)
  • SET_VAR: We will add one month to the HIREDATE of every employee.
  • Create shortcuts as necessary.
Creating Mapping
  1. Open folder where we want to create the mapping.
  2. Click Tools -> Mapping Designer.
  3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
  4. Drag EMP and target table.
  5. Transformation -> Create -> Select Expression from list -> Create -> Done.
  6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.
  7. Create Parameter $$Bonus and Give initial value as 200.
  8. Create variable $$var_max of MAX aggregation type and initial value 1500.
  9. Create variable $$var_min of MIN aggregation type and initial value 1500.
  10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when datatype is INT or SMALLINT.
  11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the Expression Editor for out_TOTAL_SAL. Calculate SAL + COMM as we did earlier. To add $$Bonus to it, select the Variables tab and pick the parameter under Mapping Parameters: SAL + COMM + $$Bonus
14. Open the Expression Editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from the left pane. Select $$var_max from the Variables tab and SAL from the Ports tab: SETMAXVARIABLE($$var_max, SAL)
16. Validate the expression.
17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK.

21. Link all ports from expression to target and Validate Mapping and Save it.

PARAMETER FILE
  • A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
  • Parameter files provide flexibility to change these variables each time we run a workflow or session.
  • We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad.
  • Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
  • Workflow variable: References values and records information in a workflow.
  • Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
  •  Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
  • Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables.
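For reference, the section headings typically take forms such as the following; folder, workflow, worklet, and session names are placeholders:
[Global]
[folder_name.WF:workflow_name]
[folder_name.WF:workflow_name.WT:worklet_name]
[folder_name.WF:workflow_name.ST:session_name]
[folder_name.session_name]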
  • Make session and workflow.
  • Give connection information for source and target table.
  • Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.

