Wednesday, November 16, 2011

WORKING WITH TASKS - Part 1

The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer.
Types of tasks:

Task Type     Tool where task can be created     Reusable or not
-----------   --------------------------------   ---------------
Session       Task Developer                     Yes
Email         Workflow Designer                  Yes
Command       Worklet Designer                   Yes
Event-Raise   Workflow Designer                  No
Event-Wait    Worklet Designer                   No
Timer                                            No
Decision                                         No
Assignment                                       No
Control                                          No

SESSION TASK
  • A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets.
  • To run a session, we must first create a workflow to contain the Session task.
  • We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently, depending on our needs.
  • The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.
EMAIL TASK
  • The Workflow Manager provides an Email task that allows us to send email during a workflow.
  • Usually it is created by the Administrator; we just drag it into our workflows and use it.
Steps:
  1. In the Task Developer or Workflow Designer, choose Tasks-Create.
  2. Select an Email task and enter a name for the task. Click Create.
  3. Click Done.
  4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
  5. Click the Properties tab.
  6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
  7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
  8. Click the Open button in the Email Text field to open the Email Editor.
  9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
  1. Create a workflow wf_sample_email
  2. Drag any session task to workspace.
  3. Edit Session task and go to Components tab.
  4. Locate the On Success E-Mail option there and configure it.
  5. In Type, select Reusable or Non-reusable.
  6. In Value, select the email task to be used.
  7. Click Apply -> Ok.
  8. Validate workflow and Repository -> Save
  • We can also drag the email task into the workflow and use it as needed.
  • We can set the option to send email on success or failure in components tab of a session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands (UNIX) or DOS commands (Windows) to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. This is done in COMPONENTS TAB of a session. We can run it in Pre-Session Command or Post Session Success Command or Post Session Failure Command. Select the Value and Type option as we did in Email task.
Example: to copy a file sample.txt from the D drive to the E drive.
Command (Windows): COPY D:\sample.txt E:\
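A Command task can also invoke a script instead of a raw shell command. As a rough illustration of what the copy above does, here is a minimal Python sketch (the drive paths are the hypothetical ones from the example; the demo uses temporary directories so it can run anywhere):

```python
import os
import shutil
import tempfile

def copy_file(src: str, dst_dir: str) -> str:
    """Copy src into dst_dir, mimicking `COPY D:\\sample.txt E:\\`."""
    os.makedirs(dst_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file
    return shutil.copy(src, dst_dir)

# Demo with temporary paths standing in for the D: and E: drives.
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "sample.txt")
with open(src, "w") as f:
    f.write("hello")

copied = copy_file(src, dst_dir)
print(os.path.basename(copied))  # sample.txt
```

This is only a sketch of the command's effect, not how the Power Center Server itself copies files.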
Steps for creating command task:
  1. In the Task Developer or Workflow Designer, choose Tasks-Create.
  2. Select Command Task for the task type.
  3. Enter a name for the Command task. Click Create. Then click done.
  4. Double-click the Command task. Go to commands tab.
  5. In the Commands tab, click the Add button to add a command.
  6. In the Name field, enter a name for the new command.
  7. In the Command field, click the Edit button to open the Command Editor.
  8. Enter only one command in the Command Editor.
  9. Click OK to close the Command Editor.
  10. Repeat steps 5-9 to add more commands in the task.
  11. Click OK.
Steps to create the workflow using command task:
  1. Create a task using the above steps to copy a file in Task Developer.
  2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.
  3. The Start task is displayed. Drag a session, say s_m_Filter_example, and the command task to the workspace.
  4. Link Start to Session task and Session to Command Task.
  5. Double-click the link between the Session and Command tasks and give the condition in the editor as $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
  6. Workflow -> Validate
  7. Repository -> Save

WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.
Types of Events:
  • Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.
  • User-defined event: A user-defined event is a sequence of tasks in the Workflow. We create events and then raise them as per need.
Steps for creating User Defined Event:
  1. Open any workflow where we want to create an event.
  2. Click Workflow-> Edit -> Events tab.
  3. Click the Add button to add events and give them names as needed.
  4. Click Apply -> Ok. Validate the workflow and Save it.
Types of Events Tasks:
  • EVENT RAISE: Event-Raise task represents a user-defined event. We use this task to raise a user defined event.
  • EVENT WAIT: The Event-Wait task waits for a file-watch event or a user-defined event to occur before executing the next session in the workflow.
Example 1: Use an Event-Wait task to make sure that session s_filter_example runs only when the file abc.txt is present in the D:\FILES folder.
Steps for creating workflow:
  1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
  2. Task -> Create -> Select Event Wait. Give name. Click create and done.
  3. Link Start to Event Wait task.
  4. Drag s_filter_example to workspace and link it to event wait task.
  5. Right click on event wait task and click EDIT -> EVENTS tab.
  6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch. Example: D:\FILES\abc.txt
  7. Workflow -> Validate and Repository -> Save.
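Conceptually, the pre-defined file-watch event behaves like a polling loop that blocks until the watched file appears. A rough Python sketch of that behavior (the poll interval and timeout here are illustrative assumptions, not Integration Service settings):

```python
import os
import tempfile
import time

def wait_for_file(path: str, timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Poll until the watched file appears or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True   # event occurs: the downstream session may now run
        time.sleep(poll)
    return False

# Demo: create the watched file up front so the wait returns immediately.
watched = os.path.join(tempfile.mkdtemp(), "abc.txt")
open(watched, "w").close()
print(wait_for_file(watched))  # True
```

The real file-watch event is configured entirely in the GUI as shown in the steps above; this sketch only illustrates the wait-then-proceed semantics.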

Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this event in an Event-Wait task and then run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating workflow:
  1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
  2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
  3. Drag s_m_filter_example and link it to START task.
  4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give the name ER_Example. Click Create and then Done.
  5. Link ER_Example to s_m_filter_example.
  6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and Select EVENT1 from the list displayed. Apply -> OK.
  7. Click link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
  8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then done.
  9. Link EW_WAIT to START task.
  10. Right click EW_WAIT -> EDIT-> EVENTS tab.
  11. Select User Defined there. Select the Event1 by clicking Browse Events button.
  12. Apply -> OK.
  13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
  14. Workflow -> Validate
  15. Repository -> Save.
  16. Run the workflow and observe the result.


SCHEDULERS

We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.
  • A scheduler is a repository object that contains a set of schedule settings.
  • Scheduler can be non-reusable or reusable.
  • The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.
  • If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.
  • If we delete a folder, the Integration Service removes workflows from the schedule.
  • The Integration Service does not run the workflow if:
      - The prior workflow run fails.
      - We remove the workflow from the schedule.
      - The Integration Service is running in safe mode.
Creating a Reusable Scheduler
  • For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.
  • Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
  • When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
  1. Open the folder where we want to create the scheduler.
  2. In the Workflow Designer, click Workflows > Schedulers.
  3. Click Add to add a new scheduler.
  4. In the General tab, enter a name for the scheduler.
  5. Configure the scheduler settings in the Scheduler tab.
  6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.
There are 3 run options:
  1. Run on Demand
  2. Run Continuously
  3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server Initialization:
Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to settings in Schedule Options.
Schedule options for Run on Server initialization:
  •  Run Once: To run the workflow just once.
  •  Run every: Run the workflow at regular intervals, as configured.
  •  Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
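The "Run every" option amounts to computing each next start time from a fixed interval. A small sketch of that arithmetic (the 30-minute interval and 2011 start date are arbitrary examples, not defaults):

```python
from datetime import datetime, timedelta

def next_runs(start: datetime, interval: timedelta, count: int):
    """Return the first `count` scheduled start times from `start`."""
    return [start + i * interval for i in range(count)]

start = datetime(2011, 11, 16, 9, 0)            # Start Date / Start Time
runs = next_runs(start, timedelta(minutes=30), 3)
for r in runs:
    print(r.strftime("%H:%M"))                  # 09:00, 09:30, 10:00
```

Customized Repeat works on explicit dates and times rather than a single interval, so the Repeat dialog box replaces this simple arithmetic.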
Start options for Run on Server initialization:
  • Start Date
  • Start Time
End options for Run on Server initialization:
  • End On: IS stops scheduling the workflow on the selected date.
  • End After: IS stops scheduling the workflow after the set number of workflow runs.
  • Forever: IS schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
  1. In the Workflow Designer, open the workflow.
  2. Click Workflows > Edit.
  3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.
     Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.
  4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
  5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
  6. Click OK.
Points to Ponder:
  • To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unscheduled Workflow.
  • To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

WORKING WITH LINKS



  • Use links to connect each workflow task.
  • We can specify conditions with links to create branches in the workflow.
  • The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.
Valid Workflow:

Example of loop:

Specifying Link Conditions:
  • Once we create links between tasks, we can specify conditions for each link to determine the order of execution in the workflow.
  • If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default.
  • Use predefined or user-defined workflow variables in the link condition.
Steps:
  1. In the Workflow Designer workspace, double-click the link we want to specify. The Expression Editor appears.
  2. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.
  3. Validate the expression using the Validate button.
Using the Expression Editor:
The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following:
  • Link conditions
  • Decision task
  • Assignment task

PARTITIONING

  • A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
  • When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
 PARTITIONING ATTRIBUTES
1. Partition points
  • By default, IS sets partition points at various transformations in the pipeline.
  • Partition points mark thread boundaries and divide the pipeline into stages.
  • A stage is a section of a pipeline between any two partition points.

2. Number of Partitions
  • We can define up to 64 partitions at any partition point in a pipeline.
  • When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
  • Increasing the number of partitions or partition points increases the number of threads.
  • The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.
3. Partition types
  • The Integration Service creates a default partition type at each partition point.
  • If we have the Partitioning option, we can change the partition type. This option is purchased separately.
  • The partition type controls how the Integration Service distributes data among partitions at partition points.
PARTITIONING TYPES
1. Round Robin Partition Type
  • In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
  • Each partition processes approximately the same number of rows.
  • Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
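Conceptually, round-robin distribution just deals rows out to the partitions in turn. A minimal sketch of that behavior (this mimics the even distribution, not the service's internal implementation):

```python
def round_robin(rows, n_partitions):
    """Deal rows out to partitions in turn so each gets roughly the same count."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

# 10 rows dealt to 3 partitions: counts differ by at most one row.
parts = round_robin(list(range(10)), 3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Note that no grouping key is involved: related rows can land in different partitions, which is why round-robin is only suitable when grouping does not matter.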
2. Pass-Through Partition Type
  • In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
  • All rows in a single partition stay in that partition after crossing a pass-through partition point.
  • Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
3. Database Partitioning Partition Type
  • Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
  • Use any number of pipeline partitions and any number of database partitions.
  • We can improve performance when the number of pipeline partitions equals the number of database partitions.
Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions equally.
For example, when a session has three partitions and the database has five partitions, the first and second session partitions each receive data from two database partitions (four database partitions in all), and the third session partition receives data from the remaining database partition.
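The 5-into-3 arithmetic of this example can be checked with a quick sketch that assigns database partitions to session partitions in turn (a plausible model of the equal distribution described; the service's actual assignment may differ):

```python
def assign_db_partitions(n_db, n_session):
    """Count how many database partitions each session partition receives,
    assigning them round-robin."""
    counts = [0] * n_session
    for db_part in range(n_db):
        counts[db_part % n_session] += 1
    return counts

# 5 database partitions spread over 3 session partitions: 2 + 2 + 1.
print(assign_db_partitions(5, 3))  # [2, 2, 1]
```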
Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.
4. Hash Auto-Keys Partition Type
  • The Integration Service uses all grouped or sorted ports as a compound partition key.
  • Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
5. Hash User-Keys Partition Type
  • The Integration Service uses a hash function to group rows of data among partitions.
  • We define the number of ports to generate the partition key.
  • We choose the ports that define the partition key.
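Hash partitioning boils down to hashing the values of the chosen key ports and taking the result modulo the partition count, so identical key values always land in the same partition. A sketch of that idea (md5 is used here only for a deterministic demo; it is not the service's actual hash function, and the port names are made up):

```python
import hashlib

def hash_partition(row: dict, key_ports: list, n_partitions: int) -> int:
    """Map a row to a partition index by hashing its key-port values."""
    key = "|".join(str(row[p]) for p in key_ports)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_partitions

# Two rows with the same dept_id hash to the same partition,
# which is what keeps grouped data together.
p1 = hash_partition({"dept_id": 10, "emp_id": 7}, ["dept_id"], 4)
p2 = hash_partition({"dept_id": 10, "emp_id": 99}, ["dept_id"], 4)
print(p1 == p2)  # True
```

This co-location of equal keys is exactly why hash partitioning is recommended before Rank, Sorter, Joiner, and unsorted Aggregator transformations.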
6. Key range Partition Type
  • We specify one or more ports to form a compound partition key.
  • The Integration Service passes data to each partition depending on the ranges we specify for each port.
  • Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
  • Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.
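Key range assignment can be sketched as a lookup of the row's key against the configured ranges (the 1-100 / 101-200 ranges mirror the customer example above; this is an illustration, not the service's implementation):

```python
def key_range_partition(key, ranges):
    """Return the index of the first (low, high) range that contains key."""
    for i, (low, high) in enumerate(ranges):
        if low <= key <= high:
            return i
    raise ValueError(f"key {key} falls outside all configured ranges")

ranges = [(1, 100), (101, 200), (201, 300)]
print(key_range_partition(42, ranges))   # 0
print(key_range_partition(150, ranges))  # 1
```

Because the ranges must cover the data, a key outside every range is an error here; in practice we would size the ranges to span all expected key values.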
