Informatica Tutorial

Saturday, November 19, 2011

Why do we Use Shell Scripting

We use Shell Scripts to obtain the following purposes:

1. Customizing your work environment.

2. Automating Your Daily Task.

3.Automating Repetitive Task.

4.Executing Important Procedures like shutting down the system,formatting a disk,Creating a file system on it,mounting the file system,letting the users use the floppy and finally un mounting the disk.

5.Performing the same operation on many files.

When Not to use Shell Scripting :

1. Is too complex,such as writing an entire billing system.

2. Requires a high degree of efficiency

3. Requires a variety of Software tools.

Usage of Shell Scripts with in Informatica:

1.To run an batch via pmcmd.

2. To have header and footer in case we are goanna write to an flat file

3.To run the command task.

4. To update parameter file with session start time and end time

Friday, November 18, 2011

IDENTIFICATION OF BOTTLENECKS

Performance of Informatica is dependant on the performance of its several components like database, network, transformations, mappings, sessions etc. To tune the performance of Informatica, we have to identify the bottleneck first.

Bottleneck may be present in source, target, transformations, mapping, session,database or network. It is best to identify performance issue in components in the order source, target, transformations, mapping and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

Identify bottleneck in Source:

If source is a relational table, put a filter transformation in the mapping, just after source qualifier; make the condition of filter to FALSE. So all records will be filtered off and none will proceed to other parts of the mapping.In original case, without the test filter, total time taken is as follows:-

Total Time = time taken by (source + transformations + target load)

Now because of filter, Total Time = time taken by source

So if source was fine, then in the latter case, session should take less time. Still if the session takes near equal time as former case, then there is a source bottleneck.

Identify bottleneck in Target:

If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is very much less than the time taken for the session to load to table, then the target table is the bottleneck.

Identify bottleneck in Transformation:

Remove the transformation from the mapping and run it. Note the time taken.Then put the transformation back and run the mapping again. If the time taken now is significantly more than previous time, then the transformation is the bottleneck.

But removal of transformation for testing can be a pain for the developer since that might require further changes for the session to get into the ‘working mode’.

So we can put filter with the FALSE condition just after the transformation and run the session. If the session run takes equal time with and without this test filter,then transformation is the bottleneck.

Necessity Of Performance-Tuning

Performance Tuning Overview

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to the original configuration.

Complete the following tasks to improve session performance:

PUSH DOWN OPTIMISATION

You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.

The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.

The following figure shows a mapping containing transformation logic that can be pushed to the source database:

This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:

PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS

Lookup transformations are used to lookup a set of values in another table.Lookups slows down the performance.

1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (Meaning of cache is given in point 2 of this section and the procedure for determining the optimum cache size is given at the end of this document.)

2. Even after caching, the performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a sql override with a restriction.

Cache: Cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces the time taken by the process to a large extent. Cache is automatically generated by Informatica depending on the marked lookup ports or by a user defined sql query.

Example for caching by a user defined query: -

Suppose we need to lookup records where employee_id=eno.

‘employee_id’ is from the lookup table, EMPLOYEE_TABLE and ‘eno’ is the

input that comes from the from the source table, SUPPORT_TABLE.

We put the following sql query override in Lookup Transform

‘select employee_id from EMPLOYEE_TABLE’

If there are 50,000 employee_id, then size of the lookup cache will be 50,000.

Instead of the above query, we put the following:-

‘select emp employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s

where e. employee_id=s.eno’

If there are 1000 eno, then the size of the lookup cache will be only 1000.But here the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge. Our concern is to make the size of the cache as less as possible.