Tuesday, December 6, 2011

Data Warehouse Architecture


The architecture above is taken from www.databaseanswers.com. We recommend visiting this site to get a good understanding of data modeling.

Now we will define each term in the picture above to facilitate a better understanding of the subject.

1. Operational Data Store (ODS): a database designed to integrate data from multiple sources for additional operations on the data. The data is then passed back to operational systems for further operations and to the data warehouse for reporting.

2. ERP : Enterprise resource planning integrates internal and external management information across an entire organization, embracing finance/accounting, manufacturing, sales and service, etc.

Its purpose is to facilitate the flow of information between all business functions inside the boundaries of the organization and manage the connections to outside stakeholders.

3. CRM: Customer relationship management is a widely implemented strategy for managing a company’s interactions with customers, clients, and sales prospects. It involves using technology to organize, automate, and synchronize business processes, principally sales activities, but also those for marketing, customer service, and technical support.

Customer relationship management describes a company-wide business strategy including customer-interface departments as well as other departments.

4. Flat Files in Data Warehousing: flat files do not maintain referential integrity the way an RDBMS does, and their fields are usually separated by delimiters such as commas or pipes.
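To make the delimiter idea concrete, here is a minimal sketch (the file name, layout, and values are all invented) of a pipe-delimited flat file and how its fields can be split from the Unix shell:

# Create a small pipe-delimited flat file (made-up sample data).
cat > customers.txt <<'EOF'
101|John|Hyderabad
102|Mary|Chennai
103|Ravi|Mumbai
EOF

# Print only the name column; -F sets the field delimiter to a pipe.
awk -F'|' '{ print $2 }' customers.txt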

From Informatica 8.6 onwards, unstructured data sources such as MS Word documents, email, and PDF files can be used as sources.

5. ETL (Extract, Transform, and Load): a process in database usage, and especially in data warehousing, that involves the following steps (a small sketch follows the list):

Extracting data from outside sources

Transforming it to fit operational needs (which can include quality levels)

Loading it into the end target (database or data warehouse)
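As a deliberately tiny illustration of the three steps (every file name, column position, and rule below is invented; in practice this is the job of a tool such as Informatica), a shell sketch might look like this:

#!/bin/sh
# Pretend source extract with columns: date,region,amount
cat > sales_source.csv <<'EOF'
2011-12-01,south,100
2011-12-01,north,
2011-12-02,south,250
EOF

# Extract: read the raw file.
# Transform: drop rows with an empty amount (a crude quality check)
#            and uppercase the region to fit a common standard.
# Load: append the cleaned rows to the pretend warehouse target.
awk -F',' '$3 != "" { print $1 "," toupper($2) "," $3 }' sales_source.csv >> warehouse_sales.csv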

6. Data Marts: A data mart (DM) is the access layer of the data warehouse (DW) environment that is used to get data out to the users. The DM is a subset of the DW, usually oriented to a specific business line or team.

For the definition of a data warehouse, please refer to the Introduction to Data Warehousing below.

7. OLAP : OLAP (Online Analytical Processing) is a methodology to provide end users with access to large amounts of data in an intuitive and rapid manner to assist with deductions based on investigative reasoning.

OLAP systems need to:

1. Support the complex analysis requirements of decision-makers,

2. Analyze the data from a number of different perspectives (business dimensions), and

3. Support complex analyses against large input (atomic-level) data sets.
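Point 2 above essentially means aggregating measures along chosen business dimensions. A real OLAP tool does this interactively over cubes; the sketch below (with an invented sales.txt layout of region|product|amount) only shows the idea of rolling a measure up by one dimension:

# Invented fact rows: region|product|amount
cat > sales.txt <<'EOF'
SOUTH|TV|200
NORTH|TV|150
SOUTH|FRIDGE|300
EOF

# Roll the amount up by the region dimension.
awk -F'|' '{ total[$1] += $3 } END { for (r in total) print r, total[r] }' sales.txt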

8. OLTP : Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.

9. Data Mining: the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Data mining is seen by modern businesses as an increasingly important tool to transform data into business intelligence, giving an informational advantage.
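Real data mining relies on statistical and machine-learning methods, but even a crude frequency count conveys the flavour of surfacing patterns from raw records. In this toy sketch, items.txt is an assumed one-item-per-line purchase log:

# Assumed purchase log, one item per line.
cat > items.txt <<'EOF'
bread
milk
bread
butter
bread
milk
EOF

# Count occurrences and list the most frequent items first.
sort items.txt | uniq -c | sort -rn | head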

Introduction to Data Warehousing

A data warehouse (DW) is a database used for reporting. Data is offloaded from the operational systems into the warehouse, and may pass through an operational data store for additional operations before it is used in the DW for reporting.

A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers (analysis and support). The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users.

1. Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

Definition as per Ralph Kimball: a data warehouse is a copy of transaction data specifically structured for query and analysis.

His approach to data warehouse design is bottom-up. In the bottom-up approach, data marts are created first to provide reporting and analytical capabilities for specific business processes.

2. Bill Inmon's paradigm: the data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from it. In the data warehouse, information is stored in third normal form.

Definition as per Bill Inmon:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
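To picture time-variance and non-volatility together, imagine the address example above kept as a simple history file (the layout and values below are made up): every change adds a new row, and existing rows are never updated.

# Made-up history file: customer_id|address|effective_from
cat > customer_address_history.txt <<'EOF'
C001|12 Park Lane|2009-01-01
C001|48 Lake View|2010-06-15
C001|7 Hill Road|2011-03-01
EOF

# The warehouse keeps every version, so all past addresses are retrievable.
grep '^C001|' customer_address_history.txt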

His approach to data warehouse design is top-down. In the top-down approach, the data warehouse is designed using a normalized enterprise data model, and "atomic" data, that is, data at the lowest level of detail, is stored in the data warehouse.

Tuesday, November 22, 2011

New Tatkal ticket rules come into effect




New Delhi: Indian Railways’ new ticket booking rules under the Tatkal Sewa will come into force from Monday.

Under the new rule, the advance reservation period for Tatkal tickets has been reduced from two days to one day (24 hours), excluding the day of journey from the originating station.

For example, if a train is to depart from the originating station on the 2nd day of the month, then the Tatkal quota for this particular train will open at 08:00 hrs on the 1st day of the month.



A maximum of FOUR passengers can be booked per PNR for Tatkal tickets. No refund will be granted on cancellation of confirmed Tatkal tickets except in exceptional circumstances. Also, duplicate tickets will be issued only in exceptional cases, on payment of the full fare including Tatkal charges.


Tatkal tickets shall be issued only on production of a prescribed photo identity card. For this purpose, a self-attested photocopy of the proof of identity of any one passenger must be attached to the requisition slip.

During the journey, the passenger whose identity card number is indicated on the ticket will have to produce the original proof of identity indicated on the ticket. If that passenger is not travelling, all other passengers booked on the ticket will be treated as travelling without a ticket and charged accordingly.

Agents/RTSAs will not be able to book Tatkal tickets either at the counters or on the internet between 08:00 hrs and 10:00 hrs. IRCTC agents will be permitted to book only one Tatkal ticket per train per day on the internet.

Saturday, November 19, 2011

Why Do We Use Shell Scripting?

We use shell scripts for the following purposes:

1. Customizing your work environment.
2. Automating your daily tasks.
3. Automating repetitive tasks.
4. Executing important procedures such as shutting down the system, formatting a disk, creating a file system on it, mounting the file system, letting users use the floppy, and finally unmounting the disk.
5. Performing the same operation on many files (see the sketch after this list).
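As a sketch of points 3 and 5 (the .dat extension and the copy operation are placeholders for whatever the repeated task really is):

#!/bin/sh
# Create a few sample files so the loop has something to work on.
touch a.dat b.dat c.dat

# Apply the same operation (here, taking a backup copy) to every file.
for f in *.dat
do
    cp "$f" "$f.bak"
    echo "backed up $f"
done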

When Not to Use Shell Scripting:

Avoid shell scripts when the task:

1. Is too complex, such as writing an entire billing system.
2. Requires a high degree of efficiency.
3. Requires a variety of software tools.

Usage of Shell Scripts within Informatica:

1. To run a batch via pmcmd (see the sketch below).
2. To add a header and footer when writing to a flat file (also sketched below).
3. To run command tasks.
4. To update a parameter file with the session start and end times.
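A rough sketch of points 1 and 2 follows. The service, domain, folder, workflow, and file names are all placeholders, and pmcmd options vary by PowerCenter version, so treat this as an outline rather than a ready-made script:

#!/bin/sh
# 1. Start a workflow from the shell via pmcmd (all names are placeholders).
pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u admin -p secret \
    -f MY_FOLDER -wait wf_load_sales

# 2. Stitch a header and footer around the flat file the session produced
#    (sales_out.dat is assumed to be the session's output file).
echo "HDR|$(date +%Y%m%d)" > final_sales.dat
cat sales_out.dat >> final_sales.dat
echo "TRL|$(wc -l < sales_out.dat)" >> final_sales.dat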

Friday, November 18, 2011

IDENTIFICATION OF BOTTLENECKS

The performance of Informatica is dependent on the performance of its several components: database, network, transformations, mappings, sessions, and so on. To tune the performance of Informatica, we have to identify the bottleneck first.

A bottleneck may be present in the source, target, transformations, mapping, session, database, or network. It is best to look for performance issues in the components in the order source, target, transformations, mapping, and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.

Identify a bottleneck in the Source:
If the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE, so that all records are filtered out and none proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is:
Total Time = time taken by (source + transformations + target load)
With the filter in place, Total Time = time taken by the source alone.
So if the source is fine, the filtered session should take less time. If it still takes nearly as long as the original run, there is a source bottleneck.

Identify a bottleneck in the Target:
If the target is a relational table, substitute it with a flat file and run the session. If the time taken now is much less than the time taken to load the table, then the target table is the bottleneck.

Identify a bottleneck in a Transformation:
Remove the transformation from the mapping and run it, noting the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, the transformation is the bottleneck.
But removing a transformation for testing can be a pain for the developer, since it might require further changes to get the session back into a working state.
Alternatively, we can put a filter with a FALSE condition just after the transformation and run the session. If the session takes nearly the same time with and without this test filter, the transformation is the bottleneck.