Informatica Interview questions and answers
Informatica Interview Questions and Answers
Informatica Interview Questions and Answers for beginners and experts. List of frequently asked Informatica Interview Questions with answers by Besant Technologies.
We hope these Informatica interview questions and answers are useful and will help you to get the best job in the networking industry. This Informatica interview questions and answers are prepared by Informatica Professionals based on MNC Companies expectation. Stay tuned we will update New Informatica Interview questions with Answers Frequently. If you want to learn Practical Informatica Training then please go through this Informatica Training in Chennai
Besant Technologies supports the students by providing Informatica interview questions and answers for the job placements and job purposes. Informatica is the leading important course in the present situation because more job openings and the high salary pay for this Informatica and more related jobs. We provide the Informatica online training also for all students around the world through the Gangboard medium. These are top Informatica interview questions and answers, prepared by our institute experienced trainers.
Best Informatica Interview Questions and Answers
Here is the list of most frequently asked Informatica Interview Questions and Answers in technical interviews. These Informatica questions and answers are suitable for both freshers and experienced professionals at any level. The Informatica questions are for intermediate to somewhat advanced Informatica professionals, but even if you are just a beginner or fresher you should be able to understand the Informatica answers and explanations here we give.
In this post, you will get the most important and top Informatica Interview Questions and Answers, which will be very helpful and useful to those who are preparing for jobs.
ETL is a process of Extract, Transform and Load the data into DataWarehousing in the required format for decision-making.
- IBM Datastage
- Informatica PowerCenter
- Talend Studio etc.
Informatica is an ETL tool provided by ‘Informatica Corporation’ which is used to Extract, Transform and Load all type of databases.
Informatica PowerCenter had 2 types of Tools:
Server Tools: Administrator Console, Integration Service
Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor
It is used to organize and secure metadata by creating folders.
The designer is used to create source definitions, target definitions, and transformations to build the mappings. Also, it is used to create mapplets which can be reused in different mappings.
It is used to store workflow metadata and connection object information in the repository. A wf contains sessions and different tasks according to the requirement. Tasks can include session, email notification, assignment task, decision task etc. Each task in a wf needs to be connected via links and also we can provide link task based on the requirements.
It is used to retrieve wf run status and session logs. A wf is a set of instructions that tells an integration service how to run the tasks. Basically, WF monitor is used to monitoring workflows and tasks. Q8). Define Mapping?
It is a set of Source and Target definitions linked by transformation objects that define the rules for transformation.
It is a set of instructions that describe how and when to move the data from source to target.
Its nothing but a repository object that generates, modifies or passes data.
Active and Passive Transformation?
An active transformation can change the number and position of rows that pass through it. For Eg. Aggregator, Rank, Filter etc.
A passive transformation doesn’t change the number of rows that pass through it. For Eg. Expression, Sequence Generator etc.
Connected and Unconnected Transformation?
A connected transformation is connected to source definition or target definition or other transformation in a mapping
An unconnected transformation is not connected to any of the Informatica objects in a mapping.
Source Qualifier Transformation
An SQ is an active and connected transformation that reads the rows from a relational database or flat file source while running a session.
- SQ can be configured to joindata originating from same source database.
- We can specify the ofsorted ports and the integration service adds an ‘Order By’ clause to the default SQ query.
- We can use source filter to specify a certain condition.
- Select Distinct option can be used to only select unique records.
- Preand Post SQL statements can be executed before and after the SQ query.
- Also, we can write User Defined SQL query to override the default query in the SQ.
Normally, we use a Source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session. In the above case, the correct syntax will be EMP_ID > 10.
The expression is a Passive and connected transformation which is used to calculate values in a single row. Also, it can be used to test conditional statements before writing the output results to target tables or other transformations.
For eg. we might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers or vice versa etc.
By using Date Conversion Function
IIF( IS_DATE( Column1 ) = 1, TO_DATE( Column1 , ‘YYYY-MM-DD’ ), NULL )
The sorter is an Active and Connected transformation which is used to sort data in ascending or descending order according to specified sort keys.
In sorter T/R we can select the “distinct” option in the sorted property. When Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service discards duplicate rows compared during the sort operation. Since the number of Input Rows will vary as compared to the Output rows, it is treated as an Active transformation.
Since in a Flat File the distinct option in the source qualifier will be disabled, hence we can use a Sorter Transformation and check the Distinct option to get the unique rows. When we select the distinct option all the columns will the selected as keys, in ascending order by default.
An aggregator is an Active and Connected transformation which is used to perform aggregate calculations like AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, and VARIANCE.
- Conditional Clauses: The conditional clause can be any clause that evaluates TRUE or FALSE.
For eg. SUM (SALARY, DEPT_ID = 10)
- Non-Aggregate expressions.
- Nested Aggregation Expression: One aggregate function within another aggregate function.
- MAX (COUNT (EMP))
- Use Sorted input which reduces the amount of data cached and improves session performance.
- Filter the unnecessary data before aggregating it.
- connecting only the necessary input/output ports to subsequent transformations in order to reduce the size of a data cache.
- If the input data is not sorted correctly, the session will fail.
- In some cases, if the input data is properly sorted, the session may fail also if the sort order by ports and the group by ports of the aggregator are not in the same order.
By using the Group By checkbox on the ports having duplicate occurring data.
A Filter transformation is an Active and Connected transformation which is used to test the data based on a certain condition that passes through it. If the condition is satisfied then the data will be passed to the next transformation or target else it will be dropped.
|SQ Source Filter||Filter Transformation|
|Source Qualifier transformation filters rows while reading from the source.||Filter transformation filters rows from within a mapping|
|Limits the records extracted from a source||Limits the records sent to a target|
|It can only filter rows from relational sources||Filters rows coming from any type of source system|
|The filter condition in the Source Qualifier transformation only uses standard SQL||Filter Transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE|
A router is an active and connected transformation which is used to test the source data based on multiple conditions.
- Input Group
- Output Group/User Defined Group
- Default Group
|Router transformation divides the incoming records into multiple groups based on some condition.||Filter transformation restricts or blocks the incoming recordset based on a single condition|
|Records which does not match any condition passes to the default group||Records which doesn’t match the condition is dropped|
|Filter acts like WHERE condition in SQL||Router acts like CASE WHEN statement in SQL|
A Joiner is an Active and Connected transformation used to join two data sources coming from same or heterogeneous databases or files.
- Normal: Discards all rows of data from the master and detail source that do not match the join condition.
- Master Outer: keeps all rows of data from the detail source and the matching rows from the master source.
- Detail Outer: keeps all rows of data from the master source and the matching rows from the detail source.
- Full Outer: keeps all rows of data from both the master and detail sources.
During a session run, the Integration Service compares each row of the master source against the detail source.
When the Integration Service processes a Joiner transformation, it caches the rows from the master source and blocks the detail source. Once it finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. That’s why if we will select the source containing fewer input rows in master, the cache size will be smaller, hence performance will be improved.
Rank is an Active and Connected transformation used to select a set of top or bottom values of data.
Rank port is an input/output port used to specify the column for which we want to rank the source values. RANK INDEX is the default port created by informatica for each Rank transformation which stores the ranking position for each row in a group.
We can configure one of its input/output ports as a group by a port. For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).
Sequence Generator Transformation
A Sequence Generator is a Passive and Connected transformation that generates numeric values. It is used to create unique primary key values or surrogate key which uniquely identifies each record.
- Start Value: Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. The default is 0.
- Increment By: Difference between two consecutive values from the NEXTVAL port. The default is 1.
- End Value: Maximum value generated by Sequence Generator. After reaching this value the session will fail if the sequence generator is not configured to cycle.
- Current Value: Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. The default is 1
- Cycle:If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
- A number of Cached Values:Number of sequential values the Integration Service caches at a time. The default value for a standard Sequence Generator is 0. The default value for a reusable Sequence Generator is 1,000.
- Reset: Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.
- When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, a block of sequence numbers is sent to one target tables surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table receives the block of sequence numbers.
- Suppose we have 5 rows coming from the source, so TGT1 will have the sequence values as (1,2,3,4,5) and TGT2 (6,7,8,9,10).
- Now, suppose we need to have the same surrogate keys in both the targets.
- Then we need to put an Expression transformation in between the Sequence Generator and the Target tables. The Sequence Generator will pass unique values to the expression transformation, and then the rows are routed from the expression transformation to the targets.
End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the session fails with the following error message:
- TT_11009 Sequence Generator Transformation: Overflow error.
Failing of session can be handled if the Sequence Generator is configured to Cycle.
Stored Procedure Transformation
Stored Procedure is a Passive transformation used to execute stored procedures prebuilt on the database through Informatica. It can also be used to call functions to return calculated values. It can be Connectedor Unconnected.
- Check the status of a target database before loading data into it.
- Determine if enough space exists in a database.
- Perform a specialized calculation.
- Drop and recreate indexes.
PROC_RESULT is a system variable, where the output of an unconnected stored procedure transformation is assigned by default.
The status code provides error handling for the Informatica. The stored procedure issues a status code that notifies whether or not stored procedure completed successfully. This value can not seen by the user. Its only used by informatica to determine whether to continue running the session or stop.
A lookup is a connected and unconnected transformation which is used to perform below tasks:
- Get related value
- Perform a calculation
- Update slowly changing dimension tables.
|Connected Lookup||Unconnected Lookup|
|Connected lookup participates in data flow and receives input directly from the pipeline||Unconnected lookup receives input values from the result of a LKP: expression in another transformation|
|Connected lookup can use both dynamic and static cache||Unconnected Lookup cache can NOT be dynamic|
|Connected lookup can return more than one column value ( output port )||Unconnected Lookup can return only one column value i.e. output port|
|Connected lookup caches all lookup columns||Unconnected lookup caches only the lookup output ports in the lookup conditions and the return port|
|Supports user-defined default values (i.e. value to return when lookup conditions are not satisfied)||Does not support user-defined default values|
- Persistent cache: We can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.
- Recache from the database: If the persistent cache is not synchronized with the lookup table, we can configure the Lookup transformation to rebuild the lookup cache.
- Static cache: We can configure a static or read-only cache for the only lookup table. By default, Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
- Dynamic cache: If we want to cache the target table and insert new rows into the cache and the target, we can create a lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data to the target table.
- Shared cache: We can share the lookup cache between multiple transactions. We can share unnamed cache between transformations in the same mapping.
That is through Using Dynamic Lookup Cache of the target table and associating the input ports with the lookup port and checking the Insert Else Update option will help to eliminate the duplicate records in source and hence loading unique records in the target.
The normalizer transformation normalizes the records from COBOL and relational sources, allowing us to organize the data according to our own needs. Apart from that, it is used to convert multiple occurring columns in a row into multiple rows.
A union is an Active and Connected transformation used to merge data from multiple pipelines or sources into one pipeline branch. Similar to the UNION ALL SQL statement, the Union transformation does not remove duplicate rows.
- All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
- We can create multiple input groups, but only one default output group.
- The Union transformation does not remove duplicate rows.
- We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
The union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
Update Strategy transform
Update strategy T/R is used to maintain the history data or just most recent changes into target table.
A target table can also be updated without using “Update Strategy”. For this, we need to define the key in the target table in Informatica level and then we need to connect the key and the field we want to update in the mapping Target. In the session level, we should set the target property as “Update as Update” and enable the “Update” check-box.
Let’s assume we have a target table “Customer” with fields as “Customer ID”, “Customer Name” and “Customer Address”. Suppose we want to update “Customer Address” without an Update Strategy. Then we have to define “Customer ID” as primary key in Informatica level and we will have to connect Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, then the mapping will only update the customer address field for all matching customer IDs.
The integration service follows the instructions coded in mapping to flag the rows for insert, update, delete or reject.
Treat input rows as Data Driven: This is the default session property option selected while using an Update Strategy transformation in a mapping.
If in Session anything other than DATA DRIVEN is mentioned then Update strategy in the mapping is ignored.
Suppose we want to perform an insert or update target table in a single pipeline. Then we can write the below expression in update strategy transformation to insert or update based on the incoming row.
IIF (LKP_EMPLOYEE_ID IS NULL, DD_INSERT, DD_UPDATE)
Slowly Changing Dimension
The dimension that changes over time is known as slowly changing dimensions.
Type 1: Slowly Changing Dimension
- In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
- In our example, recall we originally have the following table:
- After Charu moved from Hyderabad to Bengaluru, the new information replaces the new record, and we have the following table:
- This is the easiest way to handle the Slowly Changing Dimension problem since there is no need to keep track of the old information.
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Charu lived in Hyderabad before.
Type 2:Slowly Changing Dimension
- In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
- In our example, recall we originally have the following table:
- After Charu moved from Hyderabad to Bengaluru, we add the new information as a new row into the table:
- This allows us to accurately keep all historical information.
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Type 3: Slowly Changing Dimension
- In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
- In our example, recall we originally have the following table:
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
- Customer Key
- Original State
- Current State
- Effective Date
After Charu moved from Hyderabad to Bengaluru, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2018):
|Customer Key||Name||Original State||Current State||Effective Date|
- This does not increase the size of the table since new information is updated.
- This allows us to keep some part of history.
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Charu later moves to Chennai on April 30, 2018, the Bengaluru information will be lost.
- Allocation of the system shared a memory
- Execution of Pre-session command/Post-session command
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Cache creation information
- Start/End timings for target loading
- Error encountered during the session
- Load summary of Reader/Writer/DTM Statistics
- Load Manager Process
- DTM Process
- Manages session and batch scheduling.
- Locks the session and read session properties.
- Reads the parameter file.
- Expand the server and session variables and parameters.
- Verify permissions and privileges.
- Validate source and target code pages.
- Create the session log file.
- Create the Data Transformation Manager which executes the session.
After the load manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage threads that carry out the session tasks.
- MASTER THREAD – the Main thread of the DTM process. Creates and manages all other threads.
- MAPPING THREAD – One Thread to Each Session. Fetches Session and Mapping Information.
- Pre and Post Session Thread – One Thread Each To Perform Pre and Post Session Operations.
- READER THREAD – One Thread for Each Partition for Each Source Pipeline.
- WRITER THREAD – One Thread for Each Partition If Target Exist in the Source pipeline Write to the Target.
- TRANSFORMATION THREAD – One or More Transformation Thread For Each Partition.
- Create an Event: right-click the workflow and Edit->events tab->create an event
- Place event raise task in the workflow space and give the created event name in the “user defined event” (Properties tab)
- Place event wait task in the workflow space->there are two options in events tab,
Pre-defined: This is a file watch
User-defined: This is used for the event created on the workflow properties. When the Eventraise is executed, it creates the event, it triggers the event wait to continue.
We can specify the period of time to wait before the integration service runs the next task in the workflow with the timer task.
- Absolute Time: Give the exact time when to start the next task or refer a date-time variable.
- Relative Time: give the hours, minute, seconds – From the start time of the task
— From the start time of the parent workflow/worklet
— From the start time of the top-level workflow
Decision task is used instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a decision task to simplify link conditions.
We can assign a value to user-defined workflow variable with the assigned task.
We can use the control task to stop, abort or fail the top-level workflow or the parent workflow based on input link condition.
Suppose we have ten source flat files of the same structure. How can we load all the files in target database in a single batch run using a single mapping?
To load a set of source files we need to create a file say final.txt containing the source flat file names, ten files in our case and set the Source filetype option as Indirect.
- Mapplet consists of a set of transformations that are reusable. A reusable transformation is a single transformation that can be reusable.
- If u create variables or parameters in mapplet that cannot be used in another mapping or mapplet. Unlike the variables that r created in a reusable transformation can be useful in any other mapping or mapplet.
- We can not include source definitions in reusable transformations. But we can add sources to a mapplet.
- Whole transformation logic will be hidden in case of mapplet. But it is transparent in case of reusable transformation.
We can’t use COBOL source qualifier, joiner, normalizer transformations in mapplet. Whereas we can make them as a reusable transformation.
In Informatica, Mapping parameter represents a constant value that we can define in parameter file before running a session. Informatica mapping parameter retains the same value throughout the entire session.
Unlike a mapping parameter, Informatica mapping variable represents a value that can change throughout the session. The integration service saves the value of mapping variable to the repository at the end of session run and uses that value next time we run the session.
We can specify the target load order based on source qualifiers in a mapping. If we have multiple source qualifiers connected to the multiple targets, we can designate the order in which Informatica server loads data into the targets.
if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint-based load ordering.
In the Workflow Manager, we can Configure Constraint-based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
When we issue the STOP command on the executing session task, the Integration Service stops reading data from source. It continues processing, writing and committing the data to targets. If the Integration Service cannot finish processing and committing data, we can issue the abort command.
In contrast, ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
(1) Round Robin Partitioning: The integration service distributes data evenly among all partitions. Its used where we want each partitions to process approximately the same number of rows.
(2) Hash Partitioning: The integration service applies a hash function to a partition key to group data among the partitions. Use hash autokey partitioning at or before Rank, Sorter, Joiner, and Unsorted Aggregator transformations to ensure that rows are grouped Properly before they enter these transformations. In hash user-key, we define the number of ports to generate the partition key.
(3) PassThrough Partition: In passthrough partitioning, the Integration Service processes data without redistributing rows among partitions. Use passthrough partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
(4) Key range Partition: We specify one or more ports to form a compound partition key. The Integration Service passes data to each partition depending on the Ranges we specify for each port. Use key range partitioning where the sources or targets in the pipeline are Partitioned by key range. For Example: Customer 1-100 in one partition, 101-200 in another and so on. We Define the range for each partition.
We can push transformation logic to the source or target database using pushdown optimization. When we run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.