This method of analysis presents either a graphical or a textual representation of the flow of data through connection definitions to data stores and through the physical transformation rules that transform and move the data. To see data flow lineage, one must:
o Define a configuration that contains all of the models potentially in the data flow
o Stitch the models together by resolving connection definitions, then build the configuration
Once the configuration is built, you are ready to report on lineage.
In the Data Lineage Diagram, all columns/fields of a given table/file are presented at once, which matches classic data modeling conventions. Selecting a given column/field highlights the data flow to it.
End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If one instead goes to the object page for a schema or model, which is neither a classifier nor a feature, the Data Flow tab shows the overview lineage within the scope of that model only.
A data flow lineage trace presents summary lineage, as opposed to the data flow overview lineage, which presents step-by-step transformation lineage.
When you trace the impact/lineage of a table or column, you do not see all the transformations. Instead, you see a summary of the whole job (a picture much closer to that of an architecture diagram). However, you also see complete end-to-end lineage, not confined to a single DI or BI model.
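The difference between step-by-step and summary lineage can be pictured as collapsing the process nodes of a graph. The following is only a minimal sketch of that idea, not how MetaKarta computes a trace; the edge list, the node names and the `summarize` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical step-by-step flow as (source, target) edges. Some nodes are
# processes (transformations) rather than data store objects.
STEP_EDGES = [
    ("AR.Customer", "tMap_1"),
    ("tMap_1", "tFilter_1"),
    ("tFilter_1", "Staging.Customer"),
    ("Staging.Customer", "LoadDim"),
    ("LoadDim", "DimDW.Customer"),
]
STEP_PROCESSES = {"tMap_1", "tFilter_1", "LoadDim"}


def summarize(edges, processes):
    """Collapse process nodes so only object-to-object links remain."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    def next_objects(node, seen):
        # Walk forward through processes until data store objects are reached.
        found = set()
        for nxt in successors[node]:
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in processes:
                found |= next_objects(nxt, seen)
            else:
                found.add(nxt)
        return found

    objects = {node for edge in edges for node in edge} - processes
    return sorted((obj, tgt) for obj in objects for tgt in next_objects(obj, set()))


summary = summarize(STEP_EDGES, STEP_PROCESSES)
```

Here the five step-level edges reduce to two object-level links, which is the kind of condensed picture a summary trace gives.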
Finally, the tool does not display constants on the lineage diagram. In particular, if a process has only a constant as its source in a lineage trace, that process does not appear in the trace.
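The effect of hiding constants can be illustrated with a small filter. This is a sketch under assumed data, not the tool's implementation; the process names, the ("kind", "name") source tuples and `visible_processes` are hypothetical.

```python
# Hypothetical process inputs: each process maps to its list of sources.
# A source of kind "constant" is a literal value, not a data store object.
PROCESS_SOURCES = {
    "LoadCustomer": [("column", "AR.Customer.Name")],
    "SetLoadFlag": [("constant", "'Y'")],
    "BuildKey": [("constant", "'C-'"), ("column", "AR.Customer.Id")],
}


def visible_processes(process_sources):
    """Return the processes that still have at least one non-constant source."""
    visible = {}
    for process, sources in process_sources.items():
        kept = [name for kind, name in sources if kind != "constant"]
        if kept:  # a process fed only by constants drops out of the trace
            visible[process] = kept
    return visible


shown = visible_processes(PROCESS_SOURCES)
```

In this sketch `SetLoadFlag` disappears because its only source is a constant, while `BuildKey` stays but is shown with its column source only.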
Steps
1. Sign in as a user who has at least the Metadata Viewing or Data Management capability object role assignment to the configuration and all its contained models.
Without the Metadata Viewing capability object role assignment to all of the configuration’s contained models, you will see a dialog indicating that you do not have sufficient privileges.
2. Find a starting point for lineage by doing one of the following:
o Navigate to the element’s object page and select the Data Flow tab
o For lists of elements, click the line the element is on and click the appropriate Trace Data Flow icon
o Right-click the element in a diagram (architecture diagram, lineage diagram or model diagram) and select Trace Lineage > Data Flow
3. From here you may
o Use the common lineage trace functions
o Specify the lineage presentation
o Switch to
- Data Lineage - trace from an object upstream to objects that provide data flow to that object
- Data Impact - trace from an object downstream to objects that are impacted via data flow by that object
- Full Data Lineage - Both of the above.
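The three trace directions correspond to walking a data flow graph upstream, downstream, or both ways. The sketch below illustrates this with a hypothetical edge list and a hypothetical `trace` function; it is an assumption-laden illustration, not the product's algorithm.

```python
from collections import deque

# Hypothetical object-level data flow edges (source -> target).
FLOW_EDGES = [
    ("AR.Customer", "Staging.Customer"),
    ("Lake.customers.csv", "Staging.Customer"),
    ("Staging.Customer", "DimDW.Customer"),
    ("DimDW.Customer", "Report.Revenue"),
]


def trace(start, edges, direction):
    """Collect the nodes reachable from start in the given direction."""
    if direction == "full":
        return trace(start, edges, "lineage") | trace(start, edges, "impact")
    if direction == "lineage":
        # Upstream: follow the edges backwards.
        edges = [(dst, src) for src, dst in edges]
    elif direction != "impact":
        raise ValueError(f"unknown direction: {direction}")
    reached, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for src, dst in edges:
            if src == node and dst not in reached:
                reached.add(dst)
                queue.append(dst)
    return reached


upstream = trace("Staging.Customer", FLOW_EDGES, "lineage")
downstream = trace("Staging.Customer", FLOW_EDGES, "impact")
both = trace("Staging.Customer", FLOW_EDGES, "full")
```

Data Lineage corresponds to `upstream`, Data Impact to `downstream`, and Full Data Lineage to their union.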
Select the Tree tab on the left to obtain this presentation.
Next to SHOW, you will see a list of objects or processes:
o Objects: data store object types, e.g., tables, columns, views, fields, files, etc.
o Processes: data movement and possibly transformation processes, e.g., mappings, transformations, computations, select/inserts, etc.
The scope of that list depends on the direction of the trace, which is either impact (forward) or lineage (sources), or on the business intelligence (BI) reports, as well as on the proximity in the trace:
o Adjacent objects/processes are the next items in the lineage trace. For impact, that is often the data store (like a warehouse) that is the target when DI/ETL loads data from the object that is the focus of the lineage. For source lineage, it is often the data source that is loaded directly to produce the object that is the focus of the lineage.
o Ultimate end objects/processes are the final nodes in the lineage, where the trace stops. For impact, this often means report fields; for source lineage, it often means operational system tables and columns.
o All objects/processes in the lineage that are part of business intelligence (BI) reports, which are generally at the far end of the lineage trace.
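The distinction between adjacent and ultimate items can be made concrete on a small graph. The following sketch covers the source (lineage) direction only; impact is the mirror image. The edge list and both functions are hypothetical illustrations, not MetaKarta internals.

```python
# Hypothetical object-level data flow edges (source -> target).
SCOPE_EDGES = [
    ("Ops.Orders", "Staging.Orders"),
    ("Staging.Orders", "DimDW.Sales"),
    ("Ops.Customer", "DimDW.Sales"),
]


def adjacent_sources(start, edges):
    """Nodes exactly one step upstream of start."""
    return {src for src, dst in edges if dst == start}


def ultimate_sources(start, edges):
    """Upstream nodes that have no sources of their own (the trace stops there)."""
    has_source = {dst for _, dst in edges}
    ends, stack, seen = set(), [start], {start}
    while stack:
        node = stack.pop()
        for src in adjacent_sources(node, edges):
            if src in seen:
                continue
            seen.add(src)
            if src in has_source:
                stack.append(src)  # keep walking upstream
            else:
                ends.add(src)  # nothing feeds this node, so it is an end point
    return ends


near = adjacent_sources("DimDW.Sales", SCOPE_EDGES)
far = ultimate_sources("DimDW.Sales", SCOPE_EDGES)
```

Note that `Ops.Customer` appears in both results: when a source is only one step away and has no sources of its own, it is adjacent and ultimate at the same time.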
Steps
2. Click the Tree tab on the left.
3. From here you may
o Pick the options next to SHOW in the upper left, as defined above.
o Click the Download icon to download the complete textual results in CSV format.
o Expand the details panel to see an equivalent of the Overview tab for the object page of a selected object or process.
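Once downloaded, the CSV results can be processed with ordinary tooling. The sketch below groups rows by one column using Python's standard csv module. The header names (Name, Type, Model) and the sample rows are assumptions made for illustration; check the header of your actual download, since the export layout is not described here.

```python
import csv
import io
from collections import Counter

# Stand-in for a downloaded file. The header names are assumptions,
# not the tool's documented export layout.
SAMPLE_CSV = """Name,Type,Model
Customer,Table,Accounts Receivable
customers.csv,File,Data Lake
addresses.csv,File,Data Lake
"""


def count_by(column, csv_text):
    """Count result rows per distinct value of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


per_model = count_by("Model", SAMPLE_CSV)
```

For a real file, replace the in-memory string with the contents of the downloaded file, for example by opening it with `open(path, newline="")` and reading it.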
Example
Data Flow Tree Objects
Search for the DW Staging.Customer table, go to the object page and then the Data Flow tab. Click the Tree tab on the left. Click Objects and Ultimate next to SHOW.
The Lineage (Sources) panel shows the Customer table in the Accounting.MITI-Finance-AR datastore along with the two files in the Data Lake, which together comprise the ultimate sources for this Customer table in Staging DW.
The Impact (Destinations) panel shows the ultimate reports using data from the Customer table.
Click Adjacent.
The Lineage (Sources) panel still shows the Customer table in the Accounts Receivable model, as it is not only an ultimate source for this table in Staging DW but also an adjacent one.
The Impact (Destinations) panel now shows the tables in the Dimensional DW data store, rather than the ultimate destinations, which were the reports.
Now, click the Diagram tab on the right to see the full picture of the lineage.
Now one can see why the Lineage (Sources) panel showed similar results: there is really only one step (adjacent) to the ultimate sources.
This example is a fairly simple demo. One can imagine the value of the Tree tab for more realistic (and therefore much more complex) lineage from real environments.
Return to the Tree tab and click Ultimate.
Expand the Details panel on the far right and select the Finance1 app in the Qlik Sense Cloud model.
Now we see a representation of the contents of the Overview tab of the object page, but presented as a panel in the lineage display.
You may now click Open in Tool, as in the examples with BI tools later in this user guide.
Data Flow Tree Processes
Now click Processes.
There are four processes immediately before the Customer table in the data flow and one process immediately after it.
Click the first item in the Processes (Sources) list, which is named Mapping.
This precursor process is actually a Talend DI process that reads from the accounts receivable operational data store and writes to the Staging DB (the one whose lineage we were examining).
Go to the Data Flow tab for this process to produce an overview lineage diagram for that process:
This process includes a number of parallel pipelines to various tables in Staging DW, including the Customer table.
As it is a data flow overview diagram (not a lineage trace), there are several pipelines shown, but the scope is just within the DI/ETL model.
Click the Back arrow in the browser to return to the original Tree based lineage trace.
Explore Further
Invoking a lineage trace from any reference to an object
You may invoke a lineage trace from any diagram or any list of results (e.g., from a Browse or Search), either via right-click context menu
Interpreting the graphical lineage
In general, the lineage tools within MetaKarta function identically whether one is analyzing data flow lineage, semantic lineage or both. However, the presentation is different, as follows:
In addition, MetaKarta has four levels of presentation:
o Configuration Model Connections Overview – which is a diagram representing the various Models contained within a configuration and how they are related (or stitched) to each other based upon connection definitions manually assigned to MetaKarta.
o Model Connections Overview – which is a diagram representing the various Models contained within the directory of an external repository and how they are related (or stitched) to each other based upon connection definitions already provided in the external metadata repository.
o Model Lineage Overview – which is a diagram representing an overview of the lineage within a given Model.
o Lineage Trace analysis at the configuration or Model level – which is a fully detailed trace of semantic and/or data flow lineage for detailed analysis.
Properties Panel
Click an object to select it and view its properties in the Properties Panel on the right. You may show and hide this panel as needed.