The Classic Data Lineage Diagram can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may generate too many objects for a readable graph, which may trigger a warning in the user interface.
Please refer to the diagram visualization common features.
In addition to those general features, there are features specific to the classic diagram presentation.
This method of analysis presents a graphical representation of the flow of data through connection definitions to data stores and physical transformation rules which transform and move the data. To see data flow lineage, one must:
o Define a configuration that contains all of the models potentially in the data flow
o Stitch the models together by resolving connection definitions and Build the configuration
Once the configuration is built, you are ready to report on lineage.
End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If you instead go to the object page for a schema or model, which is neither a classifier nor a feature, the Data Flow tab shows only the overview lineage within the scope of that model.
This is an older methodology for presenting a lineage trace. You are highly encouraged to use the newer Data flow Diagram method, as the Classic diagram does not scale well to larger diagrams with many objects.
You may disable this feature in the UI by setting the Show Lineage Classic Diagram group preference to false for the group Everyone.
Data Lineage (sources)
These are the analysis type use cases, generally posed as questions such as:
o Given an item on a report, what data entry system fields impact these results?
o Why are the numbers on this report the way that they are?
o How to change the system data to get the correct results for this report?
This type of analysis, i.e., asking where the information comes from, is a question posed “upstream” in the dataflow. We refer to it as a reverse lineage question. When consumers of these reports ask these questions, a correct and responsive answer may be the most valuable information provided by a metadata management environment.
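The idea of a reverse lineage question can be sketched as a graph traversal that walks data flow links backwards from a report item. This is only an illustration of the concept, not how MetaKarta computes lineage; the edge list, the object names, and the `upstream` function are all hypothetical.

```python
from collections import deque

# Hypothetical column-level data flow links (source -> target).
# These names are illustrative and do not come from a real configuration.
EDGES = [
    ("entry.invoice.amount", "staging.invoice.amount"),
    ("entry.invoice.tax", "staging.invoice.tax"),
    ("staging.invoice.amount", "dw.fact_invoice.net_amount"),
    ("staging.invoice.tax", "dw.fact_invoice.net_amount"),
    ("dw.fact_invoice.net_amount", "report.net_invoices.total"),
    ("entry.vendor.name", "report.vendor_list.name"),
]


def upstream(edges, start):
    """Return every object that feeds data, directly or indirectly, into start."""
    # Index the links by target so we can walk them in reverse.
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


sources = upstream(EDGES, "report.net_invoices.total")
print(sorted(sources))
```

In this sketch the unrelated `entry.vendor.name` field is not reported, because no data flow link connects it to the selected report item.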
Steps
2. Click the Diagram tab on the left.
3. From here you may
o Pick the Type in the pull-down in the upper right.
- Data Impact type
- Data Lineage type
- Full Data Lineage type for both data impact and lineage.
o Click the More Options icon and
- select Show/Hide Columns to show columns in the selected object, or all objects if none is selected
- select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
o Click Save an image to produce a downloadable file with a lineage image
o Click Edit Filters and specify lineage filter options.
o Click Display Options and specify lineage display options.
Example
Search for the Net Vendor Customer Invoices Tableau worksheet and open it.
Go to the Data Flow tab.
This is a business intelligence report and thus is at the end of the lineage, so MetaKarta automatically chooses Data Lineage for lineage Type.
The End Objects tab on the left is selected in this case, so we see the textual tree-based report.
Click Collapse all to reduce the tree to the top five elements in the lineage.
Now, click the Diagram tab on the left. Click the Collapse Selected node completely icon.
The different lineage lines indicate different types of data flow processes.
Click the plus sign next to MITI-Finance-AP.dbo (Database) in Accounting (Model).
Click the plus sign next to Invoice (Table) in MITI-Finance-AP.dbo (Database).
You then see the exact column that is a source in the lineage trace.
Click in an empty space in the diagram to de-select Invoice, then select the To Column level expansion, which will now apply to all objects.
Select a column, then click Highlight to outline the paths through that object.
Click the black line between Adjustments.Adj.TransAmt and Staging DW.dbo.GLAccount.AccountAmountAvailable.
The transformation is then shown at the bottom of the page.
You may also simply pass the pointer over a link and see summary information.
Data Impact
Many times, one may ask these forward lineage or impact analysis type of questions:
o If I make a change to this field, what reports will be impacted?
o How is this identity information merged with the personnel system information on these other reports?
A data flow impact report traces the manner in which data flows from source to destination.
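A forward (impact) trace can be pictured as following data flow links from a source field to everything downstream of it. The sketch below illustrates that idea only; the link list, the object names, and the `downstream` function are hypothetical and are not part of MetaKarta.

```python
from collections import defaultdict

# Hypothetical data flow links (source -> target); names are illustrative only.
EDGES = [
    ("hr.person.id", "staging.identity.person_id"),
    ("crm.contact.email", "staging.identity.email"),
    ("staging.identity.person_id", "dw.dim_employee.person_id"),
    ("dw.dim_employee.person_id", "report.headcount.employee"),
    ("dw.dim_employee.person_id", "report.payroll.employee"),
    ("staging.identity.email", "report.contacts.email"),
]


def downstream(edges, start):
    """Return every object that receives data, directly or indirectly, from start."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    impacted = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


impacted = downstream(EDGES, "hr.person.id")
# Keep only the report-level objects, assuming a "report." naming prefix.
impacted_reports = sorted(name for name in impacted if name.startswith("report."))
print(impacted_reports)
```

Under these assumptions, changing `hr.person.id` would affect the headcount and payroll reports but not the contacts report.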
Steps
5. Click Data Impact in the Type pull-down in the upper right.
6. From here you may
o Pick the Type in the pull-down in the upper right.
- Data Impact type
- Data Lineage type
- Full Data Lineage type for both data impact and lineage.
o Click the More Options icon and
- select Show/Hide Columns to show columns in the selected object, or all objects if none is selected
- select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
o Click Save an image to produce a downloadable file with a lineage image
o Click Edit Filters and specify lineage filter options.
o Click Display Options and specify lineage display options.
Example
Navigate to the object page for the file PAYTRANS.csv. When searching for it, enclose the search string in quotation marks (e.g., "PAYTRANS.csv"), because the period (.) has special meaning in the search syntax, and disable semantic search.
Then click the Data Flow tab and then the Diagram tab on the left. Note that the Data Impact type is automatically selected: the PAYTRANS.csv file is an ultimate source in the configuration, so it has no source lineage.
Full Data Lineage
This option provides the combination of both:
o Data Lineage (trace from an object upstream to objects that provide data flow to that object)
o Data Impact (trace from an object downstream to objects that are impacted via data flow by that object)
The result is based upon all the lineage flows that trace through the selected object (feature or classifier).
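Conceptually, the combined trace is the union of an upstream walk and a downstream walk from the selected object. The following is a minimal sketch of that idea, assuming a simple list of (source, target) links; the `reach` and `full_lineage` functions and the single-letter object names are hypothetical.

```python
def reach(edges, start, forward):
    """Collect objects reachable from start, following links forward or backward."""
    adjacency = {}
    for source, target in edges:
        key, value = (source, target) if forward else (target, source)
        adjacency.setdefault(key, set()).add(value)

    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def full_lineage(edges, selected):
    """Union of the lineage (upstream) and impact (downstream) of selected."""
    lineage = reach(edges, selected, forward=False)
    impact = reach(edges, selected, forward=True)
    return lineage | {selected} | impact


# Hypothetical links: a and b feed c; c feeds d and e; x also feeds e.
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("x", "e")]
trace = full_lineage(EDGES, "c")
print(sorted(trace))
```

Note that `x` is left out of the result: it feeds `e`, but that flow does not pass through the selected object `c`.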
Steps
2. Click Full Data Lineage in the Type pull-down in the upper right.
3. From here you may
o Pick the Type in the pull-down in the upper right.
- Data Impact type
- Data Lineage type
- Full Data Lineage type for both data impact and lineage.
o Click the More Options icon and
- select Show/Hide Columns to show columns in the selected object, or all objects if none is selected
- select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
o Click Save an image to produce a downloadable file with a lineage image
o Click Edit Filters and specify lineage filter options.
o Click Display Options and specify lineage display options.
Example
The Full Data Lineage option is the default. However, as it may take more time to render, you may disable it in the Group Preferences.
If it is disabled, you may re-enable it. Sign in as Administrator. Go to MANAGE > Groups. Select the group named Everyone. Go to the Preferences tab and click Add and specify the Enable Full Data Lineage preference.
Click OK. Set the Value to true and click SAVE.
Search for “Customer” and pick the table Dimensional DW > dbo > Customer.
Go to the Data Flow tab.
The Data Flow tab has double arrows next to it, indicating that there are both impact and lineage traces for this object.
Select Full Data Lineage.
You have all the lineage traces going through that object. The object from which the lineage is determined is marked with a red pin.
Classic Diagram Visualization Common Features
There are a number of common features and tools available when visualizing a lineage trace, data model, etc.
Classic Diagram Show Overview
You may click this Show overview icon to show or hide an Overview panel of the model diagram. Click in the overview to quickly move to a portion of the full diagram.
Classic Diagram Zoom In/Out and Fit to content
Click the Zoom in or Zoom out icons to adjust the zoom level of the diagram. You may also click the Fit to content icon to view the entire diagram at the best zoom that will fit.
Classic Diagram Collapse / Expand
Click Expand / Collapse to expand or collapse the entire diagram (ensure that you do not have an object selected, otherwise the action will only apply to that object).
You may also click on the plus sign for an object to expand and the minus sign to collapse just that object.
Show actions for the selected object
Show the actions available in the context menu for the selected object.
o Open to go to the object page for the object
o Open Lineage to change the point of origin (red pin in diagram) and present a new lineage display.
o Expand the object showing all columns/attributes/fields/feature (can be slow) to show the feature level for the selected objects.
o Collapse Completely to only show the highest level object (e.g., models) for the selected objects.
o Focus on Path to show only the lineage between the selected object and the point of origin (red pin in diagram) and present a new lineage display.
o Highlight path to highlight the lineage flow from the selected objects.
o Summarize this Object to remove the selected objects, summarizing them into lineage lines, and present a new lineage display.
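One way to picture the Summarize this Object action is as removing a node from the lineage graph and replacing it with direct links between its sources and its destinations. The sketch below illustrates that idea only, under the assumption that lineage is a set of (source, target) links; the `summarize` function and the object names are hypothetical and do not describe MetaKarta's actual implementation.

```python
def summarize(edges, hidden):
    """Remove hidden from the graph and bridge its sources to its destinations."""
    sources = {s for s, t in edges if t == hidden}
    destinations = {t for s, t in edges if s == hidden}
    # Ignore any self-link on the hidden object.
    sources.discard(hidden)
    destinations.discard(hidden)

    kept = {(s, t) for s, t in edges if hidden not in (s, t)}
    bridged = {(s, t) for s in sources for t in destinations if s != t}
    return kept | bridged


# Hypothetical four-stage flow: source -> staging -> warehouse -> report.
EDGES = {
    ("source", "staging"),
    ("staging", "warehouse"),
    ("warehouse", "report"),
}
summarized = summarize(EDGES, "staging")
print(sorted(summarized))
```

In this sketch the staging object disappears from the display while the flow from source to warehouse remains visible as a single link.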
Classic Diagram Trace in General
Select the Analysis Diagram tab on the left to obtain this presentation. You will see a graphical presentation of the lineage (data impact or data source).
Open the object page
You may right-click and select Open to navigate to the object page.
You may download a PNG or SVG image of the diagram.
Quick find
In the upper right, there is a search text box that provides a quick list of object names that contain the text you type. You may click on any of the results to select that object in the diagram and move the focus there.
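The "contains" matching described here can be sketched as a simple substring filter. This is an illustration only: the `quick_find` function is hypothetical, and treating the match as case-insensitive is an assumption that the text above does not confirm.

```python
def quick_find(names, text):
    """Return the names that contain text, in their original order."""
    needle = text.casefold()  # assumption: matching ignores case
    if not needle:
        return []
    return [name for name in names if needle in name.casefold()]


# Hypothetical object names in a diagram.
NAMES = ["Customer", "CustomerAddress", "Invoice", "InvoiceLine", "customer_stage"]
matches = quick_find(NAMES, "cust")
print(matches)
```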
Interpreting the graphical lineage
In general, the lineage tools within MetaKarta function identically whether one is analyzing data flow lineage, semantic lineage or both. However, the presentation is different, as follows:
In addition, MetaKarta has four levels of presentation:
o Configuration Model Connections Overview – which is a diagram representing the various Models contained within a configuration and how they are related (or stitched) to each other based upon connection definitions manually assigned to MetaKarta.
o Model Connections Overview – which is a diagram representing the various Models contained within the directory of an external repository and how they are related (or stitched) to each other based upon connection definitions already provided in the external metadata repository.
o Model Lineage Overview – which is a diagram representing an overview of the lineage within a given Model.
o Lineage Trace analysis at the configuration or Model level – which is a fully detailed trace of semantic and/or data flow lineage for detailed analysis.
Properties Panel
Click to select an object and view its properties in the Properties Panel on the right. You may show and hide this panel as needed.
The Display Options are available.
o MAXIMUM NODE WIDTH – set the size of the object boxes.
o Highlight Control Links – Include control lineage links in the Highlight operation
Classic Diagram Options Maximum Node Width
In many cases, names of objects may be too long to fit into the objects in the diagram. You may specify several different node width maximums to make the diagram more readable. Click on Display Options.
Pick the node width.
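The effect of a maximum node width on long names can be sketched as shortening each label to fit a character budget. This is a hypothetical illustration: the `fit_label` function, measuring the width in characters, and marking the cut with three dots are all assumptions, and the product may instead wrap or clip names differently.

```python
def fit_label(name, max_chars):
    """Shorten name to at most max_chars characters, marking the cut with '...'."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(name) <= max_chars:
        return name
    if max_chars <= 3:
        # Too narrow for an ellipsis; just cut the name.
        return name[:max_chars]
    return name[: max_chars - 3] + "..."


short = fit_label("AccountAmountAvailable", 12)
print(short)
```

A wider setting shows more of each name at the cost of larger object boxes; a narrower setting keeps the diagram compact but hides more of each name.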
Click Highlight path to highlight the lineage path of the selected object. Double-click or long-click to enable auto highlight on any selected object.