Classic Diagram

 

The Classic Data Lineage Diagram can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may generate too many objects for a readable graph, giving rise to a possible warning in the user interface.

Please refer to the diagram visualization common features.

In addition to those general features, there are features specific to the classic diagram presentation.

This method of analysis presents a graphical representation of the flow of data through connection definitions to data stores and physical transformation rules which transform and move the data. To see data flow lineage, one must:

Define a configuration that contains all of the models potentially in the data flow

Stitch the models together by resolving connection definitions and Build the configuration

Once the configuration is built, you are ready to report on lineage.
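The idea of stitching can be illustrated with a minimal Python sketch. It is not MetaKarta's implementation or API: the model names, the connection names (`src`, `tgt`, `dw`, `out`, `archive`), and the `stitch` helper are all hypothetical. Each model refers to data stores through its own connection names, and a connection map resolves those names to the model that actually describes the store. Flows with an unresolved connection cannot take part in an end-to-end trace.

```python
# Hypothetical per-model flows, written as "<connection>.<object>" endpoints.
FLOWS_BY_MODEL = {
    "Load Invoices": [
        ("src.Invoice", "tgt.FactInvoice"),
        ("src.Invoice", "archive.InvoiceHistory"),
    ],
    "Invoice Report": [
        ("dw.FactInvoice", "out.NetTotal"),
    ],
}

# Hypothetical connection definitions: (model, connection) -> resolved data store.
CONNECTION_MAP = {
    ("Load Invoices", "src"): "Accounting",
    ("Load Invoices", "tgt"): "Warehouse",
    ("Invoice Report", "dw"): "Warehouse",
    ("Invoice Report", "out"): "Invoice Report",
}


def resolve(model, endpoint, connection_map):
    """Translate a model-local endpoint into a store-qualified name, or None."""
    connection, _, obj = endpoint.partition(".")
    store = connection_map.get((model, connection))
    return None if store is None else f"{store}.{obj}"


def stitch(flows_by_model, connection_map):
    """Return (stitched flows, unresolved connections) for all models."""
    stitched, unresolved = [], set()
    for model, flows in flows_by_model.items():
        for source, destination in flows:
            resolved = [resolve(model, end, connection_map) for end in (source, destination)]
            for end, name in zip((source, destination), resolved):
                if name is None:
                    # Remember which connection still needs a definition.
                    unresolved.add((model, end.partition(".")[0]))
            if None not in resolved:
                stitched.append((resolved[0], resolved[1]))
    return stitched, unresolved


stitched_flows, unresolved_connections = stitch(FLOWS_BY_MODEL, CONNECTION_MAP)
```

In this sketch, `Warehouse.FactInvoice` appears both as a destination of the first model and as a source of the second, which is what allows a trace to cross the model boundary.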

End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If one instead goes to the object page for a schema or model, which is neither a classifier nor a feature, the Data Flow tab shows the overview lineage within the scope of that model only.

 

This is an older methodology for presenting a lineage trace. You are highly encouraged to use the newer Data Flow Diagram method, as the Classic diagram does not scale well with larger diagrams and larger numbers of objects.

 

You may disable this feature in the UI by setting the Show Lineage Classic Diagram group preference to false for the group Everyone.

 

Data Lineage (sources)

These are the analysis type use cases, generally posed as questions such as:

Given an item on a report, what data entry system fields impact these results?

Why are the numbers on this report the way that they are?

How to change the system data to get the correct results for this report?

This type of analysis, i.e., asking where the information comes from, is a question posed “upstream” in the dataflow. We refer to it as a reverse lineage question. When consumers of these reports ask these questions, a correct and responsive answer may be the most valuable information provided by a metadata management environment.
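A reverse lineage question amounts to walking the data flow backwards from the report item. The following Python sketch illustrates the idea on a toy graph. The object names and the `upstream_sources` helper are hypothetical and are not part of the product.

```python
from collections import deque

# Hypothetical toy lineage: each key feeds data into the objects in its list.
FLOWS = {
    "entry_system.invoice.amount": ["staging.invoice.amount"],
    "staging.invoice.amount": ["warehouse.fact_invoice.net_amount"],
    "entry_system.vendor.discount": ["warehouse.fact_invoice.net_amount"],
    "warehouse.fact_invoice.net_amount": ["report.net_invoices.total"],
}


def upstream_sources(flows, target):
    """Return every object that directly or indirectly feeds `target`."""
    # Invert the source -> destinations map so the walk can go backwards.
    feeders = {}
    for source, destinations in flows.items():
        for destination in destinations:
            feeders.setdefault(destination, set()).add(source)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for source in feeders.get(current, ()):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


sources = upstream_sources(FLOWS, "report.net_invoices.total")
```

Here `sources` contains both data entry fields as well as the intermediate staging and warehouse columns, which is the kind of answer a Data Lineage trace presents graphically.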

Steps

1.  Trace data flow lineage.

2.  Click the Diagram tab on the left.

3.  From here you may

Pick the Type in the pull-down in the upper right.

-       Data Impact type

-       Data Lineage type

-       Full Data Lineage type for both data impact and lineage.

Click the More Options icon and

-       select Show/Hide Columns to show columns in the selected object, or all objects if none is selected

-       select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.

Click Save an image to produce a downloadable file with a lineage image

Click Edit Filters and specify lineage filter options.

Click Display Options and specify lineage display options.

Example

Search for the Net Vendor Customer Invoices Tableau worksheet and open it.

 


 

Go to the Data Flow tab.

 


 

This is a business intelligence report and thus is at the end of the lineage, so MetaKarta automatically chooses Data Lineage for lineage Type.

The End Objects tab on the left is selected in this case, so we see the textual tree-based report.

Click Collapse all to reduce the tree to the top five elements in the lineage.

 


 

Now, click the Diagram tab on the left. Click the Collapse Selected node completely icon.

 


 

The different lineage lines indicate different types of data flow processes.

 

Click the plus sign next to MITI-Finance-AP.dbo (Database) in Accounting (Model).

Click the plus sign next to Invoice (Table) in MITI-Finance-AP.dbo (Database).

 


 

You then see the exact column that is a source in the lineage trace.

 

Click in an empty space in the diagram to de-select Invoice, then select the To Column level expansion, which will now apply to all objects.

 


 

Select a column, then click Highlight to outline the paths through that object.

 


 

Click the black line between Adjustments.Adj.TransAmt and Staging DW.dbo.GLAccount.AccountAmountAvailable.

 


 

And you see the transformation at the bottom of the page.

You may also simply pass the pointer over a link and see summary information.

Data Impact

 

 

Many times, one may ask these forward lineage or impact analysis type of questions:

If I make a change to this field, what reports will be impacted?

How is this identity information merged with the personnel system information on these other reports?

A data flow impact report traces the manner in which data flows from source to destination.
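An impact question is the same kind of walk in the forward direction. The following Python sketch shows the idea on a toy graph. The object names and the `downstream_impact` helper are hypothetical and do not reflect how the product stores or computes lineage.

```python
from collections import deque

# Hypothetical toy lineage: each key feeds data into the objects in its list.
FLOWS = {
    "files.paytrans.amount": ["staging.payments.amount"],
    "staging.payments.amount": [
        "warehouse.fact_payment.amount",
        "warehouse.fact_ledger.debit",
    ],
    "warehouse.fact_payment.amount": ["report.payments.total"],
    "warehouse.fact_ledger.debit": [],
}


def downstream_impact(flows, origin):
    """Return every object that directly or indirectly receives data from `origin`."""
    seen = set()
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for destination in flows.get(current, ()):
            if destination not in seen:
                seen.add(destination)
                queue.append(destination)
    return seen


impacted = downstream_impact(FLOWS, "staging.payments.amount")
# Narrow the answer to reports only (the "report." prefix is an assumption of this toy data).
impacted_reports = sorted(name for name in impacted if name.startswith("report."))
```

Filtering the result to report objects answers the first question above: which reports are affected by a change to the field.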

Steps

1.  Trace data flow lineage.

2.  Click Data Impact in the Type pull-down in the upper right.

3.  From here you may

Pick the Type in the pull-down in the upper right.

-       Data Impact type

-       Data Lineage type

-       Full Data Lineage type for both data impact and lineage.

Click the More Options icon and

-       select Show/Hide Columns to show columns in the selected object, or all objects if none is selected

-       select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.

Click Save an image to produce a downloadable file with a lineage image

Click Edit Filters and specify lineage filter options.

Click Display Options and specify lineage display options.

Example

Navigate to the object page for the file PAYTRANS.csv. When searching for it, enclose the search string in quotation marks (e.g., "PAYTRANS.csv"), as the period (.) has special meaning in the search syntax, and disable the semantic search.

 


 

Then click the Data Flow tab and Diagram tab on the left. Note, the Impact type is automatically selected, as the PAYTRANS.csv file is an ultimate source in the configuration, so it does not have any source lineage.

 


 

Full Data Lineage

This option provides the combination of both:

Data Lineage (trace from an object upstream to objects that provide data flow to that object)

Data Impact (trace from an object downstream to objects that are impacted via data flow by that object)

The result is based upon all the lineage flows that trace through the selected object (feature or classifier).
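The combination can be sketched in Python as the union of an upstream walk and a downstream walk from the same object. This is a minimal illustration under assumed names (`reachable`, `full_lineage`, and the toy edges are hypothetical), not the product's algorithm.

```python
def reachable(edges, start, forward=True):
    """Collect nodes reachable from `start`, following edges forward or backward."""
    neighbours = {}
    for source, destination in edges:
        key, value = (source, destination) if forward else (destination, source)
        neighbours.setdefault(key, []).append(value)
    seen, stack = set(), [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical toy data flow, as (source, destination) pairs.
EDGES = [
    ("crm.customer", "staging.customer"),
    ("staging.customer", "dw.customer"),
    ("dw.customer", "report.customer_sales"),
    ("erp.product", "dw.product"),
    ("dw.product", "report.customer_sales"),
]


def full_lineage(edges, selected):
    """Union of upstream lineage, the selected object, and downstream impact."""
    lineage = reachable(edges, selected, forward=False)
    impact = reachable(edges, selected, forward=True)
    return lineage | {selected} | impact


full = full_lineage(EDGES, "dw.customer")
```

Note that `erp.product` and `dw.product` are not in the result: they feed the same report, but their flow does not pass through the selected object.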

Steps

1.  Trace data flow lineage.

2.  Click Full Data Lineage in the Type pull-down in the upper right.

3.  From here you may

Pick the Type in the pull-down in the upper right.

-       Data Impact type

-       Data Lineage type

-       Full Data Lineage type for both data impact and lineage.

Click the More Options icon and

-       select Show/Hide Columns to show columns in the selected object, or all objects if none is selected

-       select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.

Click Save an image to produce a downloadable file with a lineage image

Click Edit Filters and specify lineage filter options.

Click Display Options and specify lineage display options.

Example

The Full Data Lineage option is the default.  However, as it may take more time to render, you may disable it in the Group Preferences.

If it is disabled, you may re-enable it. Sign in as Administrator. Go to MANAGE > Groups. Select the group named Everyone. Go to the Preferences tab, click Add, and specify the Enable Full Data Lineage preference.

 


 

Click OK.  Set the Value to true and click SAVE.

 


 

Search for “Customer” and pick the table Dimensional DW > dbo > Customer.

 


 

Go to the Data Flow tab.

The Data Flow tab has double arrows next to it, indicating that there are both impact and lineage traces for this object.

Select Full Data Lineage.

 


 

You have all the lineage traces going through that object.  The object from which the lineage is determined is marked with a red pin.

Classic Diagram Visualization Common Features

There are a number of common features and tools available when visualizing a lineage trace, data model, etc.

Classic Diagram Show Overview

You may click this Show overview icon to show or hide an Overview panel of the model diagram. Click in the overview to quickly move to a portion of the full diagram.

 

Classic Diagram Zoom In/Out and Fit to content

Click the Zoom in or Zoom out icon to adjust the zoom level of the diagram. You may also click the Fit to content icon to view the entire diagram at the best zoom that will fit.

Classic Diagram Collapse / Expand

 

 

Click Expand / Collapse to expand or collapse the entire diagram (ensure that you do not have an object selected, otherwise the action will only apply to that object).

You may also click on the plus sign for an object to expand and the minus sign to collapse just that object.

Show actions for the selected object

Show the actions available in the context menu for the selected object.

 

Open to go to the object page for the object

Open Lineage to change the point of origin (red pin in diagram) to present a new lineage display.

Expand the object showing all columns/attributes/fields/features (can be slow) to show the feature level for the selected objects.

Collapse Completely to only show the highest level object (e.g., models) for the selected objects.

Focus on Path to show only the lineage between the selected object and the point of origin (red pin in diagram) and present a new lineage display.

Highlight path to highlight the lineage flow from the selected objects.

Summarize this Object to remove the selected objects, summarizing them into lineage lines, and present a new lineage display.
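Summarizing an object can be pictured as removing it from the graph and connecting what fed it directly to what it fed. The Python sketch below illustrates that idea only. The `summarize_node` helper and the object names are hypothetical, and the product may summarize differently.

```python
def summarize_node(edges, node):
    """Remove `node` and link each of its sources directly to each of its destinations."""
    sources = [s for s, d in edges if d == node and s != node]
    destinations = [d for s, d in edges if s == node and d != node]
    # Keep every link that does not touch the summarized node.
    kept = [(s, d) for s, d in edges if node not in (s, d)]
    # Replace the removed hops with direct lineage lines.
    bridged = [(s, d) for s in sources for d in destinations]
    # Preserve order while dropping duplicate links.
    return list(dict.fromkeys(kept + bridged))


# Hypothetical toy data flow, as (source, destination) pairs.
EDGES = [
    ("source.orders", "staging.orders"),
    ("staging.orders", "dw.fact_orders"),
    ("dw.fact_orders", "report.sales"),
]

summarized = summarize_node(EDGES, "staging.orders")
```

After the call, the staging object no longer appears, but the lineage from `source.orders` to `dw.fact_orders` is still represented by a direct line.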

 

Classic Diagram Trace in General

Select the Analysis Diagram tab on the left to obtain this presentation. You will see a graphical presentation of the lineage (data impact or data source).

Open the object page

 

You may right-click and select Open to navigate to the object page.

Print

 

 

You may download a PNG or SVG image of the diagram.

Quick find

 

 

In the upper right, there is a search text box that will provide a quick list of object names that contain the text you type. You may click on any of the results to select that object in the diagram and move the focus there.
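The "contains the text you type" behavior can be sketched as a simple substring filter. In the Python sketch below, the `quick_find` helper is hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
def quick_find(names, text):
    """Return the object names that contain `text` (case-insensitive: an assumption)."""
    needle = text.strip().casefold()
    if not needle:
        # Nothing typed yet, so there is nothing to suggest.
        return []
    return [name for name in names if needle in name.casefold()]


# Hypothetical object names present in a diagram.
OBJECT_NAMES = ["Invoice", "InvoiceLine", "Customer", "GLAccount", "Vendor"]

matches = quick_find(OBJECT_NAMES, "inv")
```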

Interpreting the graphical lineage

In general, the lineage tools within MetaKarta function identically whether one is analyzing data flow lineage, semantic lineage or both. However, the presentation is different, as follows:        

 

 

In addition, MetaKarta has four levels of presentation:

Configuration Model Connections Overview – which is a diagram representing the various Models contained within a configuration and how they are related (or stitched) to each other based upon connection definitions manually assigned to MetaKarta.

Model Connections Overview – which is a diagram representing the various Models contained within the directory of an external repository and how they are related (or stitched) to each other based upon connection definitions already provided in the external metadata repository.

Model Lineage Overview – which is a diagram representing an overview of the lineage within a given Model.

Lineage Trace analysis at the configuration or Model level – which is a fully detailed trace of semantic and/or data flow lineage for detailed analysis.

Properties Panel

Click to select an object and view its properties in the Properties Panel on the right. You may show and hide this panel as needed.

 

Classic Diagram Display Options

The following Display Options are available:

 

 

MAXIMUM NODE WIDTH – set the size of the object boxes.

Highlight Control Links – Include control lineage links in the Highlight operation

Classic Diagram Options Maximum Node Width

In many cases, names of objects may be too long to fit into the objects in the diagram. You may specify several different node width maximums to make the diagram more readable. Click on Display Options.

 


 

Pick the node width.
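The effect of a maximum node width on long names can be pictured with a short Python sketch. The `fit_label` helper is hypothetical, and measuring width in characters and marking truncation with an ellipsis are both assumptions made for illustration. The diagram itself may measure and shorten names differently.

```python
def fit_label(name, max_width):
    """Shorten `name` to at most `max_width` characters, marking truncation with an ellipsis."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    if len(name) <= max_width:
        return name
    # Reserve one character for the ellipsis so the result still fits.
    return name[: max_width - 1] + "…"


short_label = fit_label("Invoice", 12)
long_label = fit_label("AccountAmountAvailable", 12)
```

A larger maximum shows more of each name at the cost of wider boxes, which is the trade-off the Maximum Node Width option lets you choose.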

 

 

 

Highlight Path

Click Highlight path to highlight the lineage path of the selected object. Double-click or long-click to enable auto highlight on any selected object.