o BRAND NEW USER INTERFACE EXPERIENCE:
- METADATA MANAGER VS METADATA EXPLORER UI:
· In previous versions, the Web Application Server has offered two different User Interfaces (UI) targeting different user communities. The original Metadata Manager UI was designed for advanced technical users, with a traditional development tool layout including multiple panels: tree structure on the left, multi-tab windows in the middle, attributes on the right, and activity logs at the bottom. The Metadata Manager UI also presents the highest level of detail and complexity of all harvested metadata. The Metadata Explorer UI was initially introduced as a read-only UI with simpler metadata for business users, offering an easy-to-use layout for multiple devices, including tablets. The Metadata Explorer became the new UI platform for all new editing capabilities such as the business glossary or data modeling.
· With MetaKarta, all other editing capabilities are now available in the Metadata Explorer UI, including data mapping, enterprise data architectures (Configuration editor and model stitching), and even Administration features like Custom Attributes, which are now available under Metadata Explorer UI > MANAGE > Custom Attributes. Consequently, the Metadata Manager UI is now only necessary (and therefore available) in the MetaKarta Advanced Editions for repository management (with multi-version and configuration management). The MetaKarta Standard Edition v10.0 is fully implemented in the Metadata Explorer UI, where MANAGE > Repository allows users to directly create models in the default single configuration, import metadata, stitch models (connections), and trace lineage right away.
- METADATA HOME PAGES:
New metadata home pages with multiple top tabs offer quick access to all key information:
· The first tab is always the Overview tab which provides a dashboard to all critical information and properties.
· The next set of tabs are specific (metamodel / profile driven) to the type of object, for example:
§ Database Table objects have tabs for Columns and Constraints.
§ BI Reports (like Tableau Workbook) objects have tabs for Dashboards, Worksheets, and Data sources.
· The next set of tabs are for the common critical metadata analysis:
§ DATA FLOW for data lineage and impact analysis.
§ RELATIONSHIPS for detection, management, and curation of relationships (see new features below).
§ SEMANTIC FLOW for definition and usage perspectives (see new features below).
· The last set of tabs are for common documentation and administration like: Comments, Attachments and Audit Log.
- METADATA QUICK ACCESS:
Much improved ways to quickly access the right metadata:
· SEARCH has been massively improved in both real-time performance (now based on Apache Lucene) and in functionality, as a metadata-driven search with natural language semantic search (see new features below).
· BROWSE has also been massively improved in both performance (now also Lucene based) and in functionality, as a metadata asset type driven browser with support for hierarchical display at all levels of any data source, including databases, DI, BI, data lakes, and even NoSQL (like JSON hierarchical structures; see the sketch after this list).
· Enterprise ARCHITECTURE driven graphical navigation allows users to drill down from a top-down big picture of the enterprise architecture.
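For illustration, here is a minimal Python sketch of flattening a JSON document into hierarchical, browsable paths, the kind of structure a type-driven browser can display at every level (an illustrative sketch only, not the product's implementation):

    # Flatten a nested JSON-like document into hierarchical paths.
    def paths(node, prefix=""):
        if isinstance(node, dict):
            for key, value in node.items():
                yield from paths(value, f"{prefix}/{key}")
        elif isinstance(node, list) and node:
            yield from paths(node[0], f"{prefix}[]")  # sample the first element
        else:
            yield prefix  # leaf value

    doc = {"customer": {"name": "Ada", "orders": [{"sku": "X1", "qty": 2}]}}
    print(list(paths(doc)))
    # ['/customer/name', '/customer/orders[]/sku', '/customer/orders[]/qty']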
- METADATA REPORTING:
Brand new powerful unified metadata reporting capabilities where both search and browse lead to the same reporting page, which is also directly available at Browse > Report. Starting from search simply predefines the text filtering (e.g. customer), while browsing predefines a category (e.g. database / tables), and direct access to reporting does not predefine anything.
· The reporting capabilities offer the selection of multiple categories (e.g. database / tables + flat files) and a content subset (My Data Lake + Sales DW database) before drilling down with further filters.
· Filtering is then available for Last Modified, Stewards, Labels, Semantic Types, Endorsed By, Certified By, Created By, Warning By, and Commented By.
· Finally, further custom filtering is available on any attribute (including custom attributes) common to the selected metadata subset (e.g. SecurityLevel = Orange); a sketch of how these filters compose follows this list.
· Reports can be reused by saving the URL as favorites (future versions will support full report management within the application).
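For illustration, the following minimal Python sketch shows how such stacked filters compose over an in-memory inventory (the object model and field names are illustrative assumptions, not the product's API):

    # Hypothetical sketch: composing report filters (categories, content
    # subset, and a custom attribute) over an in-memory metadata inventory.
    inventory = [
        {"name": "CUSTOMER", "category": "database/table", "source": "Sales DW", "SecurityLevel": "Orange"},
        {"name": "orders.csv", "category": "flat file", "source": "My Data Lake", "SecurityLevel": "Green"},
        {"name": "PRODUCT", "category": "database/table", "source": "Sales DW", "SecurityLevel": "Orange"},
    ]

    categories = {"database/table", "flat file"}   # multiple categories
    sources = {"Sales DW", "My Data Lake"}         # content subset

    def matches(obj):
        return (obj["category"] in categories
                and obj["source"] in sources
                and obj.get("SecurityLevel") == "Orange")  # custom attribute filter

    print([obj["name"] for obj in inventory if matches(obj)])
    # ['CUSTOMER', 'PRODUCT']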
- METADATA USER LISTS:
Brand new user list management feature allows users to define and manage lists of metadata objects. Just like labels, lists are available anywhere in the UI for adding/removing objects, bulk editing, and management. Lists can contain any type of metadata, such as a favorite list of terms, tables, or reports. A single list can also mix multiple types of content, such as a to-do list with terms, tables, and reports together. Lists can be shared with other users when marked as public, such as a quarterly review list. Note that lists are flat: they are not hierarchical and have no sub-list or include concepts.
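For illustration, a minimal Python sketch of the flat list concept described above (a hypothetical model, not the product's internal representation):

    from dataclasses import dataclass, field

    # Hypothetical model of a metadata user list: flat, mixed-type,
    # optionally public, and deliberately without nesting.
    @dataclass
    class MetadataList:
        name: str
        public: bool = False                       # shared with others when True
        members: set = field(default_factory=set)  # flat set of (type, id) pairs

        def add(self, object_type, object_id):
            self.members.add((object_type, object_id))

        def remove(self, object_type, object_id):
            self.members.discard((object_type, object_id))

    todo = MetadataList("my to do list")
    todo.add("term", "Customer")            # terms, tables, and reports
    todo.add("table", "SALES.CUSTOMER")     # can coexist in the same list
    todo.add("report", "Quarterly Revenue")
    review = MetadataList("our quarterly review list", public=True)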
- METADATA TAGGING WITH LABELS:
Metadata tagging with labels has been much improved and harmonized with the brand-new list management experience in order to facilitate adding/removing objects anywhere, grid editing, and more.
- METADATA DOCUMENTATION:
Much improved ways to document metadata:
· MULTI-LINE TEXT has been introduced (in addition to the previous single-line Text) for better formatting and layout. In addition, Multi-Line Text has been enhanced with support for URL links and embedded image attachments using a JIRA-like syntax (e.g. [title|URL] for links and !image.png! for embedded images). Multi-Line Text is not only the default format for all Descriptions and Comments but is now also available as a new type of Custom Attribute that can be applied to any metadata for documentation.
· RICH TEXT documentation with (WYSIWYG) visual editing is not only the default medium for Glossary Term documentation but is now also available as a new type of Custom Attribute that can be applied to any metadata for documentation.
· SQL TEXT of SQL Views, Stored Procedures, and more is now better presented with colored syntax and optional reformatting (see the sketch after this list). Note that this is not a new type of custom attribute; rather, any predefined attribute containing SQL is better formatted.
· ATTACHMENTS (such as pictures, documents, etc.) have been enhanced as part of their integration with the new Metadata Explorer, including management (drag and drop), preview, and thumbnails that can be embedded in the Text (and Multi-Line Text) descriptions, comments, and custom attributes.
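For illustration, the kind of SQL reformatting described above can be sketched with the open-source sqlparse Python library (an assumption for illustration; the product's actual formatter is not documented here):

    import sqlparse  # pip install sqlparse

    raw = ("select c.id,c.name from customer c "
           "join account a on a.customer_id=c.id where a.status='OPEN'")

    # Re-indent and upper-case keywords, similar in spirit to the optional
    # reformatting applied to SQL views and stored procedures.
    print(sqlparse.format(raw, reindent=True, keyword_case="upper"))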
o DATA MODELING AND DOCUMENTING ANY HARVESTED METADATA
- In MIMM v9.1, existing data stores such as RDBMS could be harvested as a Physical Data Model (PDM) instead of a simple Model, in order to offer full documentation including business glossary term reuse based upon automatic semantic links, reverse engineering based upon naming standards, data modeling with diagramming, and of course automatic change management (re-harvest/compare/merge).
- In MIMM v10.0, all the above capabilities are now available on any harvestable model content without having to create a PDM. In other words, any data integration, business intelligence, report, or data store content (relational, hierarchical, NoSQL, files, etc.) can be documented as needed, including support for relational data models. Consequently, all existing PDMs in MIMM v9.1 may be converted to Models in MIMM v10.0 without loss of any existing documentation (including diagrams).
- The documentation (business names and definitions) process has been improved allowing any object (e.g. table, column, report field) to be quickly and easily:
· "Classified" with a local semantic link to a glossary term, without having to use an intermediate Semantic Mapping content, or associating the Model to a Glossary as with the PDM.
· "Documented" with a local business name and definition overwriting any Semantic link (Classified, Mapped or Inferred)
Furthermore, this documentation process is also dramatically enhanced through the integration with a new "Semantic Flow" tab acting as an interactive dashboard for finding the right definitions (see below).
- When harvesting databases that are already documented in data modeling tools (e.g. Erwin), such data models can be imported as a separate model and automatically stitched directly to the matching harvested database (without using any semantic mapping model). The semantic stitching is automatically maintained as both the database and its associated data model are independently re-imported/refreshed on a regular basis (the stitching will report inconsistencies). From the user perspective, the documentation (business names, descriptions, relationships, diagrams) of any harvested database table / column is automatically inherited from its associated data model.
- Note that the MetaKarta Advanced Edition with Authoring also allows one to use a PDM to create data models from scratch (e.g. design new Hive table requirements) without pointing to a live database, rather than simply documenting existing data stores. The PDM concept is retained for this purpose. Note that a Conceptual/Logical Data Modeling capability may also be added in future versions.
o DATA CATALOGING
- Brand new Data Cataloging applications, well integrated with the existing Data Governance (DG) capabilities and based upon the solid Metadata Management (MM) foundations with full data lineage and powerful metadata version and configuration management.
- Managing both modern cloud-based data lakes and classic Data Warehouse (DW) enterprise architectures.
- Harvesting metadata from both modern (XML, JSON, Avro, Parquet, ORC files, Hive tables, and Kafka messages) and classic (relational tables / CSV files) data technologies.
- Advanced data driven metadata discovery through integrated powerful Data Profiling and Reference Data capabilities.
- Presenting a brand-new business driven Data Cataloging User Interface (UI) experience.
- Integrating forward engineering with self-service Data Preparation, Data Quality, Data Integration, and Business Intelligence design tools.
o DATA SAMPLING, PROFILING, & SECURITY
- New "Data Viewer" security role allows authorized users (or groups) to sample data (read only access).
- New "Data Manager" security role allows authorized users (or groups) to enable full data profiling of selected objects (tables, files, etc) on demand.
- New data security protection allows Data Managers to set a "Hide Data" property at the column level (e.g. SSN column). In addition, data can also be automatically hidden when detected at the Semantic Type level (see new feature below) where a newly harvested set of files in a data lake may contain sensitive data (e.g. of Semantic Type SSN).
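For illustration, a minimal Python sketch of how column-level data hiding might behave (the flag and type names are illustrative assumptions): a column is masked when its "Hide Data" property is set or when its detected Semantic Type is registered as sensitive.

    # Hypothetical sketch of column-level data hiding.
    SENSITIVE_SEMANTIC_TYPES = {"SSN", "CreditCardNumber"}

    def mask_row(row, columns):
        masked = {}
        for col in columns:
            hidden = (col.get("hide_data")
                      or col.get("semantic_type") in SENSITIVE_SEMANTIC_TYPES)
            masked[col["name"]] = "****" if hidden else row[col["name"]]
        return masked

    columns = [
        {"name": "NAME"},
        {"name": "SSN", "semantic_type": "SSN"},  # hidden via semantic type
        {"name": "SALARY", "hide_data": True},    # hidden via column property
    ]
    print(mask_row({"NAME": "Ada", "SSN": "123-45-6789", "SALARY": 90000}, columns))
    # {'NAME': 'Ada', 'SSN': '****', 'SALARY': '****'}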
- New Sample Data tab on the home page of any data store object (e.g. file, table) allows users to view sample data to better define its metadata.
- New Data Profiling Statistics are displayed (including graphical reports) on the Overview tab (dashboard) in the home page of any data store object (e.g. file, table) as well as in the properties grid right panel and within the grid view.
o SEMANTIC TYPES Discovery & Management
- Semantic Discovery of semantic types, based on patterns, value lists, and machine learning.
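For illustration, a simplified Python sketch of pattern-based semantic type detection (regex patterns only; the value list and machine learning methods are not shown, and the patterns are illustrative assumptions):

    import re

    PATTERNS = {
        "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
        "US_PHONE": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    }

    def detect_semantic_type(samples, threshold=0.9):
        """Return the semantic type whose pattern matches enough samples."""
        for name, pattern in PATTERNS.items():
            hits = sum(1 for value in samples if pattern.match(value))
            if samples and hits / len(samples) >= threshold:
                return name
        return None

    print(detect_semantic_type(["123-45-6789", "987-65-4321"]))  # SSN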
o RELATIONSHIPS Discovery & Management
- Relationship Discovery using the following methods:
· Automatically "Inferred" based on:
§ Metadata Usage Driven: using the surrounding data flow usage such as joins in DI (ETL Tools, SQL Scripts or Data Prep) and BI (traditional or self-service) activities.
· On Demand "Detected" based on:
§ Metadata Name Matching: for example, PurchaseOrder.SKU = Product.SKU or Customer.AccountId = Account.Id (see the sketch at the end of this section).
§ Semantic Definition Matching: classified by users to the same glossary term.
§ Semantic Type Matching: discovered through data profiling (e.g. SSN syntax or VIN number)
- Relationship Management with user-defined relationships and social curation (e.g. endorsed or certified joins for active data governance generation into DI or BI design tools).
- Dynamic Data Model diagram generation from Relationships surrounding any object (e.g. table or file).
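For illustration, a minimal Python sketch of the name-matching detection method (the heuristics are illustrative assumptions; the product's matching is likely more sophisticated):

    # Propose candidate relationships by column name matching, e.g.
    # Customer.AccountId = Account.Id or PurchaseOrder.SKU = Product.SKU.
    tables = {
        "Account": ["Id", "Balance"],
        "Customer": ["Id", "Name", "AccountId"],
        "Product": ["SKU", "Label"],
        "PurchaseOrder": ["Id", "SKU", "Quantity"],
    }

    def detect_relationships(tables):
        candidates = []  # symmetric duplicates are not removed in this sketch
        for src, cols in tables.items():
            for col in cols:
                for tgt, tgt_cols in tables.items():
                    if tgt == src:
                        continue
                    if col.lower() == (tgt + "Id").lower() and "Id" in tgt_cols:
                        candidates.append((f"{src}.{col}", f"{tgt}.Id"))
                    elif col in tgt_cols and col != "Id":
                        candidates.append((f"{src}.{col}", f"{tgt}.{col}"))
        return candidates

    for pair in detect_relationships(tables):
        print(pair)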
o SOCIAL CURATION
- Endorsement, warnings, certifications with impact on search ranking.
o SEMANTIC SEARCH
- Metadata driven search language such as "Customer tables" for any tables with Customer in the name, "tables with SSN" for any table with an SSN column (e.g. for GDPR), or "ROI in Reports" for any reports containing ROI.
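For illustration, a toy Python sketch of how such metadata-driven phrases might be decomposed into structured filters (the grammar below is an illustrative assumption, not the product's documented query language):

    import re

    def parse_query(q):
        m = re.match(r"(\w+) tables$", q, re.I)       # "Customer tables"
        if m:
            return {"type": "table", "name_contains": m.group(1)}
        m = re.match(r"tables with (\w+)$", q, re.I)  # "tables with SSN"
        if m:
            return {"type": "table", "has_column": m.group(1)}
        m = re.match(r"(\w+) in reports$", q, re.I)   # "ROI in Reports"
        if m:
            return {"type": "report", "contains": m.group(1)}
        return {"text": q}                            # fall back to full text

    print(parse_query("tables with SSN"))  # {'type': 'table', 'has_column': 'SSN'}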
o SEMANTIC MAPPING
- Major improvements in semantic mapping including in place semantic mapping via two approaches:
· Top-down from Business Glossary Term or Data Model Entity/Attribute,
· Bottom-up from Data Store Tables/Columns or Report Fields.
o SEMANTIC FLOW
- Major improvements to the semantic flow analysis, which now also supports the documentation process by acting as an interactive dashboard for finding definitions that are:
· "Local" (within the Model) that has been either "Imported" (metadata harvesting) or locally "Documented" (edited description overwrite),
· locally "Classified" (within the Model) to an external glossary term,
· directly "Mapped" via a Semantic Mapping Model or direct stitching (e.g. between a database and its data model),
· indirectly "Inferred" through complex data flow pass through and semantic flow (which can be graphically analyzed in the data flow diagram), or
· "Searched" for by name in all glossaries.
Any of the Searched, Inferred or Mapped definitions may quickly (in place) be reused/promoted as a Classified or Mapped definition.
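For illustration, one plausible resolution order consistent with the description above can be sketched in Python (the ordering and data structure are illustrative assumptions, not the product's code):

    # A locally Documented definition overwrites any semantic link; the
    # remaining sources are tried in decreasing order of directness.
    PRECEDENCE = ["Documented", "Classified", "Mapped", "Inferred", "Searched"]

    def resolve_definition(definitions):
        """definitions: dict mapping source name -> definition text."""
        for source in PRECEDENCE:
            if definitions.get(source):
                return source, definitions[source]
        return None, None

    defs = {"Inferred": "Party that purchases goods.", "Searched": "See CRM glossary."}
    print(resolve_definition(defs))  # ('Inferred', 'Party that purchases goods.')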
o RELATED REPORTS
- Related Reports are now available on any metadata object such as files, tables, columns, etc. (instead of just business terms). This allows business users looking at the results of a search to have direct access to a simple list of any related reports in any BI tool (crossing all semantic and data flows without exposing any of the complexity to business users). Such reports can then automatically be opened in their respective BI tool technologies, therefore acting as a multi-vendor BI tool web portal for business users.
o DATA CONNECTIONS / METADATA STITCHING
- Complete support for file format harvesting and stitching.
- Connection pool factorization (e.g. from DI and BI servers) to minimize the number and complexity of stitching connections.
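For illustration, a toy Python sketch of the factorization idea (field names are illustrative assumptions): many DI/BI tool connections resolving to the same physical data store collapse into a single stitching connection.

    from collections import defaultdict

    connections = [
        {"tool": "ETL job A", "host": "dwh01", "database": "SALES"},
        {"tool": "ETL job B", "host": "dwh01", "database": "SALES"},
        {"tool": "BI server", "host": "dwh01", "database": "SALES"},
        {"tool": "BI server", "host": "lake01", "database": "RAW"},
    ]

    # Group by the physical store each connection resolves to.
    pools = defaultdict(list)
    for conn in connections:
        pools[(conn["host"], conn["database"])].append(conn["tool"])

    for store, tools in pools.items():
        print(store, "<-", tools)  # 2 stitching connections instead of 4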
o DATA MAPPING Specifications & Design
- The Data Mapping Specifications and the Data Mapping Designs have been fully redesigned and merged into brand new Data Mappings that can be used for multiple purposes, from capturing data flow mapping requirements all the way to developing a full data mapping design that may be forward engineered (see Active Data Governance) into SQL scripts or DI/ETL tool jobs (e.g. Informatica PowerCenter, Talend DI).
- The Data Mapping tool allows for the mapping of multiple source data stores into a target data store in multiple steps with (schema or table level) Bulk Mappings and (column/field level) Query Mappings. The Data Mapping tool offers new graphical mapping visualization as you map, and new syntactical expression editors when designing joins, lookups, filters, etc.
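For illustration, a simplified Python sketch of forward engineering a query mapping into a SQL script (the mapping structure is an illustrative assumption; real generated jobs or scripts would be far richer):

    # Generate an INSERT..SELECT from a simple query mapping.
    mapping = {
        "target": "DW.DIM_CUSTOMER",
        "source": "STG.CUSTOMER c JOIN STG.ACCOUNT a ON a.CUSTOMER_ID = c.ID",
        "columns": {
            "CUSTOMER_KEY": "c.ID",
            "CUSTOMER_NAME": "UPPER(c.NAME)",  # an expression, as in the editors
            "ACCOUNT_STATUS": "a.STATUS",
        },
        "filter": "a.STATUS <> 'CLOSED'",
    }

    def to_sql(m):
        cols = ", ".join(m["columns"])
        exprs = ", ".join(m["columns"].values())
        return (f"INSERT INTO {m['target']} ({cols})\n"
                f"SELECT {exprs}\nFROM {m['source']}\nWHERE {m['filter']};")

    print(to_sql(mapping))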
o ACTIVE DATA GOVERNANCE
- From Physical Data Models (PDM) or Models harvested from Data Modeling tools or Relational Databases, with forward engineering:
· to any supported Data Modeling tools (e.g. Erwin),
· to any supported Data Integration (DI/ETL) tools (e.g. Talend DI) as source/target models,
· to any supported BI Design Tools (e.g. Tableau, SAP BusinessObjects, IBM Cognos).
- From Data Mapping specifications and designs, with forward engineering:
· to any supported Data Integration (DI/ETL) tools (e.g. Talend DI).
o ARCHITECTURE, DEPLOYMENT & INTEGRATION
- Search engine redesigned and optimized with Apache Lucene, offering near real-time metadata search and navigation (and removing any dependency on the underlying database's text search capabilities).
- Third-Party software upgraded to the latest of Java 8, Apache Tomcat 9, PostgreSQL 10 and more for security and performance improvements.
- Single Sign-On (SSO) integration architecture has been redesigned for easy external authentication with redirect, using custom scripts in any language such as Python (note that MIMM v9.1 external authentication required complex custom integrated Java code). This includes new support for SSO authentication with OAuth 2.0. Post-GA cumulative patches will include support for the Security Assertion Markup Language (SAML) standard, and native cloud authentications such as Amazon AWS and Microsoft Azure.
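For illustration, an external authentication hook might look like the following minimal Python sketch (the script contract, i.e. how credentials arrive and how the verdict is returned, is an illustrative assumption; consult the product documentation for the actual interface):

    #!/usr/bin/env python3
    # Hypothetical external authentication script: reads a username and a
    # password from the command line and exits 0 on success, 1 on failure.
    import sys

    def authenticate(username, password):
        # Replace with a call to your identity provider (LDAP, an OAuth 2.0
        # token introspection endpoint, etc.). Hard-coded for the sketch only.
        return (username, password) == ("demo", "demo-secret")

    if __name__ == "__main__":
        user, pwd = sys.argv[1], sys.argv[2]
        sys.exit(0 if authenticate(user, pwd) else 1)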
o GENERAL:
- All Java code is now compiled with OpenJDK.
- All Third-Party & Open Source Software has been upgraded to the latest versions for better security vulnerability protection.
o UPGRADE REQUIREMENTS:
- CPU & MEMORY:
MIMM v10.0 now uses Lucene as the search engine instead of the database's text search capabilities. Consequently, the overall hardware requirements now demand as much processing resources (CPU and memory) on the Application Server machine as on the underlying Database Server machine. (MIMM v9.1 used to demand more resources on the database server, as all searches were performed by the database server.)
- DATABASE:
The MIMM v10.0 server now requires PostgreSQL database software version 10 or newer.
o UPGRADE PROCESS:
- POSTGRESQL DATABASE VERSION UPGRADE:
If you deployed your MIMM v9.1 server connected to your own PostgreSQL server, then you must first upgrade the PostgreSQL server to version 10 or newer, and make sure the MIMM v9.1 software still works properly after that step.
- MIMM v9.1 SOFTWARE UPGRADE & DATABASE MAINTENANCE:
You must download and apply the latest MIMM v9.1 cumulative patch (July 2018 or newer), and make sure the MIMM v9.1 software still works properly after that step. You must then make sure that the regularly scheduled database maintenance has run successfully, and long enough to have no further activities to perform (such as purging deleted content from the database).
- MIMM v9.1 DATABASE BACKUP:
You must then perform a full database backup (MIMM data, indexes, and scripts), restore it on a brand new (different) database server, and make sure the MIMM v9.1 server software still works properly after that step. (Do not rely on the MIMM backup script for that process; you must use a real full database backup.)
- MIMM v10.0 SOFTWARE INSTALL & DATABASE UPGRADE:
Install the MIMM v10.0 software in a new directory (independent of the MIMM v9.1 install directory). Use the Setup utility to point to the database to upgrade. Starting the new MIMM v10.0 server will automatically perform an upgrade of the database's MIMM data and scripts. Note that MIMM v10.0 uses Lucene on the MIMM server for search; therefore the database will no longer need text search indexes and will need less space.
o POST UPGRADE:
- FLAT FILES:
MIMM v9.1 had no proper support for flat files, which are now fully supported as part of the new data cataloging capabilities of MIMM v10.0. Therefore, the early Flat File (CSV or XLSX) prototype beta bridges have been discontinued and replaced by the import bridge from File System (CSV, Excel, XML, JSON, Avro, Parquet, ORC), or the other new file system / object store import bridges such as Amazon S3, Azure Blob Storage, Hadoop HDFS, etc., which can all contain flat files. All these file system import bridges now create multi-model content which can be stitched to data mappings and other DI/ETL models. Any content imported from the Flat File (CSV or XLSX) prototype beta bridges is still visible in MIMM v10.0, but no further imports can be performed, and migration to the new bridges cannot be automated (different parameters and a different content type, from single to multi-model). Therefore, new models should be created with the new file system bridges, and the old content (from the prototype beta bridges) should be deleted.
- PHYSICAL DATA MODELS (PDM):
In MIMM v9.1, PDMs were used to document existing databases (including data model diagrams), as well as to design new databases (new tables, columns, etc.). In MIMM v10.0, a PDM is no longer needed to document existing databases (including data model diagrams, glossary integration, etc.). Therefore, a PDM is now only needed for data store requirements / database design, which requires MIMM with a "metadata authoring" license. Consequently, MIMM servers without a "metadata authoring" license can no longer create new PDM models, although any existing PDMs created with MIMM v9.1 are still operational on MIMM v10.0. The bottom line is that the new MIMM v10.0 way to document data stores / databases is not only as powerful as PDM but is much more efficient with version management (changes after new harvesting). Consequently, we recommend that all PDM models used to document databases be converted to models, which can be performed without any loss of documentation by means of a conversion script available on PDM models.