based on the Meta Integration® Repository (MIR) for metadata storage (in a database server),
and the Meta Integration® Model Bridge (MIMB) middleware for metadata harvesting.
Metadata Management for Data Cataloging and Data Governance
Data Cataloging and Data Governance solutions must be built upon solid Metadata Management foundations. At the core of MIMM is a highly scalable Metadata Version & Configuration Management system designed to support continuous change in the enterprise architecture, with metadata harvesting (on prem and multi-cloud) and automatic stitching.
The big picture of Metadata Management
- with Business Glossary semantic traceability to the actual Enterprise Architecture.
- with data flow lineage & impact analysis, and support for DI & BI development Lifecycle Change Management.
From Data Cataloging (with technical models) to Data Governance (with Business Models)
Metadata Harvesting (with MIMB)
- Data Stores / Relational Database: Oracle, Microsoft SQL Server, Azure SQL Database and Data Warehouse, Amazon Redshift, Google BigQuery, SAP HANA, Snowflake, Teradata, IBM DB2, SAS Library, PostgreSQL, MySQL, Greenplum, Netezza, etc.
- Data Stores / Big Data: Apache (Cloudera, Hortonworks, MapR) Hadoop Hive, HBase, HCatalog, etc.
- Data Stores / NoSQL: MongoDB, CouchDB, Cassandra, etc.
- Data Stores / Files: Delimited Files (CSV, XLSX), Positional Files, COBOL Copybook, Hierarchical/NoSQL Files (JSON, Avro, Parquet, XML), etc.
- Data Stores / Data Lake: Linux/Windows File Systems, Apache (Cloudera, Hortonworks, MapR) Hadoop HDFS, Amazon S3, Microsoft Azure Blob Storage and Data Lake Storage, Confluent Kafka Schema Registry, OpenStack Swift, etc.
- Data Stores / API: OpenAPI Specification (OAS), Web Services Description Language (WSDL)
- Data Modeling: Erwin, IDERA ER/Studio, SAP PowerDesigner, etc.
- Metadata Management: Hortonworks Atlas, Cloudera Navigator, OMG UML and CWM, W3C Semantic Web Ontology (OWL/RDF), etc.
- Data Integration Scripts: Oracle PL/SQL, Microsoft SQL Server Transact-SQL, Teradata BTEQ/FastLoad/BulkLoad/TPT, SAS code, Apache (Cloudera, Hortonworks, MapR) HiveQL and Sqoop, Apache Spark (with Python or Scala), etc.
- Data Integration Tools: Informatica PowerCenter and Developer (DEI/BDM), IBM DataStage, Oracle ODI, Microsoft SSIS, SAP BusinessObjects Data Services, SAS DI, Talend DI and Data Prep, AWS Glue, Microsoft Azure Data Factory, Denodo Virtual DataPort, etc.
- Business Intelligence: SAP BusinessObjects, IBM Cognos, Microsoft SSAS/SSRS and Azure Power BI, Oracle OBIEE, MicroStrategy, SAS BI, Tableau, TIBCO Spotfire, QlikView/Qlik Sense, ThoughtSpot, etc.
- Business Applications: SAP Business Suite (ECC), SAP Business Warehouse (BW), Salesforce, etc.
Metadata Management (MM)
- Model Manager: with automatic metadata stitching, and Enterprise Architecture diagramming
- Metadata Search: metadata-driven pre- and post-filters, semantic search language
- Metadata Browser: hierarchical metadata browsers with custom metadata profiles per tool/technology
- Metadata Tabular Analyzer Reporter: with bulk editing capabilities
- Data Model Visualizer: ER Diagrams
- Data Flow Lineage and Impact Analyzer: data flow lineage and impact analysis down to the feature level, with data vs control flow
- Multi-Configuration Management: multiple configurations for enterprise architectures (e.g. data lake vs data warehouse, or business units and groups as multi-tenants)
- Multi-Version Management: efficient incremental metadata harvesting (on prem and multi-cloud) and automatic metadata stitching, with model history/SOX compliance
- Metadata Comparator: comparison with previous versions for the impact of change
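As an illustration of what lineage and impact analysis mean in practice, both can be viewed as traversals of a directed graph of data flow edges: impact analysis follows edges downstream from a feature, and lineage follows them upstream. The sketch below is a minimal, generic example and not MIMM's implementation; the edge list, feature names, and function names are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Index (source, target) data flow edges in both directions."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream

def reachable(start, neighbors):
    """Breadth-first traversal returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical feature-level (column-level) data flow edges.
EDGES = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "report.churn.contact_email"),
    ("crm.customer.id", "staging.customer.id"),
]

downstream, upstream = build_graph(EDGES)

# Impact analysis: what is affected if this staging column changes?
impact = reachable("staging.customer.email", downstream)

# Lineage: where does this report field come from?
lineage = reachable("report.churn.contact_email", upstream)
```

A real repository would also track the transformation on each edge and separate data flow from control flow; this sketch only records that a flow exists.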
Data Governance (DG)
- Business Glossary (BG): with customizable workflow automation
- Semantic Mapper: search driven, auto map, and multi-levels from glossaries to data stores via design models
- Semantic Lineage Analyzer: term usage, and automatic glossary definition on data pass through
- Local Documentation: quick in place editing of business names and descriptions while browsing harvested data stores
- Glossary Linking: quick in place semantic linking while browsing harvested data stores, DI jobs, and BI reports
- Data Tagging: applying reusable Labels available in search
- Comments and Reviews: collecting business user feedback and managing reviews
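To make the "auto map" idea in the Semantic Mapper concrete, one simple approach is to normalize glossary term names and physical column names to a common form and link those that match exactly. The sketch below assumes that approach purely for illustration; it does not describe how MIMM maps terms, and the function names and sample data are hypothetical.

```python
import re

def normalize(name):
    """Reduce a term or column name to lowercase words separated by spaces."""
    # Insert a space at camelCase boundaries, e.g. "orderDate" -> "order Date".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Split on any run of non-alphanumeric characters (underscores, spaces, dots).
    tokens = re.split(r"[^A-Za-z0-9]+", spaced)
    return " ".join(t.lower() for t in tokens if t)

def auto_map(terms, columns):
    """Map each column to the glossary term with the same normalized name."""
    index = {normalize(term): term for term in terms}
    return {
        column: index[normalize(column)]
        for column in columns
        if normalize(column) in index
    }

# Hypothetical glossary terms and harvested column names.
mapping = auto_map(
    ["Customer Name", "Order Date"],
    ["CUSTOMER_NAME", "orderDate", "qty"],
)
```

Columns with no matching term, such as `qty` here, are left unmapped and would need a search-driven or manual mapping.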
Data Cataloging (DC)
- File System Crawling: file type auto-detection, partitioning auto-detection
- Data Profiling: from data sampling to full data profiling with statistical results
- Semantic Discovery: semantic types, with machine learning of patterns and lists
- Relationship Discovery: data-driven (matching semantic types) and metadata-driven (inferred from usage in DI, BI, SQL, etc.)
- Social Curation: endorsement, warnings, certifications with impact on search
- Data Store Documenter: automatic reverse engineering of naming standards with supervised machine learning
- Data Store Modeler: with full data model diagram editing
- Data Store Designer: new data store specifications and design
- Data Mapper: from business user data mapping specifications to design for bulk and feature/SQL with joins/filters/lookups
- Dataset Developer: with reusable datasets and above data mapping design capabilities
- BI Web Portal: Multi-vendor BI Web Portal with bi-directional integration, and glossary generation
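As a rough illustration of the Data Profiling item above, column-level statistics can be computed either on a sample of rows or on the full data set. The sketch below is a generic example using only the Python standard library; it is not MIMM's profiler, and the function names, the `sample_size` parameter, and the sample data are hypothetical.

```python
import csv
import io

def profile_column(values):
    """Compute basic statistics for one column of string values."""
    non_empty = [v for v in values if v != ""]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
    }
    try:
        numbers = [float(v) for v in non_empty]
    except ValueError:
        numbers = []  # column is not purely numeric
    if numbers:
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
    return profile

def profile_csv(text, sample_size=None):
    """Profile every column of CSV text; sample_size limits the rows read."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if sample_size is not None and i >= sample_size:
            break
        for name in columns:
            columns[name].append(row[name] or "")
    return {name: profile_column(vals) for name, vals in columns.items()}

# Hypothetical delimited file content.
SAMPLE = "id,age,city\n1,34,Paris\n2,,Lyon\n3,51,Paris\n"

full_profile = profile_csv(SAMPLE)                    # full data profiling
sampled_profile = profile_csv(SAMPLE, sample_size=2)  # profiling on a sample
```

Passing `sample_size` trades accuracy for speed, which mirrors the range from data sampling to full data profiling described in the list.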
Active Data Governance (Forward Engineering with MIMB)
- Data Modeling Tools: Erwin, ER/Studio, PowerDesigner, etc.
- Data Integration Scripts: PL/SQL, BTEQ, HiveQL, etc.
- Data Integration: to self service / data prep tools
- Business Intelligence: to self service like Tableau or design layers like BO Universes.
Administration, Customization, & Extensions
- Custom Attribute Extensions (MyCompanyCertificationLevel, etc.)
- Customizable UI: menus, widget layout, etc.
- REST API: glossary lookups, lineage trace, automatic harvesting, search, browse, update, etc.