Meta Integration® Model Bridge (MIMB)
"Metadata Integration" Solution

MIMB Bridge Documentation

MIMB Import Bridge from W3C XML

Bridge Specifications

Vendor: World Wide Web Consortium
Tool Name: XML
Tool Version: 1.0
Tool Web Site: http://www.w3.org/TR/2000/REC-xml-20001006
Supported Methodology: [File System] Data Store (NoSQL / Hierarchical, Physical Data Model) via XML File

SPECIFICATIONS
Tool: World Wide Web Consortium / XML version 1.0 via XML File
See http://www.w3.org/TR/2000/REC-xml-20001006
Metadata: [File System] Data Store (NoSQL / Hierarchical, Physical Data Model)
Component: W3cXml version 11.0.0

OVERVIEW
This W3C XML import bridge is used in conjunction with other file import bridges (e.g. CSV, XLSX, JSON, Avro, Parquet) by all data lake / file crawler import bridges (e.g. File systems, Amazon S3, Hadoop HDFS).

The purpose of this XML import is to reverse engineer a model/schema from its content, when such XML was not formally defined by an XML Schema (XSD or DTD).
Such XML files are commonly produced by IoT devices and uploaded into a data lake.
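
As a rough illustration of what reverse engineering a structure from content can look like, the following Python sketch walks an XML document and records, for each element path, the attribute names and child element names it encounters. This is not the bridge's actual algorithm. The function name infer_structure and the sample document are hypothetical and serve only to illustrate the idea.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_structure(xml_bytes):
    """Collect, per element path, the attribute and child element names seen.

    Hypothetical helper for illustration; not the bridge's algorithm.
    """
    structure = defaultdict(lambda: {"attributes": set(), "children": set()})

    def visit(element, path):
        entry = structure[path]
        # element.attrib is a dict; updating a set with it adds its keys.
        entry["attributes"].update(element.attrib)
        for child in element:
            entry["children"].add(child.tag)
            visit(child, path + "/" + child.tag)

    root = ET.fromstring(xml_bytes)
    visit(root, "/" + root.tag)
    return dict(structure)


# Hypothetical sample, such as a device might upload without any XSD or DTD.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<readings device="sensor-1">
  <reading unit="C"><value>21.5</value></reading>
  <reading><value>22.0</value><status>ok</status></reading>
</readings>"""

if __name__ == "__main__":
    for path, entry in sorted(infer_structure(SAMPLE).items()):
        print(path, sorted(entry["attributes"]), sorted(entry["children"]))
```

Because the two reading elements differ, the inferred structure is the union of what was observed, which is why a model derived from content alone can only be as complete as the sample files.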

Nevertheless, such XML files are expected to be fully W3C compliant, especially with respect to the XML text declaration, well-formed parsed entities, and character encoding of entities.
See W3C standards for more details:
https://www.w3.org/TR/xml/#sec-TextDecl
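
Before handing files to the bridge, it may help to pre-check that they are well formed. The sketch below uses Python's standard xml.etree.ElementTree parser. It only reports what that parser accepts, which is an approximation of W3C compliance and not a statement about the bridge's own validation. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the standard-library parser accepts the document.

    Hypothetical pre-check; the bridge may apply stricter or different rules.
    """
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    good = b'<?xml version="1.0" encoding="UTF-8"?><a><b/></a>'
    unclosed = b"<a><b></a>"
    # The XML declaration must be at the very start of the entity.
    late_declaration = b'\n<?xml version="1.0"?><a/>'
    for sample in (good, unclosed, late_declaration):
        print(is_well_formed(sample))
```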

Warning: you must use the dedicated XML based import bridges for all other needs, such as:
- other standard W3C XML import bridges (e.g. DTD, XSD, WSDL, OWL/RDF)
- tool specific XML import bridges (e.g. Erwin Data Modeler XML, Informatica PowerCenter XML)


Bridge Parameters

Parameter Name: File
Description: The bridge uses the XML file as input.
Type: FILE
Values: *.xml
Scope: Mandatory

Parameter Name: Miscellaneous
Description: Specify miscellaneous options starting with a dash and optionally followed by parameters, e.g.
-connection.cast MyDatabase1="SQL Server"
Some options can be used multiple times if applicable, e.g.
-connection.rename NewConnection1=OldConnection1 -connection.rename NewConnection2=OldConnection2
As the list of options can become a long string, it can be loaded from a file, which must be located in ${MODEL_BRIDGE_HOME}\data\MIMB\parameters and have the extension .txt. In that case, all options must be defined within that file, and the file name must be the only value of this parameter, e.g.
ETL/Miscellaneous.txt
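
The following Python sketch shows how such a parameter value could map to a location under the parameters directory. It assumes a Windows-style installation path and a case-insensitive check of the .txt extension. Both are assumptions, and the function name resolve_options_file and the C:\MIMB home directory are hypothetical.

```python
from pathlib import PureWindowsPath


def resolve_options_file(model_bridge_home, parameter_value):
    """Map a Miscellaneous value such as 'ETL/Miscellaneous.txt' to a full path.

    Hypothetical helper; it only mirrors the documented location and extension.
    """
    relative = PureWindowsPath(parameter_value)
    if relative.suffix.lower() != ".txt":
        raise ValueError("the options file must have the .txt extension")
    parameters_dir = PureWindowsPath(model_bridge_home, "data", "MIMB", "parameters")
    return parameters_dir / relative


if __name__ == "__main__":
    # C:\MIMB stands in for ${MODEL_BRIDGE_HOME} in this example.
    print(resolve_options_file(r"C:\MIMB", "ETL/Miscellaneous.txt"))
```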

GENERAL OPTIONS
-m <Java Memory's maximum size>
1G by default on a 64-bit JRE, or as set in conf/conf.properties, e.g.
-m 8G
-m 8000M

-j <Java Runtime Environment command line options>
This option must be the last one in the Miscellaneous parameter as all the text after -j is passed "as is" to the JRE, e.g.
-j -Dname=value -Xms1G
The following options must be set when a proxy is used to access the internet. This is critical for accessing https://repo.maven.apache.org/maven2/ (and, exceptionally, a few other tool sites) in order to download the necessary third-party software libraries.
-j -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.proxyUser=user -Dhttp.proxyPassword=pass -Dhttps.proxyUser=user -Dhttps.proxyPassword=pass
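
Since everything after -j is passed to the JRE, a script that assembles the Miscellaneous value should append -j last. The Python sketch below is one way to enforce that ordering. The function names build_miscellaneous and proxy_jre_options are hypothetical, and the proxy host and port are the example values shown above.

```python
def proxy_jre_options(host, port):
    """Return the JRE system properties for an HTTP/HTTPS proxy (no credentials)."""
    return [
        f"-Dhttp.proxyHost={host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={host}",
        f"-Dhttps.proxyPort={port}",
    ]


def build_miscellaneous(options, jre_options=None):
    """Join bridge options into one string, placing -j (if any) at the end.

    Hypothetical helper: JRE options must be passed separately so that
    nothing can follow them in the resulting string.
    """
    for option in options:
        if option == "-j" or option.startswith("-j "):
            raise ValueError("pass JRE options via jre_options so that -j stays last")
    parts = list(options)
    if jre_options:
        parts.append("-j " + " ".join(jre_options))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_miscellaneous(["-m 8G", "-tps 10"], proxy_jre_options("127.0.0.1", 3128)))
```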

-jre <Java Runtime Environment full path name>
It can be an absolute path to javaw.exe on Windows or a link/script path on Linux, e.g.
-jre "c:\Program Files\Java\jre1.8.0_211\bin\javaw.exe"

-v <Environment variable value>
None by default, e.g.
-v var1=value1 -v var2="value2 with spaces"
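
To show how the quoted value in this example groups into a single variable, the Python sketch below tokenizes the option string with the standard shlex module and collects the -v pairs. This uses POSIX shell quoting rules as an assumption; the bridge's own parser may treat quotes and backslashes differently. The function name parse_variables is hypothetical.

```python
import shlex


def parse_variables(miscellaneous):
    """Collect name/value pairs from every '-v name=value' occurrence.

    Hypothetical helper that relies on POSIX-style quoting via shlex.
    """
    tokens = shlex.split(miscellaneous)
    variables = {}
    index = 0
    while index < len(tokens):
        if tokens[index] == "-v" and index + 1 < len(tokens):
            name, _, value = tokens[index + 1].partition("=")
            variables[name] = value
            index += 2
        else:
            index += 1
    return variables


if __name__ == "__main__":
    print(parse_variables('-v var1=value1 -v var2="value2 with spaces"'))
```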

-model.name <model name>
Override the model name, e.g.
-model.name "My Model Name"

-prescript <script name>
The script must be located in the bin directory, and have a .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-prescript "script.bat arg1 arg2"
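
The constraints above can be checked before the option is submitted. The Python sketch below validates the extension and rejects parent directory symbols. It assumes the script name contains no spaces and is separated from its arguments by whitespace, and it does not verify that the file exists in the bin directory. The function name validate_prescript is hypothetical.

```python
def validate_prescript(value):
    """Split a -prescript value into (script, arguments) after basic checks.

    Hypothetical helper; assumes the script name itself contains no spaces.
    """
    parts = value.split()
    if not parts:
        raise ValueError("no script name given")
    script = parts[0]
    # Treat both slash styles as path separators when looking for '..'.
    if ".." in script.replace("\\", "/").split("/"):
        raise ValueError("the script path must not include '..'")
    if not script.lower().endswith((".bat", ".sh")):
        raise ValueError("the script must have a .bat or .sh extension")
    return script, parts[1:]


if __name__ == "__main__":
    print(validate_prescript("script.bat arg1 arg2"))
```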

-cache.clear
Clears the cache before the import, so the bridge runs a full import without incremental harvesting.
Warning: this is a system option managed by the application calling the bridge and should not be set by users.

FILE SYSTEM OPTIONS
-tps <Processing Thread Pool Size's maximum count>
1 by default, e.g.
-tps 10

-tl <Processing Time Limit duration>
No limit by default. Time can be specified in seconds, minutes, or hours, e.g.
-tl 3600s
-tl 60m
-tl 1h
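
The three examples above all express the same duration. The Python sketch below converts such values to seconds to make that explicit. It assumes the value is a whole number followed by exactly one of the suffixes s, m, or h; whether the bridge accepts other forms is not stated here. The function name time_limit_seconds is hypothetical.

```python
import re

# Seconds per unit suffix used by the -tl examples.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def time_limit_seconds(value):
    """Convert a -tl value such as '60m' to a number of seconds.

    Hypothetical helper; accepts only <digits><s|m|h>.
    """
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported time limit: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for example in ("3600s", "60m", "1h"):
        print(example, time_limit_seconds(example))
```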

-fl <Processing File Limit count>
No limit by default, e.g.
-fl 100

-hadoop <Hadoop configuration options>
None by default, e.g.
-hadoop key1=val1;key2=val2
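
The value is a list of key=value pairs separated by semicolons. The Python sketch below splits such a string into a dictionary, assuming that keys and values contain no semicolons and that the first equals sign separates key from value. The function name parse_hadoop_options is hypothetical.

```python
def parse_hadoop_options(value):
    """Split 'key1=val1;key2=val2' into a dictionary.

    Hypothetical helper; assumes no semicolons inside keys or values.
    """
    options = {}
    for pair in value.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, separator, val = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = val
    return options


if __name__ == "__main__":
    print(parse_hadoop_options("key1=val1;key2=val2"))
```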

-fresh.partition.models
Use this option to import the most recently modified files when processing partitions defined in the Partitioned directories parameter.

-subst <path> <new path>
Use this option to associate a root path part with a drive or another path, e.g.
-subst K: C:/test

-skip.download
Use this option to disable dependency downloading and use only the download cache.

-disable.partitions.autodetection
Use this option to disable automatic partition detection (when the "Partition directories" option is empty).

DELIMITED FILE OPTIONS
-delimited.no_header
By default, the bridge automatically tries to detect headers while processing CSV files (based on the header column types). Use this option to disable header import (e.g. to hide sensitive data).

-delimited.top_rows_skip <number>
Delimited file's number of rows to skip while processing (0 by default), e.g.
-delimited.top_rows_skip 1

-delimited.extra_separators <comma separated separators>
Delimited file's extra delimiters (separators by default are ), e.g.
-delimited.extra_separators ~,||,|~

PARQUET FILE OPTIONS
-parquet.compressed.max.size=<value>
Ignore Parquet archives larger than the size defined by this option (the default value is 10 000 000 bytes), e.g.
-parquet.compressed.max.size=10000000
Type: STRING

Bridge Mapping

Mapping information is not available

Last updated on Fri, 25 Sep 2020 17:37:51

Copyright © Meta Integration Technology, Inc. 1997-2020 All Rights Reserved.

Meta Integration® is a registered trademark of Meta Integration Technology, Inc.
All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.