Meta Integration® Model Bridge (MIMB)
"Metadata Integration" Solution

MIMB Bridge Documentation

MIMB Import Bridge from Delimited File (CSV)

Bridge Specifications

Vendor ISO
Tool Name Delimited File (CSV)
Tool Version N/A
Tool Web Site https://en.wikipedia.org/wiki/Delimiter-separated_values
Supported Methodology [File System] Data Store (Physical Data Model) via CSV, TXT File

SPECIFICATIONS
Tool: ISO / Delimited File (CSV) version N/A via CSV, TXT File
See https://en.wikipedia.org/wiki/Delimiter-separated_values
Metadata: [File System] Data Store (Physical Data Model)
Component: FlatFile version 11.0.0

OVERVIEW
This bridge detects (reverse engineers) the metadata from a data file of type Delimited File (also known as Flat File).
The detection of such a Delimited File is not based on file extensions (such as .CSV, .PSV) but on sampling the file content.

The bridge can detect a header row and use it to create the field names; otherwise, generic field names are created.

The bridge samples up to 100 rows in order to automatically detect the field separator. By default, the candidate separators are:
', (comma)' , '; (semicolon)', ': (colon)', '\t (tab)', '| (pipe)', '0x1 (ctrl+A)', 'BS (\u0008)'
More separators (including double-character separators) can be added to the auto-detection process; see the Miscellaneous parameter.
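The separator detection described above can be illustrated with a small heuristic. The following Python sketch is hypothetical and is not the bridge's actual algorithm: it assumes the sampled content is already split into lines, ignores quoted fields, and picks the candidate that yields the same non-zero separator count on the largest number of rows. The extra argument is an assumed stand-in for separators added through the Miscellaneous parameter.

```python
from collections import Counter

# Default candidates listed above: comma, semicolon, colon, tab, pipe,
# ctrl+A (0x01) and backspace (0x08).
DEFAULT_SEPARATORS = [",", ";", ":", "\t", "|", "\x01", "\x08"]


def detect_separator(lines, extra=(), max_rows=100):
    """Return the candidate separator that occurs the same non-zero number of
    times on the most sampled rows, or None if the first sampled row contains
    none of them.

    Hypothetical heuristic: quoted fields are not handled.
    """
    sample = [line.rstrip("\r\n") for line in lines[:max_rows] if line.strip()]
    # Deduplicate while keeping order, then try longer separators first so a
    # double-character separator such as "||" wins a tie against "|".
    candidates = sorted(dict.fromkeys([*extra, *DEFAULT_SEPARATORS]), key=len, reverse=True)
    best, best_rows = None, 0
    for sep in candidates:
        counts = [line.count(sep) for line in sample]
        if not counts or counts[0] == 0:
            continue  # the first (possibly header) row must contain the separator
        seps_per_row, rows = Counter(counts).most_common(1)[0]
        if seps_per_row > 0 and rows > best_rows:
            best, best_rows = sep, rows
    return best
```

For instance, detect_separator(["id;name", "1;apple"]) returns ";", and passing extra=["||"] lets rows such as "a||b||c" resolve to "||" rather than "|".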

During the sampling, the bridge also detects the field data types, such as DATE, NUMBER, and STRING.
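As a rough illustration of type detection during sampling, the sketch below classifies each sampled value and keeps a column as NUMBER or DATE only when all of its non-empty values agree. The date layouts in DATE_FORMATS and the fallback-to-STRING rule are assumptions made for illustration, not the bridge's documented behavior.

```python
from datetime import datetime

# Hypothetical set of accepted date layouts; the bridge's actual list is not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def value_type(value):
    """Classify a single field value as NUMBER, DATE or STRING."""
    try:
        float(value)
        return "NUMBER"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "DATE"
        except ValueError:
            pass
    return "STRING"


def column_types(rows):
    """Infer one type per column from already-split sample rows.

    A column keeps NUMBER or DATE only if every non-empty value has that type;
    mixed or empty columns fall back to STRING. zip() stops at the shortest row.
    """
    types = []
    for column in zip(*rows):
        kinds = {value_type(v) for v in column if v != ""}
        types.append(kinds.pop() if len(kinds) == 1 else "STRING")
    return types
```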


Bridge Parameters

Parameter Name Description Type Values Default Scope
File Path to file to import FILE *.*   Mandatory
Encoding The character set encoding used by the files.
FYI: The default is 'Western European (Windows-1252)' on Windows and 'Western European (ISO-8859-1)' on other platforms.
When empty, the locale encoding of the machine reading the file is used.
ENUMERATED
Central and Eastern European (iso-8859-2)
Central and Eastern European (windows-1250)
Chinese Traditional (big5)
Chinese Simplified (GB18030)
Chinese Simplified (GB2312)
Cyrillic (iso-8859-5)
Cyrillic (windows-1251)
DOS (ibm-850)
Greek (iso-8859-7)
Greek (windows-1253)
Hebrew (iso-8859-8)
Hebrew (windows-1255)
Japanese (shift_jis)
Korean (ks_c_5601-1987)
Thai (TIS620)
Thai (windows-874)
Turkish (iso-8859-9)
Turkish (windows-1254)
UTF 8 (utf-8)
UTF 16 (utf-16)
Western European (iso-8859-1)
Western European (iso-8859-15)
Western European (windows-1252)
Locale encoding
No encoding conversion
utf-8  
Top rows to skip Number of rows to skip from the top of the file STRING
Delimiter By default, the delimiter is determined automatically. Set this parameter only for special cases, e.g. when the automatically detected delimiter is wrong. STRING
Miscellaneous Specify miscellaneous options starting with a dash and optionally followed by parameters, e.g.
-connection.cast MyDatabase1="SQL Server"
Some options can be used multiple times if applicable, e.g.
-connection.rename NewConnection1=OldConnection1 -connection.rename NewConnection2=OldConnection2
As the list of options can become a long string, it can be loaded from a file, which must be located in ${MODEL_BRIDGE_HOME}\data\MIMB\parameters and have the .txt extension. In that case, all options must be defined within that file, and the file name must be the only value of this parameter, e.g.
ETL/Miscellaneous.txt
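To illustrate how such an option string is structured (options starting with a dash, optional arguments, repeatable options, and -j consuming the rest of the line), here is a hypothetical Python parser. The functions load_misc and parse_misc are illustrative names, not part of MIMB; bridge_home stands in for ${MODEL_BRIDGE_HOME}, and quoting follows POSIX shell rules via shlex, which may differ from the bridge's own tokenizer.

```python
import os
import shlex
from collections import defaultdict


def load_misc(value, bridge_home):
    """If the value is a .txt file name (e.g. ETL/Miscellaneous.txt), read the
    options from <bridge_home>/data/MIMB/parameters; otherwise return it as-is."""
    value = value.strip()
    if value.endswith(".txt") and not value.startswith("-"):
        path = os.path.join(bridge_home, "data", "MIMB", "parameters", value)
        with open(path, encoding="utf-8") as f:
            return f.read()
    return value


def parse_misc(text):
    """Group tokens into {option: [argument lists]}; repeated options accumulate
    in order. All tokens after -j are collected as JRE arguments."""
    tokens = shlex.split(text)  # POSIX quoting: MyDatabase1="SQL Server" -> one token
    options = defaultdict(list)
    i = 0
    while i < len(tokens):
        opt = tokens[i]
        if opt == "-j":
            options["-j"].append(tokens[i + 1:])
            break  # -j must be last
        i += 1
        args = []
        while i < len(tokens) and not tokens[i].startswith("-"):
            args.append(tokens[i])
            i += 1
        options[opt].append(args)
    return dict(options)
```

For example, parse_misc('-connection.rename A=B -connection.rename C=D') groups both renames under the same key, in the order given.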

GENERAL OPTIONS
-m <Java Memory's maximum size>
1G by default on a 64-bit JRE, or as set in conf/conf.properties, e.g.
-m 8G
-m 8000M

-j <Java Runtime Environment command line options>
This option must be the last one in the Miscellaneous parameter as all the text after -j is passed "as is" to the JRE, e.g.
-j -Dname=value -Xms1G
The following options must be set when a proxy is used to access the internet. This is critical to access https://repo.maven.apache.org/maven2/ (and, exceptionally, a few other tool sites) in order to download the necessary third-party software libraries.
-j -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=3128 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=3128 -Dhttp.proxyUser=user -Dhttp.proxyPassword=pass -Dhttps.proxyUser=user -Dhttps.proxyPassword=pass

-jre <Java Runtime Environment full path name>
It can be an absolute path to javaw.exe on Windows or a link/script path on Linux, e.g.
-jre "c:\Program Files\Java\jre1.8.0_211\bin\javaw.exe"

-v <Environment variable value>
None by default, e.g.
-v var1=value1 -v var2="value2 with spaces"

-model.name <model name>
Override the model name, e.g.
-model.name "My Model Name"

-prescript <script name>
The script must be located in the bin directory and have a .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
For example:
-prescript "script.bat arg1 arg2"

-cache.clear
Clears the cache before the import, and therefore will run a full import without incremental harvesting.
Warning: this is a system option managed by the application calling the bridge and should not be set by users.

FILE SYSTEM OPTIONS
-tps <Processing Thread Pool Size's maximum count>
1 by default, e.g.
-tps 10

-tl <Processing Time Limit duration>
No limits by default. Time can be specified in seconds, minutes, or hours, e.g.
-tl 3600s
-tl 60m
-tl 1h

-fl <Processing File Limit count>
No limits by default, e.g.
-fl 100
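The -tl and -fl limits can be sketched as follows, assuming that reaching a limit stops the processing of further files rather than interrupting a file in progress. parse_time_limit and process_files are hypothetical helpers, and only the s, m and h suffixes shown in the examples above are accepted.

```python
import re
import time

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_time_limit(text):
    """Convert a -tl value such as '3600s', '60m' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported time limit: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def process_files(paths, handle, time_limit=None, file_limit=None, clock=time.monotonic):
    """Call handle(path) for each path, stopping once time_limit seconds have
    elapsed or file_limit files have been processed; None means no limit."""
    deadline = None if time_limit is None else clock() + time_limit
    processed = []
    for path in paths:
        if file_limit is not None and len(processed) >= file_limit:
            break
        if deadline is not None and clock() >= deadline:
            break
        handle(path)
        processed.append(path)
    return processed
```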

-hadoop <Hadoop configuration options>
None by default, e.g.
-hadoop key1=val1;key2=val2

-fresh.partition.models
Use this option to import the most recently modified files when processing the partitions defined in the Partitioned directories parameter.

-subst <path> <new path>
Use this option to associate a root path part with a drive or another path, e.g.
-subst K: C:/test

-skip.download
Use this option to disable the downloading of dependencies and use only the download cache.

-disable.partitions.autodetection
Use this option to disable automatic partition detection (when the "Partition directories" option is empty).

DELIMITED FILE OPTIONS
-delimited.no_header
By default, the bridge automatically tries to detect a header row while processing CSV files (based on the header column types). Use this option to disable header import (e.g. to hide sensitive data).
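A possible type-based header check, shown as a hypothetical Python sketch: the first row is treated as a header when all its cells are non-empty, non-numeric strings while at least one column below it contains only numbers. The generic FieldN names, the NUMBER/STRING-only classification, and keeping the first row as data when no_header is set are all assumptions; the bridge's actual rules and generic names may differ.

```python
def _kind(value):
    """Simplified classifier for this sketch: NUMBER or STRING only."""
    try:
        float(value)
        return "NUMBER"
    except ValueError:
        return "STRING"


def has_header(rows):
    """Guess whether rows[0] is a header row from the column types below it."""
    if len(rows) < 2:
        return False
    header, data = rows[0], rows[1:]
    if any(cell == "" or _kind(cell) != "STRING" for cell in header):
        return False
    for col in range(len(header)):
        kinds = {_kind(row[col]) for row in data if col < len(row) and row[col] != ""}
        if kinds and "STRING" not in kinds:
            return True  # a text label above a purely numeric column
    return False


def field_names(rows, no_header=False):
    """Return (field names, data rows). With no_header=True (cf. -delimited.no_header)
    no header is detected and hypothetical generic names are generated instead."""
    if not no_header and has_header(rows):
        return list(rows[0]), rows[1:]
    width = max((len(r) for r in rows), default=0)
    return [f"Field{i + 1}" for i in range(width)], rows
```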

-delimited.top_rows_skip <number>
Number of rows to skip at the top of the delimited file while processing (0 by default), e.g.
-delimited.top_rows_skip 1

-delimited.extra_separators <comma separated separators>
Extra separators for delimited files, added to the default separators listed in the OVERVIEW section, e.g.
-delimited.extra_separators ~,||,|~

PARQUET FILE OPTIONS
-parquet.compressed.max.size=<value>
Ignore Parquet archives larger than this value, in bytes (10000000 by default), e.g.
-parquet.compressed.max.size=10000000
STRING      

 

Bridge Mapping

Mapping information is not available

Last updated on Fri, 25 Sep 2020 17:37:51

Copyright © Meta Integration Technology, Inc. 1997-2020 All Rights Reserved.

Meta Integration® is a registered trademark of Meta Integration Technology, Inc.
All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.