Meta Integration® Model Bridge (MIMB)
"Metadata Integration" Solution

MIMB Bridge Documentation

MIMB Import Bridge from Apache Parquet File

Bridge Specifications

Vendor: Apache
Tool Name: Parquet File
Tool Version: 1.0
Tool Web Site: http://parquet.apache.org/
Supported Methodology: [File System] Data Store (NoSQL / Hierarchical, Physical Data Model) via Java API on PARQUET File

BRIDGE INFORMATION
Import tool: Apache Parquet File 1.0 (http://parquet.apache.org/)
Import interface: [File System] Data Store (NoSQL / Hierarchical, Physical Data Model) via Java API on PARQUET File from Apache Parquet File
Import bridge: 'Parquet' 11.0.0

BRIDGE DISCLAIMER
This bridge requires internet access to https://repo.maven.apache.org/maven2/ (and, in rare cases, a few other tool sites)
in order to download the necessary third-party software libraries into $HOME/data/download/MIMB/
(this directory can instead be copied from another MIMB server that has internet access).
By running this bridge, you hereby acknowledge responsibility for the license terms and any potential security vulnerabilities from these downloaded third party software libraries.

BRIDGE DOCUMENTATION
This bridge imports metadata from Parquet files using a Java API.
Note that this bridge does not perform any data-driven metadata discovery; instead, it reads the schema definition stored in the footer (at the end) of the Parquet file. Therefore, the bridge must traverse the entire Parquet file to reach the schema definition at the end.
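For reference, the Parquet format stores its footer metadata (including the schema) just before the end of the file: the file ends with the Thrift-encoded footer, a 4-byte little-endian footer length, and the 4-byte magic string PAR1 (which also starts the file). The following Python sketch is not the bridge's implementation; it only shows how a reader could locate that footer by seeking to the last 8 bytes. The function name locate_footer is hypothetical, and decoding the Thrift metadata itself is out of scope.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte magic at the start and at the end of a Parquet file


def locate_footer(path):
    """Return (offset, length) of the Thrift-encoded footer metadata.

    File tail layout: <footer metadata> <4-byte little-endian length> b"PAR1"
    """
    file_size = os.path.getsize(path)
    if file_size < 12:  # leading magic + length field + trailing magic
        raise ValueError("file too small to be a Parquet file")
    with open(path, "rb") as f:
        if f.read(4) != PARQUET_MAGIC:
            raise ValueError("missing leading PAR1 magic")
        # Jump straight to the last 8 bytes instead of reading the data pages.
        f.seek(file_size - 8)
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    footer_len = struct.unpack("<I", tail[:4])[0]
    footer_offset = file_size - 8 - footer_len
    if footer_offset < 4:  # the footer cannot overlap the leading magic
        raise ValueError("footer length exceeds file size")
    return footer_offset, footer_len
```

For example, calling locate_footer on a (hypothetical) sales.parquet file returns the byte offset and length of the footer, which a Thrift decoder would then parse to obtain the schema.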

If the Parquet file is not compressed, there is no file size limit, as the bridge automatically skips the data portion until it reaches the footer (although this may take time on large Parquet files). However, if the Parquet file is compressed, the bridge must first download the entire file in order to uncompress it. In that case, a default file size limit of 10 MB applies (larger files are ignored); this limit can be increased with the -parquet.compressed.max.size option of the Miscellaneous parameter.
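The size rule above can be summarized as a small decision function. This is a minimal Python sketch, assuming that compression is recognized from the file extension; the extension list, the constant names and accept_file are hypothetical, since the bridge's actual detection logic is not documented here.

```python
DEFAULT_MAX_COMPRESSED_SIZE = 10_000_000  # bytes; the documented default (10 MB)

# Hypothetical: extensions treated as "compressed" for this sketch only.
COMPRESSED_EXTENSIONS = (".gz", ".zip", ".bz2")


def is_compressed(file_name):
    """Assumed heuristic: decide compression from the file name alone."""
    return file_name.lower().endswith(COMPRESSED_EXTENSIONS)


def accept_file(file_name, size_bytes, max_compressed_size=DEFAULT_MAX_COMPRESSED_SIZE):
    """Uncompressed files have no size limit; compressed files larger
    than max_compressed_size are ignored."""
    if not is_compressed(file_name):
        return True
    return size_bytes <= max_compressed_size
```

Raising the limit corresponds to passing a larger max_compressed_size, in the same way that the -parquet.compressed.max.size option raises the bridge's limit.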

This bridge detects the following standard Parquet data types (as defined in https://parquet.apache.org/documentation/latest):

BOOLEAN: 1-bit boolean
INT32: 32-bit signed integers
INT64: 64-bit signed integers
INT96: 96-bit signed integers
FLOAT: IEEE 32-bit floating point values
DOUBLE: IEEE 64-bit floating point values
BYTE_ARRAY: arbitrarily long byte arrays
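The bridge only reports these type names from the schema; it does not read values. To illustrate what each physical type means at the byte level, the sketch below decodes values stored with the PLAIN encoding of the Parquet format specification (little-endian fixed-width values, bit-packed booleans, and a 4-byte length prefix before each BYTE_ARRAY). The function decode_plain is hypothetical and handles raw, uncompressed page data only.

```python
import struct

# Little-endian struct codes for the fixed-width physical types.
_FIXED_CODES = {"INT32": "i", "INT64": "q", "FLOAT": "f", "DOUBLE": "d"}


def decode_plain(physical_type, data, count):
    """Decode `count` PLAIN-encoded values of a Parquet physical type."""
    if physical_type in _FIXED_CODES:
        fmt = "<%d%s" % (count, _FIXED_CODES[physical_type])
        return list(struct.unpack_from(fmt, data))
    if physical_type == "INT96":
        # 12 bytes per value, read here as a plain 96-bit signed integer.
        return [int.from_bytes(data[i * 12:(i + 1) * 12], "little", signed=True)
                for i in range(count)]
    if physical_type == "BOOLEAN":
        # Bit-packed, least significant bit first.
        return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]
    if physical_type == "BYTE_ARRAY":
        values, pos = [], 0
        for _ in range(count):
            (length,) = struct.unpack_from("<I", data, pos)  # 4-byte length prefix
            pos += 4
            values.append(bytes(data[pos:pos + length]))
            pos += length
        return values
    raise ValueError("unsupported physical type: %s" % physical_type)
```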

Please refer to the individual parameters' tool tips for more detailed examples.


Bridge Parameters

Parameter: File
Description: Path to the file to import.
Type: FILE
Values: *.*
Scope: Mandatory

Parameter: Miscellaneous
Description: Specify miscellaneous options, each identified by a -letter (or -name) and a value.

For example, -m 4G -f 100 -j -Dname=value -Xms1G

-m the maximum Java memory size, as a whole number with a unit (e.g. -m 4G or -m 2500M).
-v set environment variable(s) (e.g. -v var1=value -v var2="value with spaces").
-j the last option, followed by Java command line options (e.g. -j -Dname=value -Xms1G).
-hadoop key1=val1;key2=val2 manually set Hadoop configuration options.
-tps 10 maximum thread pool size.
-tl 3600s processing time limit, in s (seconds), m (minutes) or h (hours).
-fl 1000 maximum number of files to process.
-delimited.top_rows_skip 1 number of rows to skip while processing CSV files.
-delimited.extra_separators ~,||,|~ comma-separated list of extra delimiters, each of which is used while processing CSV files.
-delimited.no_header by default, the bridge automatically tries to detect headers while processing CSV files (based on the header column types); use this option to disable header import (e.g. to hide sensitive data).
-fresh.partition.models - use to import the most recently modified files when processing partitions defined in the Partitioned directories parameter.
-subst K: C:/test - use to associate a root path part with a drive or another path.
-skip.download - use to disable downloading of dependencies and use only the download cache.
-prescript [cmd] - runs a script command before bridge execution. Example: -prescript "script.bat"
The script must be located in the bin directory and have a .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
-disable.partitions.autodetection - use this option to disable automatic partition detection (when the "Partition directories" option is empty).
-parquet.compressed.max.size=10000000 the bridge ignores compressed Parquet archives larger than this value, in bytes; the default value is 10000000 bytes (10 MB).
Type: STRING
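The value formats documented for some of these options can be illustrated with a short Python sketch. The helper names (parse_time_limit, parse_hadoop_options, split_java_options) are hypothetical and this is not the bridge's own option parser. It only assumes the formats shown above: -tl takes a whole number followed by s, m or h; -hadoop takes semicolon-separated key=value pairs; everything after -j is passed to Java. Quoting is assumed to follow POSIX shell rules (via shlex).

```python
import shlex

_TIME_UNITS = {"s": 1, "m": 60, "h": 3600}  # units documented for -tl


def parse_time_limit(value):
    """Convert a -tl value such as '3600s', '60m' or '1h' to seconds."""
    number, unit = value[:-1], value[-1:].lower()
    if unit not in _TIME_UNITS or not number.isdigit():
        raise ValueError("expected a whole number followed by s, m or h: %r" % value)
    return int(number) * _TIME_UNITS[unit]


def parse_hadoop_options(value):
    """Split a -hadoop value 'key1=val1;key2=val2' into a dict."""
    pairs = (item.split("=", 1) for item in value.split(";") if item)
    return {key.strip(): val.strip() for key, val in pairs}


def split_java_options(misc):
    """Split a Miscellaneous string into (bridge tokens, Java tokens).

    -j is documented as the last option: everything after it goes to Java.
    """
    tokens = shlex.split(misc)
    if "-j" in tokens:
        i = tokens.index("-j")
        return tokens[:i], tokens[i + 1:]
    return tokens, []
```

For the documented example "-m 4G -f 100 -j -Dname=value -Xms1G", split_java_options keeps "-m 4G -f 100" as bridge options and hands "-Dname=value -Xms1G" to Java.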

 

Bridge Mapping

Mapping information is not available

Last updated on Tue, 23 Jun 2020 18:16:25

Copyright © Meta Integration Technology, Inc. 1997-2020 All Rights Reserved.

Meta Integration® is a registered trademark of Meta Integration Technology, Inc.
All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.