Meta Integration® Model Bridge (MIMB)
"Metadata Integration" Solution

MIMB Bridge Documentation

MIMB Import Bridge from Amazon Web Services (AWS) Glue ETL (via Apache Spark)

Bridge Specifications

Vendor Amazon
Tool Name Amazon Glue ETL (via Apache Spark)
Tool Version Spark 2.x
Tool Web Site https://aws.amazon.com/glue/
Supported Methodology [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File

BRIDGE INFORMATION
Import tool: Amazon Glue ETL (via Apache Spark) Spark 2.x (https://aws.amazon.com/glue/)
Import interface: [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File from Amazon Web Services (AWS) Glue ETL (via Apache Spark)
Import bridge: 'ApacheSparkImport.AmazonGlueETL' 10.1.0

BRIDGE DOCUMENTATION
In the Amazon AWS Glue Console, go to the ETL / Jobs area, where you can find all the ETL scripts.
Each script is marked with a Type (e.g. Spark), an ETL Language (e.g. Python),
and a Script Location showing where it is stored (by default on S3).
This bridge imports these scripts once they have been downloaded from S3 into a local directory, which is then passed as the Directory parameter of this bridge (see the sketch below).
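As a minimal sketch of that download step (assuming the boto3 library is installed and configured with AWS credentials; the bucket name, prefix, and local directory below are hypothetical examples), the scripts referenced by the jobs' Script Location could be copied to a local directory before running the bridge:

# Sketch: copy the AWS Glue ETL scripts from their S3 Script Location
# into the local directory that will be passed to the bridge's Directory parameter.
import os
import boto3

s3 = boto3.client("s3")
bucket = "aws-glue-scripts-123456789012-us-east-1"  # hypothetical bucket name
prefix = "admin/"                                   # hypothetical script prefix
local_dir = "glue_scripts"                          # directory passed to the bridge
os.makedirs(local_dir, exist_ok=True)

# List and download all Python/Scala scripts under the prefix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".py") or key.endswith(".scala"):
            s3.download_file(bucket, key, os.path.join(local_dir, os.path.basename(key)))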

The purpose of this Apache Spark import bridge is to detect and parse all the Spark statements from the Python or Scala scripts
in order to generate the exact scope (data models) of the involved source and target data stores,
as well as the data flow lineage and impact analysis (data integration ETL/ELT model) between them.
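For illustration, a minimal AWS Glue Python script of the kind this bridge parses might look as follows (the database, table, and S3 path names are hypothetical examples); the bridge detects the source read, the column mappings, and the target write in order to derive the data stores and the lineage between them:

# Hypothetical Glue ETL script: read a catalog table, rename/cast columns,
# and write the result to S3. The bridge parses these statements to build
# the source/target data models and the transformation lineage.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Source data store: a Glue Data Catalog table (hypothetical names)
source = glueContext.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")

# Transformation: column-level mappings (expression parsing / lineage)
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "long", "order_id", "long"),
              ("amount", "double", "total_amount", "double")])

# Target data store: Parquet files on S3 (hypothetical path)
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://my-target-bucket/orders_out/"},
    format="parquet")

job.commit()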


Bridge Parameters

Directory
Type: DIRECTORY
Scope: Mandatory
Description: Select a directory containing the textual files with the code to import.

Code Language
Type: ENUMERATED
Values: Python, Scala
Default: Python
Description: Select the language of the scripts.

Directory Filter
Type: STRING
Description: Specify a search filter for the subdirectories. Use regular expressions in Java format if needed (e.g. '.*_script'). Multiple conditions can be defined by using a space as a separator (e.g. 'directory1 directory2'). A condition must be escaped with double quotes if it contains spaces (e.g. "my directory"). Negation can be defined with a preceding dash character (e.g. '-bin').

File Filter
Type: STRING
Description: Specify a search filter for files. Use regular expressions in Java format if needed (e.g. '.*\.py'). Multiple conditions can be defined by using a space as a separator (e.g. 'file1 file2'). A condition must be escaped with double quotes if it contains spaces (e.g. "my file.py"). Negation can be defined with a preceding dash character (e.g. '-\.tar\.gz').

Miscellaneous
Type: STRING
Description: Specify miscellaneous options identified with a -letter and a value.

For example, -e UTF-16

-e: encoding. This value is used to load text from the specified script files. By default, UTF-8 is used. Other possible values include UTF-16, UTF-16BE, and US-ASCII.
-p: parameters. Full path to the YAML file that defines all the entry points for the scripts to parse as well as their input parameters. A new template file is generated automatically if the specified file does not exist. Use double quotes to escape a path that contains spaces.
-pppd: enables the DI/ETL post-processor processing of DI/ETL designs in order to create the design connections and connection data sets.
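For example, assuming the YAML parameters file is stored at a path that contains spaces (the path below is a hypothetical example), the options above can be combined as:

-e UTF-16 -p "C:\Glue Scripts\entry_points.yaml" -pppd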


Bridge Mapping

Mapping information is not available

Last updated on Wed, 11 Dec 2019 18:39:35

Copyright © Meta Integration Technology, Inc. 1997-2019 All Rights Reserved.

Meta Integration® is a registered trademark of Meta Integration Technology, Inc.
All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.