|Tool Name||Amazon Glue ETL (via Apache Spark)|
|Tool Version||Spark 2.x|
|Tool Web Site||https://aws.amazon.com/glue/|
|Supported Methodology||[Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File|
Import tool: Amazon Glue ETL (via Apache Spark) Spark 2.x (https://aws.amazon.com/glue/)
Import interface: [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File from Amazon Web Services (AWS) Glue ETL (via Apache Spark)
Import bridge: 'ApacheSparkImport.AmazonGlueETL' 11.0.0
In the AWS Glue Console, go to the ETL / Jobs area, where you can find all the ETL scripts.
Each of them is marked with a Type (e.g. Spark), an ETL Language (e.g. Python),
and a Script Location showing where they are stored (by default on S3).
This bridge can import these scripts once they have been downloaded from S3 into the local directory passed as a parameter of this bridge.
The purpose of this Apache Spark import bridge is to detect and parse all the Spark statements in the Python or Scala scripts
in order to generate the exact scope (data models) of the involved source and target data stores,
as well as the data flow lineage and impact analysis (data integration ETL/ELT model) between them.
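As a rough illustration of the kind of analysis involved, the following toy sketch scans a PySpark script for read/write statements and pairs them into source-to-target lineage. The script text, paths, and regular expressions are hypothetical; the real bridge performs full Python/Scala parsing and expression analysis, not pattern matching.

```python
import re

# Hypothetical PySpark script of the kind this bridge parses.
SCRIPT = """
df = spark.read.parquet("s3://raw/orders")
clean = df.filter(df.amount > 0)
clean.write.parquet("s3://curated/orders")
"""

# Naive detection of source (read) and target (write) data stores.
sources = re.findall(r'spark\.read\.\w+\("([^"]+)"\)', SCRIPT)
targets = re.findall(r'\.write\.\w+\("([^"]+)"\)', SCRIPT)

# Pair every detected source with every detected target.
lineage = [(s, t) for s in sources for t in targets]
print(lineage)  # [('s3://raw/orders', 's3://curated/orders')]
```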
|Directory||Select a directory with the textual files that contain the code to import||DIRECTORY||Mandatory|
|Code Language||Select the language of the scripts (Python or Scala)||ENUMERATED|
|Directory Filter||Specify a search filter for the subdirectories. Use regular expressions in Java format if needed (e.g. '.*_script'). Multiple conditions can be defined by using a space as a separator (e.g. 'directory1 directory2'). A condition must be escaped with double quotes if it contains any spaces (e.g. "my directory"). Negation can be defined with a preceding dash character (e.g. '-bin').||STRING|
|File Filter||Specify a search filter for files. Use regular expressions in Java format if needed (e.g. '.*\.py'). Multiple conditions can be defined by using a space as a separator (e.g. 'file1 file2'). A condition must be escaped with double quotes if it contains any spaces (e.g. "my file.py"). Negation can be defined with a preceding dash character (e.g. '-\.tar\.gz').||STRING|
|Miscellaneous||Specify miscellaneous options identified with a -letter and value.
For example, -e UTF-16
-e: encoding. This value will be used to load text from the specified script files. By default, UTF-8 will be used. Here are some other possible values: UTF-16, UTF-16BE, US-ASCII.
-p: parameters. Full path to the yaml file that defines all the entry points for the scripts to parse, as well as their input parameters. A new template will be generated automatically if the file does not exist. Use double quotes to escape a path that contains spaces.
-pppd: enables the DI/ETL post-processor processing of DI/ETL designs in order to create the design connections and connection data sets.
-prescript [cmd]: runs a script command before bridge execution. Example: -prescript "script.bat"
The script must be located in the bin directory, and have .bat or .sh extension.
The script path must not include any parent directory symbol (..)
The script should return exit code 0 to indicate success, or another value to indicate failure.
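To make the Directory Filter and File Filter semantics above concrete, the following sketch mimics how space-separated conditions, double-quoted conditions, and dash-prefixed negations could be evaluated. It uses Python's re module as a stand-in for Java-format regular expressions; the function name and exact precedence of exclusions over inclusions are illustrative assumptions, not the bridge's documented behavior.

```python
import re
import shlex

def matches(name, filter_expr):
    """Return True if 'name' passes the filter expression.

    Conditions are space-separated regexes; double quotes group a
    condition containing spaces; a leading dash negates a condition.
    """
    include, exclude = [], []
    for cond in shlex.split(filter_expr):
        if cond.startswith("-"):
            exclude.append(cond[1:])
        else:
            include.append(cond)
    # Exclusions are assumed to take precedence over inclusions.
    if any(re.fullmatch(p, name) for p in exclude):
        return False
    return not include or any(re.fullmatch(p, name) for p in include)

print(matches("etl_script", ".*_script"))        # True
print(matches("bin", "directory1 -bin"))         # False
print(matches("my directory", '"my directory"')) # True
```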
Mapping information is not available