Bridge Specifications
Vendor | Apache |
Tool Name | Spark (with Python or Scala) |
Tool Version | 2.x |
Tool Web Site | http://spark.apache.org/ |
Supported Methodology | [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File |
BRIDGE INFORMATION
Import tool: Apache Spark (with Python or Scala) 2.x (http://spark.apache.org/)
Import interface: [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Spark with Python or Scala File from Apache Spark (with Python or Scala)
Import bridge: 'ApacheSpark' 10.1.0
BRIDGE DOCUMENTATION
The purpose of this Apache Spark import bridge is to detect and parse all the Spark statements in the Python or Scala scripts
in order to generate the exact scope (data models) of the involved source and target data stores,
as well as the data flow lineage and impact analysis (data integration ETL/ELT model) between them.
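As an illustration of the kind of script described above, here is a minimal PySpark sketch with one source data store, one transformation, and one target data store. The paths, column names, and function name are hypothetical, and nothing here is taken from the bridge itself; which statements the bridge actually recognizes is not specified in this document.

```python
# Hypothetical locations of the source and target data stores.
SOURCE_PATH = "/data/raw/orders"
TARGET_PATH = "/data/curated/valid_orders"


def copy_valid_orders(spark, source_path=SOURCE_PATH, target_path=TARGET_PATH):
    """Read orders, keep valid rows, derive a column, and write the result.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    # Source data store: a Parquet dataset read through the session.
    orders = spark.read.parquet(source_path)

    # Transformation: a row filter plus a derived expression.
    # The SQL expression strings are what expression parsing would have to read.
    valid = orders.where("amount > 0").selectExpr(
        "order_id",
        "customer_id",
        "amount * (1 + tax_rate) AS gross_amount",
    )

    # Target data store: a Parquet dataset written by the job.
    valid.write.mode("overwrite").parquet(target_path)
    return valid
```

With PySpark installed, the function would be called with a real session, for example `copy_valid_orders(SparkSession.builder.getOrCreate())`. In lineage terms, `gross_amount` in the target depends on `amount` and `tax_rate` in the source.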
Bridge Parameters
Parameter Name | Description | Type | Values | Default | Scope |
Directory | Select a directory with the text files that contain the code to import. | DIRECTORY | | | Mandatory |
Code Language | Select the language. | ENUMERATED | | Python | |
Directory Filter | Specify a search filter for the subdirectories. Use regular expressions in Java format if needed (e.g. '.*_script'). Multiple conditions can be defined by using a space as a separator (e.g. 'directory1 directory2'). A condition must be enclosed in double quotes if it contains any spaces (e.g. "my directory"). Negation can be defined with a preceding dash character (e.g. '-bin'). | STRING | | | |
File Filter | Specify a search filter for files. Use regular expressions in Java format if needed (e.g. '.*\.py'). Multiple conditions can be defined by using a space as a separator (e.g. 'file1 file2'). A condition must be enclosed in double quotes if it contains any spaces (e.g. "my file.py"). Negation can be defined with a preceding dash character (e.g. '-\.tar\.gz'). | STRING | | | |
Miscellaneous | Specify miscellaneous options, each identified by a dash-prefixed letter and, where applicable, a value (for example, -e UTF-16). -e: encoding. This value is used to load text from the specified script files. The default is UTF-8. Other possible values include UTF-16, UTF-16BE, and US-ASCII. -p: parameters. Full path to the YAML file that defines all the entry points for the scripts to parse as well as their input parameters. A new template is generated automatically if the file does not exist. Use double quotes to escape a path that contains spaces. -pppd: enables the DI/ETL post-processor processing of DI/ETL designs in order to create the design connections and connection data sets. | STRING | | | |
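The Directory Filter and File Filter syntax (space-separated conditions, double quotes around conditions containing spaces, a leading dash for negation) can be sketched in Python as follows. This is only an approximation under stated assumptions: the function names are hypothetical, Python's `re` module stands in for Java regular expressions, matching is unanchored, and a name is selected when it matches no negated condition and at least one positive condition (or when there are no positive conditions). The bridge's actual matching rules may differ.

```python
import re
from typing import List, NamedTuple


class Condition(NamedTuple):
    regex: re.Pattern
    negated: bool


# A token is either an optionally dash-prefixed quoted string or a run of non-spaces.
_TOKEN = re.compile(r'-?"[^"]*"|\S+')


def parse_filter(text: str) -> List[Condition]:
    """Split a filter string into conditions (hypothetical helper)."""
    conditions = []
    for token in _TOKEN.findall(text):
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        # Drop the double quotes that protect embedded spaces.
        if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
            token = token[1:-1]
        conditions.append(Condition(re.compile(token), negated))
    return conditions


def is_selected(name: str, conditions: List[Condition]) -> bool:
    """Assumed semantics: no negated match, and any positive match if one exists."""
    includes = [c for c in conditions if not c.negated]
    excludes = [c for c in conditions if c.negated]
    # Unanchored search is an assumption; '-bin' would also exclude 'cabinet' here.
    if any(c.regex.search(name) for c in excludes):
        return False
    return not includes or any(c.regex.search(name) for c in includes)


file_filter = parse_filter(r'.*\.py "my file.py" -\.tar\.gz')
print(is_selected("job.py", file_filter))
print(is_selected("scripts.py.tar.gz", file_filter))
```

Under these assumptions the first call selects `job.py`, while the second rejects `scripts.py.tar.gz` because the negated condition takes precedence.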
Bridge Mapping
Mapping information is not available