Go back to the main page

SAP CDS View Extraction Pipeline

8-Phase Metadata & SQL Translation Pipeline
SAP S/4HANA 2023

Overview

SAP CDS View Extraction Pipeline is an 8-phase Java pipeline that runs on sapidess4 via JCo RFC. It extracts CDS view dependencies, metadata, and SQL definitions from SAP S/4HANA, producing CSV files that flow through a Fivetran SDK connector into GCS Parquet and ultimately Google Cloud Storage Buckets.

End-to-end flow: Java phases produce CSV files → connector.py reads them → Fivetran SDK pushes to GCS Parquet files in Google Cloud Storage Buckets.

SAP System

System
S/4HANA 2023 IDES (SID: S4H)
Client
100
Host
sapidess4
Language
EN

Pipeline Infrastructure

Production Dir
/usr/sap/cds_sql_only/
GCS Bucket
gs://sap_cds_dbt/sap_cds_views/
GCS Bucket
gs://sap_cds_dbt/sap_cds_views/
GitHub
fivetran/CDS-metadata-retrieval-fr-custom-SDK

Technology Stack

Runtime
Java (JCo RFC) on sapidess4
Connector
Python connector.py (Fivetran SDK)
Storage
GCS Parquet
Target
Google Cloud Storage Buckets (Parquet)

Custom ABAP RFCs

Z_CDS_SQL_VIEWS
BFS chain resolution engine
Z_VIEW_DDL
HANA CREATE VIEW SQL
Z_CDS_DEPENDENCIES
Dependency tree walker
Z_VIEW_GET_META
DD02L/DD02T/DD26S/DD03L metadata

Pipeline Execution Flow

Important: Phases do NOT run in numeric order. The actual execution sequence is shown below.
Input configuration.json (root views)
Phase 1 Dependency Discovery
BFS over DDLDEPENDENCY • 10 min
Phase 2 Recursive Resolution
Z_CDS_DEPENDENCIES + RFC_READ_TABLE • 10 min
Phase 3 (first run) Descriptions
DD02T descriptions • 10 min
Phase 8 BFS Chain Resolution
Z_CDS_SQL_VIEWS • BASEINFO parsing • 60 min
Phase 3 (re-run) Descriptions (post-Phase 8)
Picks up new objects discovered by Phase 8
Phase 4 Field Metadata
DD03L/DD03VT via RFC_READ_TABLE • 120 min
Phase 5 HANA SQL Definitions
Z_VIEW_DDL • CREATE VIEW SQL • 15 min
Phase 6 SQL-Only Dual Definitions
Z_VIEW_DDL + Z_CDS_DEPENDENCIES • 15 min
Phase 7 Comprehensive Metadata
Z_VIEW_GET_META • DD02L/DD02T/DD26S/DD03L • 30 min
Output connector.py → Fivetran SDK → GCS Buckets (Parquet)

Directory Structure

Production directory on sapidess4:

/usr/sap/cds_sql_only/
  connector.py                          # Fivetran SDK connector (reads CSV, pushes to GCS)
  configuration.json                    # Root CDS views to extract
  drivers/
    installation.sh                     # Setup script
    configuration.json                  # JCo connection parameters
    sapjco3.jar                         # SAP JCo library
    libsapjco3.so                       # Native JCo library (Linux)
    gson.jar                            # JSON parsing
    SimpleDependencyTable.java/.class   # Phase 1
    Phase2Recursive.java/.class         # Phase 2
    Phase3Descriptions.java/.class      # Phase 3
    Phase4FieldMetadataFixed.java/.class# Phase 4
    Phase5SqlDefinitions.java/.class    # Phase 5
    Phase6SqlOnlyViews.java/.class      # Phase 6
    Phase7ViewMetadata.java/.class      # Phase 7
    Phase8ChainResolution.java/.class   # Phase 8
    DependencyResolver.java/.class      # Shared: recursive dependency resolver
    DependencyNode.java/.class          # Shared: tree node model
    DependencyGraph.java/.class         # Shared: graph data structure
    TestZCdsSqlViews.java/.class        # RFC test harness
    SQL_NAME_MAPPING.csv                # CDS entity → HANA SQL name mapping

Phase 1: Dependency Discovery

SimpleDependencyTable

Performs a BFS (Breadth-First Search) over the SAP DDLDEPENDENCY table starting from root CDS views specified in configuration.json. Discovers the full dependency tree of all objects referenced by the root views.

Details

Class
SimpleDependencyTable
RFC Used
RFC_READ_TABLE on DDLDEPENDENCY
Timeout
10 minutes

Output

CSV File
SIMPLE_DEPENDENCY_TABLE.csv
Columns
DDLNAME, OBJECTNAME, OBJECTTYPE, STATE
Key logic: Starting from each root view, recursively queries DDLDEPENDENCY to find all objects (views, tables, data elements) that the root depends on, building a flat dependency list.

Phase 2: Recursive Resolution

Phase2Recursive

Uses the custom Z_CDS_DEPENDENCIES RFC plus RFC_READ_TABLE to recursively resolve all dependencies and classify each object as a CDS view, database table, table function, or other type.

Details

Class
Phase2Recursive
RFCs Used
Z_CDS_DEPENDENCIES, RFC_READ_TABLE
Timeout
10 minutes

Output

CSV File
PHASE2_DEPENDENCIES.csv
Columns
OBJECT_NAME, OBJECT_TYPE, PARENT_VIEW, DEPTH, CLASSIFICATION
Key logic: Classifies each discovered object into categories (CDS View, DB Table, Table Function, etc.) and records the parent-child relationship with depth level for the full dependency tree.

Phase 3: Descriptions

Phase3Descriptions

Extracts human-readable descriptions for all discovered objects from the SAP DD02T (Data Dictionary text table). Runs twice in the pipeline: once after Phase 2, and again after Phase 8 to pick up newly discovered objects.

Details

Class
Phase3Descriptions
RFC Used
RFC_READ_TABLE on DD02T
Timeout
10 minutes
Runs
Twice (before and after Phase 8)

Output

CSV File
OBJECT_DESCRIPTIONS.csv
Columns
TABNAME, DDLANGUAGE, DDTEXT
GCS Parquet File
all_descriptions

Phase 8: BFS Chain Resolution

Phase8ChainResolution

The core engine of the pipeline. Uses the custom Z_CDS_SQL_VIEWS RFC to perform BFS chain resolution, parsing BASEINFO structures to determine which dependencies are real data sources (FROM clause) vs. association pointers. Implements the ignore_association logic that can cut the dependency tree in half.

Details

Class
Phase8ChainResolution
RFC Used
Z_CDS_SQL_VIEWS
Timeout
60 minutes

Output

GCS Parquet Files
sql_only_resolution, sql_only_objects
Key Feature
BASEINFO parsing, ignore_association
Longest-running phase (60 min). Processes every object through Z_CDS_SQL_VIEWS, parsing the BASEINFO FROM and ASSOCIATED arrays to classify each dependency.
BASEINFO parsing: Each CDS view returns a BASEINFO structure containing a FROM array (real data dependencies) and an ASSOCIATED array (association pointers). Objects in the ASSOCIATED array get ignore_association = YES and can be safely excluded from SQL translation.

Phase 4: Field Metadata

Phase4FieldMetadataFixed

Extracts detailed field-level metadata for every object using RFC_READ_TABLE on the SAP Data Dictionary tables DD03L (field definitions) and DD03VT (field descriptions).

Details

Class
Phase4FieldMetadataFixed
RFC Used
RFC_READ_TABLE on DD03L, DD03VT
Timeout
120 minutes (longest phase)

Output

CSV Files
{OBJECT_NAME}_METADATA.csv
Columns
TABNAME, FIELDNAME, POSITION, DATATYPE, LENG, DECIMALS, DDTEXT
GCS Parquet File
metadata_all
Longest overall phase (120 min). Must query DD03L individually for each object because the table is too large for bulk reads.

Phase 5: HANA SQL Definitions

Phase5SqlDefinitions

Retrieves the actual HANA CREATE VIEW SQL statements for each CDS view using the custom Z_VIEW_DDL RFC. These are the native HANA SQL definitions that can be translated to ANSI SQL for various target platforms (Snowflake, Databricks, etc.).

Details

Class
Phase5SqlDefinitions
RFC Used
Z_VIEW_DDL
Timeout
15 minutes

Output

GCS Parquet Files
{view}_sql_definition
Content
Full HANA CREATE VIEW SQL text

Phase 6: SQL-Only Dual Definitions

Phase6SqlOnlyViews

Extends Phase 5 by retrieving both the HANA SQL and the ABAP SQL definition for "SQL-only" views — views identified by Phase 8 as needing only real data dependencies (no associations). Uses both Z_VIEW_DDL and Z_CDS_DEPENDENCIES.

Details

Class
Phase6SqlOnlyViews
RFCs Used
Z_VIEW_DDL, Z_CDS_DEPENDENCIES
Timeout
15 minutes

Output

GCS Parquet Files
{view}_abap_sql_definition, {view}_hana_sql_definition
Content
Dual SQL definitions (ABAP CDS + HANA native)

Phase 7: Comprehensive Metadata

Phase7ViewMetadata

Retrieves comprehensive Data Dictionary metadata using the custom Z_VIEW_GET_META RFC, which returns data from four SAP tables in a single call: DD02L (table header), DD02T (table texts), DD26S (view structure/joins), and DD03L (field definitions).

Details

Class
Phase7ViewMetadata
RFC Used
Z_VIEW_GET_META
Timeout
30 minutes

Output

GCS Parquet Files
dd02l_all, dd02t_all, dd26s_all, dd03l_all
Content
Full DD metadata for all views

Custom ABAP Function Modules

Four custom Z_* function modules were developed in ABAP and deployed to the SAP system to support the pipeline.

Z_CDS_SQL_VIEWS

BFS chain resolution engine. The most complex RFC — walks the CDS dependency tree and returns BASEINFO structures with FROM/ASSOCIATED arrays.

DirectionParameterTypeDescription
IMPORTINGIV_VIEWNAMECHAR(30)CDS view name to resolve
IMPORTINGIV_MAX_DEPTHINT4Maximum recursion depth
EXPORTINGET_RESOLUTIONTABLEResolved dependency chain with BASEINFO
EXPORTINGET_OBJECTSTABLEAll discovered objects with ignore_association flag

Z_VIEW_DDL

Returns the HANA native CREATE VIEW SQL definition for a given CDS view.

DirectionParameterTypeDescription
IMPORTINGIV_VIEWNAMECHAR(30)View name (ABAP Dictionary name)
EXPORTINGEV_SQLSTRINGFull HANA CREATE VIEW SQL statement
EXPORTINGEV_ABAP_SQLSTRINGABAP CDS SQL definition (if available)

Z_CDS_DEPENDENCIES

Walks the CDS dependency tree and returns parent-child relationships with object classification.

DirectionParameterTypeDescription
IMPORTINGIV_VIEWNAMECHAR(30)Root view name
IMPORTINGIV_DEPTHINT4Max traversal depth
EXPORTINGET_DEPENDENCIESTABLEDependency tree with classification

Z_VIEW_GET_META

Returns comprehensive Data Dictionary metadata (DD02L, DD02T, DD26S, DD03L) for a view in a single call.

DirectionParameterTypeDescription
IMPORTINGIV_VIEWNAMECHAR(30)View name
EXPORTINGET_DD02LTABLETable/view header records
EXPORTINGET_DD02TTABLETable/view description texts
EXPORTINGET_DD26STABLEView structure (join conditions)
EXPORTINGET_DD03LTABLEField definitions

HANA SQL → ANSI SQL Translation Rules

SAP HANA SQL extracted from CDS views requires mechanical translation before it can run on target platforms like Snowflake or Databricks. The translation falls into three categories:

~75% Mechanical: Schema removal, quoting, alias cleanup — fully automatable.
~15% Function Rules: SAP-specific functions with known target equivalents.
~10% Manual: Complex expressions, SESSION_CONTEXT, custom logic requiring human review.

Mechanical Translation Rules

HANA SQL PatternANSI / Target SQLNotes
"SAPHANADB"."TABLENAME"TABLENAMERemove schema prefix entirely
N'text''text'Remove N prefix (national character literal)
/*..CARDINALITY..*/(remove)HANA cardinality hints — strip completely
SESSION_CONTEXT('CDS_CLIENT')'100'Replace with hardcoded client value
"alias"."field"alias.fieldAdjust quoting per target dialect
IFNULL(x, y)COALESCE(x, y)Standard ANSI equivalent
TO_NVARCHAR(x)CAST(x AS STRING)Standard string cast

SAP Function Translation Table

SAP HANA FunctionGoogle Standard SQLSnowflakeDatabricks
DATS_TIMS_TO_TSTMPPARSE_TIMESTAMPTO_TIMESTAMP_NTZTO_TIMESTAMP
TSTMP_TO_DATSFORMAT_TIMESTAMPTO_VARCHARDATE_FORMAT
FLTP_TO_DECCAST(x AS NUMERIC)TO_DECIMALCAST(x AS DECIMAL)
TSTMP_CURRENT_UTCTIMESTAMPCURRENT_TIMESTAMP()CURRENT_TIMESTAMP()CURRENT_TIMESTAMP()
ABAP_SYSTEM_TIMEZONE'UTC''UTC''UTC'
TSTMP_TO_DSTFORMAT_TIMESTAMPTO_VARCHARDATE_FORMAT

Target Dialect Differences

FeatureGoogle Standard SQLSnowflakeDatabricks
Identifier QuotingBackticks `name`Double quotes "name"Backticks `name`
Schema Prefixdataset.tableschema.tablecatalog.schema.table
String TypeSTRINGVARCHARSTRING
Timestamp TypeTIMESTAMPTIMESTAMP_NTZTIMESTAMP
Integer DivisionTruncatesTruncatesReturns DOUBLE

CDS View Name Layers

Every CDS view has three distinct names. Understanding the mapping is critical for SQL translation.

LayerExampleWhere Used
CDS Entity NameI_ProductABAP CDS source code, annotations
ABAP Dictionary NameIPRODUCTSAP Data Dictionary (SE11), DD03L, RFC_READ_TABLE
HANA SQL Name"IPRODUCT"HANA CREATE VIEW statements, HANA catalog
SQL_NAME_MAPPING.csv provides the translation between all three name layers. This file is generated during the pipeline and is essential for correctly replacing view references during SQL translation.

ignore_association Logic

The ignore_association flag is the key optimization in Phase 8. It classifies each dependency as either a real data source or an association pointer.

ignore_association = NO

Object appears in the BASEINFO FROM array. This is a real data dependency — the view's SQL SELECT actually reads data from this object. Must be included in any SQL translation.

ignore_association = YES

Object appears in the BASEINFO ASSOCIATED array. This is an association pointer — it defines a navigation path but the view does not read data from it directly. Can be safely excluded.

Impact: Filtering out ignore_association = YES objects can cut the dependency tree roughly in half, dramatically reducing the number of views that need SQL translation.
-- Example: BASEINFO structure returned by Z_CDS_SQL_VIEWS
BASEINFO: {
  "FROM": [
    "MARA",           -- real table dependency (ignore_association = NO)
    "IPRODUCT"        -- real view dependency (ignore_association = NO)
  ],
  "ASSOCIATED": [
    "I_PRODUCTTEXT",  -- association pointer (ignore_association = YES)
    "I_PLANT"         -- association pointer (ignore_association = YES)
  ]
}

GCS Parquet Files Reference

All Parquet files are stored in the gs://sap_cds_dbt/sap_cds_views/ GCS Bucket, pushed via the Fivetran SDK connector.

Parquet FileSource PhaseKey ColumnsDescription
all_descriptionsPhase 3TABNAME, DDLANGUAGEDD02T descriptions for all objects
metadata_allPhase 4TABNAME, FIELDNAMEField-level metadata (DD03L + DD03VT)
{view}_sql_definitionPhase 5VIEW_NAMEHANA CREATE VIEW SQL per view
{view}_abap_sql_definitionPhase 6VIEW_NAMEABAP CDS SQL definition per view
{view}_hana_sql_definitionPhase 6VIEW_NAMEHANA native SQL definition per view
sql_only_resolutionPhase 8VIEW_NAME, DEPTHFull BFS chain resolution results
sql_only_objectsPhase 8OBJECT_NAMEAll objects with ignore_association flag
dd02l_allPhase 7TABNAMETable/view header (DD02L)
dd02t_allPhase 7TABNAME, DDLANGUAGETable/view texts (DD02T)
dd26s_allPhase 7VIEWNAME, TABNAMEView structure/joins (DD26S)
dd03l_allPhase 7TABNAME, FIELDNAMEField definitions (DD03L)

Phase Timeouts

Each phase has a configured timeout. Exceeding the timeout usually indicates an RFC connectivity issue or an unexpectedly large dependency tree.

PhaseClassTimeoutTypical Duration
Phase 1SimpleDependencyTable10 min2-5 min
Phase 2Phase2Recursive10 min3-7 min
Phase 3Phase3Descriptions10 min1-3 min
Phase 4Phase4FieldMetadataFixed120 min60-90 min
Phase 5Phase5SqlDefinitions15 min5-10 min
Phase 6Phase6SqlOnlyViews15 min5-10 min
Phase 7Phase7ViewMetadata30 min15-20 min
Phase 8Phase8ChainResolution60 min30-45 min
Total pipeline runtime: approximately 3-4 hours end-to-end for a typical set of root CDS views.

Troubleshooting

ProblemCauseFix
Java not found on sapidess4 PATH not set for Java runtime Run export PATH=$PATH:/usr/lib/jvm/java-17-openjdk/bin or check installation.sh
Empty GCS Bucket connector.py did not find CSV files Check CSV output in /usr/sap/cds_sql_only/drivers/; verify phases completed successfully
Duplicate rows in GCS Parquet Pipeline ran twice without clearing previous output Delete existing CSV files before re-running; use WRITE_DISPOSITION = WRITE_TRUNCATE
Compilation error on .java file Missing classpath entries (sapjco3.jar, gson.jar) Compile with: javac -cp .:sapjco3.jar:gson.jar ClassName.java
RFC_SYSTEM_FAILURE SAP work process crash or function module error Check SAP SM21 system log; restart the pipeline phase
RFC_COMMUNICATION_FAILURE Network timeout or SAP gateway down Verify SAP is running (sapcontrol -nr 03 -function GetProcessList); check JCo config
Phase timeout exceeded Too many objects or slow RFC responses Reduce root views in configuration.json; increase timeout in Java code; check SAP work process availability
Z_CDS_SQL_VIEWS returns empty View name mismatch (CDS entity vs ABAP Dictionary name) Use the ABAP Dictionary name (uppercase, no underscores); check SQL_NAME_MAPPING.csv
connector.py Fivetran SDK error SDK version mismatch or missing Python dependencies Check pip list | grep fivetran; ensure SDK is installed and connector.py uses correct API version
GCS upload fails Service account permissions or bucket path wrong Verify gs://sap_cds_dbt/sap_cds_views/ exists; check service account has Storage Object Creator role