Key features
- Zero-code-change Spark lineage capture via Spark listener plugin
- Column-level lineage: tracks how specific output columns are derived from source columns
- Supports Spark on Databricks, EMR, HDInsight, and standalone clusters
- REST API server for lineage storage with web UI for graph exploration
- Historical lineage retention across Spark job runs
- Integration with external data catalogs via REST API
Best for
- APAC data engineering teams with Spark-heavy pipelines who need column-level lineage capture without modifying application code, particularly on Databricks or EMR.
Limitations to know
- ! Spark-only — does not cover Airflow, dbt, or non-Spark APAC pipeline tools
- ! Smaller community than the OpenLineage ecosystem; fewer APAC integration resources
- ! Column-level lineage from complex Spark plans can be incomplete for some UDFs
About Spline
Spline (Spark Lineage) is an open-source data lineage solution from ABSA Group (Absa Bank) specifically designed for Apache Spark pipelines. APAC data engineering teams add the Spline Spark Agent as a listener to their Spark context — no application code changes required — and Spline automatically intercepts the Spark execution plan to extract column-level lineage: which source columns contributed to which output columns through which transformations.
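Because the agent is attached through configuration rather than code, the setup can be sketched as a set of spark-submit arguments. This is a minimal sketch, not an official recipe: the listener class `za.co.absa.spline.harvester.listener.SplineQueryExecutionListener` and the `spark.spline.producer.url` key follow the Spline agent documentation, while the helper name `spline_submit_args`, the jar file name, the producer host and port, and the job script name are hypothetical placeholders. Check the agent README for the bundle artifact that matches your Spark and Scala versions.

```python
import shlex
from typing import List

# Spark's standard configuration key for registering query execution listeners.
LISTENER_CONF = "spark.sql.queryExecutionListeners"
# Listener class shipped in the Spline agent bundle (per the Spline agent docs).
SPLINE_LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def spline_submit_args(producer_url: str, agent_jar: str, app: str) -> List[str]:
    """Build spark-submit arguments that attach the Spline agent to an unchanged job.

    Hypothetical helper: the application in `app` is not modified; lineage
    capture is enabled purely through submit-time configuration.
    """
    return [
        "spark-submit",
        # Put the (separately downloaded) agent bundle on the driver classpath.
        "--jars", agent_jar,
        # Register the Spline listener so it sees each executed query plan.
        "--conf", f"{LISTENER_CONF}={SPLINE_LISTENER}",
        # Endpoint where the agent sends captured lineage (assumed local server).
        "--conf", f"spark.spline.producer.url={producer_url}",
        app,
    ]


if __name__ == "__main__":
    args = spline_submit_args(
        "http://localhost:8080/producer",  # assumed producer endpoint
        "spline-agent-bundle.jar",         # hypothetical jar file name
        "payments_job.py",                 # hypothetical, unmodified job script
    )
    print(shlex.join(args))
```

The same two `--conf` settings can also be placed in `spark-defaults.conf` or a cluster-level Spark config, which is how teams typically enable capture for every job without touching individual submit commands.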
Spline's column-level lineage granularity distinguishes it from job-level lineage tools: rather than knowing that a Spark job reads from 'raw_payments' and writes to 'stg_payments', Spline knows that 'stg_payments.apac_payment_amount_usd' was derived from 'raw_payments.amount_cents' via a division operation. This column-level visibility is particularly valuable for APAC data teams performing impact analysis: if the 'amount_cents' column changes data type, Spline can show exactly which downstream APAC columns and reports are affected.
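The impact-analysis idea can be illustrated with a simplified column-level lineage graph. This is a hypothetical model for explanation only, not Spline's internal data format or API: each edge links a source column to a column derived from it, and a breadth-first traversal collects every column downstream of a changed one. The `rpt_daily_revenue.total_usd` report column and the `currency` columns are invented for the example.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical lineage edges: (source column, derived column, transformation).
EDGES: List[Tuple[str, str, str]] = [
    ("raw_payments.amount_cents", "stg_payments.apac_payment_amount_usd", "division"),
    ("stg_payments.apac_payment_amount_usd", "rpt_daily_revenue.total_usd", "sum"),
    ("raw_payments.currency", "stg_payments.currency", "upper"),
]


def build_downstream(edges: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Index edges as source column -> list of directly derived columns."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst, _op in edges:
        graph[src].append(dst)
    return graph


def downstream_impact(edges: Iterable[Tuple[str, str, str]], changed: str) -> Set[str]:
    """Return every column that depends, directly or transitively, on `changed`."""
    graph = build_downstream(edges)
    impacted: Set[str] = set()
    queue = deque([changed])
    while queue:
        column = queue.popleft()
        for child in graph.get(column, []):
            if child not in impacted:  # visited check also guards against cycles
                impacted.add(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    print(sorted(downstream_impact(EDGES, "raw_payments.amount_cents")))
    # → ['rpt_daily_revenue.total_usd', 'stg_payments.apac_payment_amount_usd']
```

A job-level tool would only record the table-to-table edge `raw_payments -> stg_payments`, so the same query could not tell the `amount_cents` change apart from an unrelated change to `currency`.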
Spline provides a REST server for lineage storage and a web-based UI for interactive lineage graph exploration. For APAC teams whose data infrastructure is primarily Spark-based (Databricks, EMR, on-premises Spark), Spline offers purpose-built Spark lineage capture that integrates deeply with Spark's query plan mechanism. For APAC teams with multi-framework pipelines (Airflow + Spark + dbt), OpenLineage with a Marquez backend provides broader coverage across all tools.