What it does

Key features

Zero-code-change Spark lineage capture via Spark listener plugin
Column-level lineage: tracks how each output column is derived from specific source columns
Supports Spark on Databricks, EMR, HDInsight, and standalone clusters
REST API server for lineage storage with web UI for graph exploration
Historical lineage retention across APAC Spark job runs
Integration with external APAC data catalogs via REST API

When to reach for it

Best for

APAC data engineering teams with Spark-heavy pipelines who need column-level lineage capture without modifying application code, particularly on Databricks or EMR.

Don't get burned

Limitations to know

! Spark-only — does not cover Airflow, dbt, or non-Spark APAC pipeline tools
! Smaller community than the OpenLineage ecosystem; fewer APAC-focused integration resources
! Column-level lineage from complex Spark plans can be incomplete for some UDFs

Context

About Spline

Spline (Spark Lineage) is an open-source data lineage solution from ABSA Group (Absa Bank) designed specifically for Apache Spark pipelines. APAC data engineering teams register the Spline Spark Agent as a query execution listener through Spark configuration, with no application code changes required. Spline then automatically intercepts each job's Spark execution plan to extract column-level lineage: which source columns contributed to which output columns, and through which transformations.
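As a hedged illustration of the "no code changes" point, the Python sketch below assembles a `spark-submit` invocation that attaches the agent purely through `--packages` and `--conf` flags. The listener class and the `spark.sql.queryExecutionListeners`, `spark.spline.lineageDispatcher` and `spark.spline.producer.url` keys follow the Spline Spark Agent's codeless-initialization instructions as commonly documented; verify them against the agent version you deploy. The helper `spline_submit_command`, the producer URL, the package coordinate and the job script name are hypothetical placeholders.

```python
from __future__ import annotations

import shlex

# Listener class used for Spline's codeless initialization (per the agent README).
SPLINE_LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def spline_submit_command(
    job_script: str,
    producer_url: str,
    agent_package: str,
    dispatcher: str = "http",
) -> list[str]:
    """Hypothetical helper: build spark-submit argv that attaches the Spline agent
    to an unmodified Spark job purely through --packages and --conf flags."""
    conf = {
        # Register the agent as a query execution listener; the job code is untouched.
        "spark.sql.queryExecutionListeners": SPLINE_LISTENER,
        # "http" sends lineage to the Spline REST server; "console" just prints it.
        "spark.spline.lineageDispatcher": dispatcher,
        # Endpoint of the Spline server's producer API (used by the http dispatcher).
        "spark.spline.producer.url": producer_url,
    }
    cmd = ["spark-submit", "--packages", agent_package]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(job_script)
    return cmd


if __name__ == "__main__":
    cmd = spline_submit_command(
        job_script="payments_job.py",  # hypothetical, unmodified PySpark job
        producer_url="http://spline-server:8080/producer",  # hypothetical host
        # Example coordinate only; pick the bundle matching your Spark/Scala versions.
        agent_package="za.co.absa.spline.agent.spark:spark-3.3-spline-agent-bundle_2.12:<VERSION>",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing `dispatcher="console"` is a convenient first check that the listener fires before a Spline server is running, since lineage is then printed by the driver instead of being posted over HTTP.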

Spline's column-level granularity distinguishes it from job-level lineage tools: rather than only knowing that a Spark job reads from 'raw_payments' and writes to 'stg_payments', Spline knows that 'stg_payments.apac_payment_amount_usd' was derived from 'raw_payments.amount_cents' via a division operation. This column-level visibility is particularly valuable for APAC data teams performing impact analysis: if the 'amount_cents' column changes data type, Spline can show which downstream columns written by Spark jobs depend on it, and therefore which tables and reports built on them need review.
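To make the impact-analysis idea concrete, the sketch below models column-level lineage as plain (source column, derived column, operation) edges and walks them breadth-first to find every column downstream of a changed one. The edge list and the `downstream_columns` helper are hypothetical and do not reflect Spline's storage schema or API; the first edge mirrors the 'amount_cents' example, while the 'mart_revenue' and 'payment_id' columns are invented for illustration.

```python
from collections import deque

# Hypothetical, simplified column-level lineage: (source column, derived column, operation).
# This is NOT Spline's data model; it only mirrors the kind of facts Spline extracts.
LINEAGE_EDGES = [
    ("raw_payments.amount_cents", "stg_payments.apac_payment_amount_usd", "divide"),
    ("stg_payments.apac_payment_amount_usd", "mart_revenue.daily_revenue_usd", "sum"),
    ("raw_payments.payment_id", "stg_payments.payment_id", "copy"),
]


def downstream_columns(source, edges):
    """Return every column derived, directly or transitively, from `source`."""
    # Build an adjacency map from each column to the columns derived from it.
    children = {}
    for src, dst, _op in edges:
        children.setdefault(src, []).append(dst)

    # Breadth-first walk; `seen` prevents revisiting columns reachable by several paths.
    seen = set()
    queue = deque([source])
    while queue:
        col = queue.popleft()
        for nxt in children.get(col, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    # Which columns need review if raw_payments.amount_cents changes type?
    print(sorted(downstream_columns("raw_payments.amount_cents", LINEAGE_EDGES)))
    # → ['mart_revenue.daily_revenue_usd', 'stg_payments.apac_payment_amount_usd']
```

The traversal is transitive on purpose: a type change in 'amount_cents' affects not only the staging column computed from it but also any aggregate built on that staging column, which job-level lineage alone would not pinpoint.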

Spline provides a REST server that receives lineage from the agent and persists it (in ArangoDB), plus a web-based UI for interactive lineage graph exploration. For APAC teams whose data infrastructure is primarily Spark-based (Databricks, EMR, on-premise Spark), Spline offers purpose-built lineage capture that works directly from Spark's query plans. For APAC teams with multi-framework pipelines (Airflow + Spark + dbt), OpenLineage with a Marquez backend provides broader coverage across those tools.
