AIMenta
Spline

by ABSA Group

Open-source Spark lineage tracking that automatically captures column-level data lineage from Apache Spark jobs without code changes, with a graph visualization UI.

AIMenta verdict
Niche use
3/5

"Apache Spark data lineage tracking — APAC data engineering teams use Spline to automatically capture column-level lineage from APAC Spark jobs without code changes, visualizing APAC data flow from raw sources through transformations to APAC downstream analytics tables."

Features: 6
Use cases: 1
Watch outs: 3
What it does

Key features

  • Zero-code-change Spark lineage capture via a Spark listener plugin
  • Column-level lineage: tracks how each output column is derived from its source columns
  • Supports Spark on Databricks, EMR, HDInsight, and standalone clusters
  • REST API server for lineage storage, with a web UI for graph exploration
  • Historical lineage retention across Spark job runs
  • Integration with external data catalogs via the REST API
When to reach for it

Best for

  • APAC data engineering teams with Spark-heavy pipelines who need column-level lineage capture without modifying application code, particularly on Databricks or EMR.
Don't get burned

Limitations to know

  • Spark-only: does not cover Airflow, dbt, or other non-Spark pipeline tools
  • Smaller community than the OpenLineage ecosystem, with fewer integration resources
  • Column-level lineage extracted from complex Spark plans can be incomplete, for example around some UDFs
Context

About Spline

Spline (Spark Lineage) is an open-source data lineage solution from ABSA Group (Absa Bank), designed specifically for Apache Spark pipelines. Data engineering teams register the Spline Spark Agent as a listener in their Spark configuration, with no application code changes required. The agent then intercepts the Spark execution plan and extracts column-level lineage: which source columns contributed to which output columns, and through which transformations.
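As a rough sketch of what registering the agent can look like, the Python snippet below assembles the Spark configuration and a spark-submit command line. The listener class name and the `spark.spline.producer.url` key follow the Spline agent README as recalled here and may differ between agent versions, so check them against the release you deploy. The producer URL, the package coordinates placeholder, the job file name, and the helper names `spline_spark_conf` and `spark_submit_args` are hypothetical.

```python
from typing import Dict, List

# Assumption: listener class as named in the Spline agent README.
SPLINE_LISTENER = (
    "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"
)


def spline_spark_conf(producer_url: str) -> Dict[str, str]:
    """Return the Spark settings that attach the Spline agent as a listener."""
    return {
        # Standard Spark setting for registering query execution listeners.
        "spark.sql.queryExecutionListeners": SPLINE_LISTENER,
        # Assumption: key used by the agent to locate the Spline REST server.
        "spark.spline.producer.url": producer_url,
    }


def spark_submit_args(
    conf: Dict[str, str], agent_package: str, app: str
) -> List[str]:
    """Build a spark-submit argument list; the job file itself is unchanged."""
    args = ["spark-submit", "--packages", agent_package]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    conf = spline_spark_conf("http://localhost:8080/producer")  # hypothetical URL
    # Placeholder: substitute the agent bundle matching your Spark/Scala version.
    cmd = spark_submit_args(conf, "<spline-agent-bundle-coordinates>", "etl_job.py")
    print(" ".join(cmd))
```

The point of the sketch is that everything Spline needs sits in the submit-time configuration, which is why the Spark application code can stay as it is.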

Column-level granularity is what distinguishes Spline from job-level lineage tools. A job-level tool knows only that a Spark job reads from 'raw_payments' and writes to 'stg_payments'; Spline knows that 'stg_payments.apac_payment_amount_usd' was derived from 'raw_payments.amount_cents' via a division operation. This visibility is particularly valuable for impact analysis: if the 'amount_cents' column changes data type, Spline can show which downstream columns and reports are affected, provided they are produced by tracked Spark jobs.
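The impact-analysis idea can be illustrated with a small, self-contained Python sketch. It is not Spline's data model or API: the `LINEAGE` dictionary is a hand-written stand-in for column-level edges, the 'rpt_revenue.total_usd' and currency columns are hypothetical additions to the example above, and `downstream_columns` is a hypothetical helper that does a plain breadth-first walk.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical column-level edges: source column -> columns derived from it.
LINEAGE: Dict[str, List[str]] = {
    "raw_payments.amount_cents": ["stg_payments.apac_payment_amount_usd"],
    # Hypothetical reporting column built on the staged amount.
    "stg_payments.apac_payment_amount_usd": ["rpt_revenue.total_usd"],
    # Hypothetical unrelated column, to show it is not reported as affected.
    "raw_payments.currency": ["stg_payments.currency"],
}


def downstream_columns(edges: Dict[str, List[str]], column: str) -> Set[str]:
    """Return every column reachable from `column` by following lineage edges."""
    seen: Set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    for col in sorted(downstream_columns(LINEAGE, "raw_payments.amount_cents")):
        print(col)
```

With job-level lineage the same question could only be answered per table, so every column of 'stg_payments' would have to be treated as potentially affected.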

Spline provides a REST server for lineage storage and a web-based UI for interactive exploration of the lineage graph. For teams whose data infrastructure is primarily Spark-based (Databricks, EMR, on-premises Spark), Spline offers purpose-built lineage capture that hooks directly into Spark's query plans. For teams with multi-framework pipelines (Airflow + Spark + dbt), OpenLineage with a Marquez backend provides broader coverage across those tools.
