
OpenLineage

by Linux Foundation AI & Data

Open standard for collecting lineage metadata from data pipelines — integrates with Spark, Airflow, dbt, and Flink to emit structured lineage events to a central backend.

AIMenta verdict
Recommended
5/5

"Open standard for data lineage metadata — APAC data engineering teams use OpenLineage to instrument Spark, Airflow, dbt, and Flink jobs with automatic lineage emission, feeding centralized APAC data catalogs and impact analysis without per-tool custom integration."

What it does

Key features

  • Open standard specification — backend-agnostic lineage event format
  • Integrations: Apache Spark, Airflow, dbt, Flink, Great Expectations, Dagster
  • Automatic lineage capture without code changes for supported frameworks
  • Job-level and run-level lineage with schema and data quality metadata
  • Compatible backends: Marquez, DataHub, OpenMetadata, Astronomer
  • Facet extension system for custom metadata fields
When to reach for it

Best for

  • APAC data engineering teams running pipelines on Spark, Airflow, or dbt who need automatic lineage capture without per-tool custom development, feeding a centralized APAC data catalog.
Don't get burned

Limitations to know

  • ! Standard only — requires a compatible backend (Marquez, DataHub) to store and visualize
  • ! Integration coverage varies by framework version; some tools need custom emitters
  • ! Column-level lineage support varies across integrations (job-level is more complete)
Context

About OpenLineage

OpenLineage is an open standard (hosted by the Linux Foundation AI & Data) that defines a common specification for data lineage metadata. Rather than building custom lineage extraction for each pipeline tool, APAC data engineering teams instrument their Spark jobs, Airflow DAGs, dbt models, and Flink pipelines with OpenLineage-compatible libraries — these automatically emit structured lineage events describing what data inputs were consumed, what outputs were produced, and what transformations were applied.
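At its core, each emitted event is a JSON document carrying an event type, a timestamp, a run, a job, and lists of input and output datasets. A minimal sketch of that event shape, built as a plain dict (field names follow the OpenLineage spec; the namespaces, job name, and dataset names below are illustrative placeholders, not from any real deployment):

```python
import json
import uuid
from datetime import datetime, timezone

def make_run_event(event_type, job_name, run_id, inputs, outputs):
    """Build a minimal OpenLineage-style RunEvent as a plain dict.

    Field names follow the OpenLineage spec; the namespace and
    producer values here are hypothetical placeholders.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": "example-namespace", "name": job_name},
        "inputs": [{"namespace": "warehouse", "name": n} for n in inputs],
        "outputs": [{"namespace": "warehouse", "name": n} for n in outputs],
        "producer": "https://example.com/my-producer",
    }

# A run emits a START event when it begins and a COMPLETE event when it ends,
# both sharing the same runId so the backend can correlate them.
run_id = str(uuid.uuid4())
start = make_run_event("START", "daily_sales_agg", run_id,
                       inputs=["raw.sales"], outputs=[])
complete = make_run_event("COMPLETE", "daily_sales_agg", run_id,
                          inputs=["raw.sales"], outputs=["agg.daily_sales"])
print(json.dumps(complete, indent=2))
```

In practice the integration libraries construct and send these events automatically; the sketch only shows what a backend like Marquez receives.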

The OpenLineage specification covers job-level lineage (which datasets does this job read/write?) and run-level lineage (for this specific pipeline execution, what were the input/output row counts, schemas, and data quality metrics?). APAC teams use OpenLineage to feed lineage metadata into compatible backends — Marquez for open-source lineage storage, DataHub or OpenMetadata for enterprise APAC data catalogs, or Astronomer Lineage for managed Airflow.
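Pointing an integration at a backend is typically configuration rather than code. For the Airflow integration, for example, the emitter can be directed at a backend via environment variables (the URL and namespace values below are placeholders; the package name varies by Airflow version):

```shell
# Install the Airflow integration (newer Airflow versions ship this
# as the apache-airflow-providers-openlineage provider instead).
pip install openlineage-airflow

# Point emitted events at a compatible backend, e.g. a local Marquez instance.
export OPENLINEAGE_URL=http://localhost:5000
export OPENLINEAGE_NAMESPACE=my_airflow_instance
```

With this in place, DAG runs emit lineage events to the configured backend without any changes to the DAG code itself.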

For APAC data engineering teams the primary value is automated, code-change-free lineage capture: when an Airflow task or Spark job executes, OpenLineage automatically records which tables were read, which were written, and the schema of each, without engineers manually maintaining a separate lineage registry. This lineage feeds impact analysis (if an upstream table changes, which downstream pipelines and reports are affected?) and root cause analysis during data quality incidents.
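Once lineage is collected, impact analysis reduces to a graph traversal over dataset-to-dataset edges. A minimal sketch (the edge map here is hypothetical sample data, standing in for lineage queried from a backend such as Marquez):

```python
from collections import deque

# Downstream edges derived from collected lineage events:
# table -> tables produced by jobs that read it (illustrative data).
downstream = {
    "raw.sales": ["agg.daily_sales"],
    "agg.daily_sales": ["report.revenue", "ml.features"],
    "ml.features": ["ml.churn_model"],
}

def impacted(table):
    """Return every dataset downstream of `table`, breadth-first."""
    seen, queue = set(), deque([table])
    while queue:
        for nxt in downstream.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# If raw.sales changes, these downstream pipelines/reports are affected:
print(sorted(impacted("raw.sales")))
```

The same traversal run in reverse (upstream edges) supports root cause analysis during data quality incidents.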

Beyond this tool

Where this category meets practice.

A tool only matters in context. Browse the service pillars that operationalise it, the industries where it ships, and the Asian markets where AIMenta runs adoption programs.