What it does

Key features

OpenLineage-compatible API: receives lineage events from all OpenLineage integrations
Lineage graph visualization: interactive job and dataset dependency graphs
Dataset version tracking with schema history across pipeline runs
Job run history with input/output metrics and data quality facets
REST API for programmatic lineage queries and impact analysis
Docker Compose deployment for rapid self-hosted setup
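
As a rough illustration of the OpenLineage-compatible API, the sketch below builds a minimal OpenLineage run event and shows how it could be posted to Marquez's lineage endpoint (POST /api/v1/lineage). The URL assumes a local deployment on port 5000. The job name 'load_payments', the producer URI, and the schemaURL version are illustrative assumptions, not values from this page; check the spec version your integrations emit.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumption: Marquez running locally with its API on port 5000.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Hypothetical producer URI; real integrations set their own.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; adjust to the version you target.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def send_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST the event to Marquez; requires a reachable Marquez instance."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Build (but do not send) an event for a hypothetical job.
event = build_run_event(
    "COMPLETE",
    "apac-payments",
    "load_payments",
    inputs=["raw_transactions"],
    outputs=["stg_payments"],
)
payload = json.dumps(event)
```

In practice the Spark, Airflow, and dbt integrations emit these events themselves; building one by hand is mainly useful for smoke-testing a new deployment by calling send_event(event).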

When to reach for it

Best for

APAC data engineering teams that need a lightweight, self-hosted lineage backend for OpenLineage events without deploying a full enterprise data catalog.

Don't get burned

Limitations to know

! Lineage-focused: lacks data catalog features such as a business glossary, stewardship workflows, and access control
! UI is functional but less polished than commercial alternatives
! Suited to mid-size deployments; very large pipeline volumes may require tuning

Context

About Marquez

Marquez is an open-source metadata service hosted by the LF AI & Data Foundation, and it serves as the reference implementation of the OpenLineage API. APAC data engineering teams deploy Marquez to receive OpenLineage events from their Spark, Airflow, and dbt pipelines. It stores job run history, dataset schemas, and data lineage graphs behind a queryable API with a web-based visualization interface.

Marquez's data model organizes lineage around namespaces, jobs, and datasets, which reflects how APAC data pipelines actually work. For example, a Spark job in the 'apac-payments' namespace reads from 'raw_transactions' and writes to 'stg_payments'; Marquez stores this relationship along with the schema of each dataset and the metadata of each job run. APAC data teams use Marquez to investigate data quality incidents by tracing backwards through the lineage graph: if the 'fct_apac_revenue' table has incorrect data, which upstream jobs and sources could be responsible?
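
The backward trace described above can be sketched as a breadth-first walk over job inputs and outputs. This is an in-memory illustration of the idea, not Marquez's implementation or API; the job names 'load_payments' and 'build_revenue' and the dataset 'dim_currency' are hypothetical.

```python
from collections import deque

# Hypothetical lineage: job name -> (input datasets, output datasets).
JOBS = {
    "load_payments": (["raw_transactions"], ["stg_payments"]),
    "build_revenue": (["stg_payments", "dim_currency"], ["fct_apac_revenue"]),
}


def upstream(dataset, jobs):
    """Return the jobs and datasets that feed `dataset`, directly or transitively."""
    # Index which jobs write each dataset.
    producers = {}
    for job, (_inputs, outputs) in jobs.items():
        for output in outputs:
            producers.setdefault(output, []).append(job)

    seen_jobs, seen_datasets = set(), set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for job in producers.get(current, []):
            if job in seen_jobs:
                continue  # already expanded; also guards against cycles
            seen_jobs.add(job)
            for source in jobs[job][0]:
                if source not in seen_datasets:
                    seen_datasets.add(source)
                    queue.append(source)
    return seen_jobs, seen_datasets


suspect_jobs, suspect_datasets = upstream("fct_apac_revenue", JOBS)
```

Here suspect_jobs and suspect_datasets hold every job and source that could have contributed bad data to 'fct_apac_revenue'. Against a running Marquez instance, the lineage endpoint of the REST API would supply the graph instead of the hard-coded JOBS dictionary.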

For APAC teams already using DataHub or OpenMetadata as their primary data catalog, Marquez serves as a lightweight lineage-specific backend focused on the OpenLineage API. For APAC teams starting their data governance journey without an existing catalog, Marquez provides an immediately operational lineage store with minimal infrastructure requirements (Docker Compose deployment, PostgreSQL backend).
