Marquez

by Linux Foundation AI & Data

Open-source metadata service and lineage backend that collects OpenLineage events, stores job and dataset metadata, and provides lineage graph visualization.

AIMenta verdict
Decent fit
4/5

"Open-source metadata service for data lineage. APAC data teams use Marquez as an OpenLineage-compatible backend to store, query, and visualize job lineage graphs, track dataset versions, and investigate the upstream root causes of data quality incidents."

What it does

Key features

  • OpenLineage-compatible API: receives lineage events from any integration that emits OpenLineage events
  • Lineage graph visualization: interactive job and dataset dependency graphs
  • Dataset version tracking with schema history across pipeline runs
  • Job run history with input/output metrics and data quality facets
  • REST API for programmatic lineage queries and impact analysis
  • Docker Compose deployment for rapid self-hosted setup
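
To make the OpenLineage-compatible API concrete, here is a minimal sketch of the kind of run event a pipeline might send. It uses only the Python standard library and makes no network call. The job name `load_stg_payments`, the `producer` URI, and the spec version in `schemaURL` are illustrative assumptions, not values taken from this page.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, namespace, job_name, inputs, outputs, run_id=None):
    """Build an OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Hypothetical producer URI identifying the code that emitted the event.
        "producer": "https://example.com/pipelines/payments",
        # Assumed spec version; check the version your integrations emit.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


# Hypothetical job name; the namespace and dataset names are illustrative.
event = build_run_event(
    "COMPLETE",
    "apac-payments",
    "load_stg_payments",
    inputs=["raw_transactions"],
    outputs=["stg_payments"],
)
payload = json.dumps(event)
```

In practice the serialized `payload` would be sent as an HTTP POST to the Marquez lineage endpoint (commonly `/api/v1/lineage`; confirm the path and port against your own deployment), or emitted by an OpenLineage integration rather than built by hand.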
When to reach for it

Best for

  • APAC data engineering teams that need a lightweight, self-hosted lineage backend for OpenLineage events without deploying a full enterprise data catalog.
Don't get burned

Limitations to know

  • Lineage-focused: lacks data catalog features such as a business glossary, stewardship, and access control
  • UI is functional but less polished than commercial alternatives
  • Scales to mid-size deployments; very large pipeline volumes may require tuning
Context

About Marquez

Marquez is an open-source metadata service, hosted as a Linux Foundation AI & Data project, that serves as the reference backend implementation for OpenLineage. APAC data engineering teams deploy Marquez to receive OpenLineage events from their Spark, Airflow, and dbt pipelines. It stores job run history, dataset schemas, and data lineage graphs, and exposes them through a queryable API and a web-based visualization interface.
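
As a rough illustration of querying that API, the sketch below only builds a lineage request URL; it does not contact a server. The base URL `http://localhost:5000`, the `/api/v1/lineage` path, the `nodeId` format (`dataset:<namespace>:<name>` or `job:<namespace>:<name>`), and the `depth` parameter are assumptions about a typical Marquez deployment and should be checked against the API reference for your version.

```python
from urllib.parse import urlencode

# Assumed address of a local Marquez API; adjust for your deployment.
MARQUEZ_URL = "http://localhost:5000"


def lineage_url(node_type, namespace, name, depth=5, base_url=MARQUEZ_URL):
    """Build a lineage query URL for a job or dataset node."""
    if node_type not in ("job", "dataset"):
        raise ValueError("node_type must be 'job' or 'dataset'")
    # Assumed node ID format: "<type>:<namespace>:<name>".
    node_id = f"{node_type}:{namespace}:{name}"
    query = urlencode({"nodeId": node_id, "depth": depth})
    return f"{base_url}/api/v1/lineage?{query}"


url = lineage_url("dataset", "apac-payments", "stg_payments")
```

The resulting URL can be fetched with any HTTP client; the response is expected to describe the graph of jobs and datasets around the requested node.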

Marquez's data model organizes lineage around namespaces, jobs, and datasets, which mirrors how data pipelines actually run. For example, a Spark job in the 'apac-payments' namespace reads from 'raw_transactions' and writes to 'stg_payments'; Marquez stores this relationship along with the schema of each dataset and the metadata of each job run. APAC data teams use this record to investigate data quality incidents by tracing backwards through the lineage graph: if the 'fct_apac_revenue' table has incorrect data, which upstream jobs and sources could be responsible?
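
The backwards trace described here amounts to a graph traversal. The sketch below runs a breadth-first search over a small in-memory lineage map; it illustrates the idea and is not how Marquez implements it. All job names and every dataset other than 'raw_transactions', 'stg_payments', and 'fct_apac_revenue' are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each job maps to (input datasets, output datasets).
JOBS = {
    "load_stg_payments": (["raw_transactions"], ["stg_payments"]),
    "load_stg_fx_rates": (["raw_fx_rates"], ["stg_fx_rates"]),
    "build_fct_apac_revenue": (["stg_payments", "stg_fx_rates"], ["fct_apac_revenue"]),
    "build_dim_customers": (["raw_customers"], ["dim_customers"]),
}


def upstream(dataset, jobs=JOBS):
    """Return (jobs, datasets) upstream of `dataset`, nearest first."""
    # Index which jobs write each dataset.
    producers = {}
    for job, (_, outputs) in jobs.items():
        for out in outputs:
            producers.setdefault(out, []).append(job)

    found_jobs, found_datasets = [], []
    visited = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for job in producers.get(current, []):
            if job not in found_jobs:
                found_jobs.append(job)
            # Walk one step further back through the job's inputs.
            for src in jobs[job][0]:
                if src not in visited:
                    visited.add(src)
                    found_datasets.append(src)
                    queue.append(src)
    return found_jobs, found_datasets


suspect_jobs, suspect_datasets = upstream("fct_apac_revenue")
```

For the sample map, `suspect_jobs` lists the three jobs that feed 'fct_apac_revenue' and leaves out `build_dim_customers`, which is the narrowing an incident investigation relies on.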

For APAC teams already using DataHub or OpenMetadata as their primary data catalog, Marquez serves as a lightweight lineage-specific backend focused on the OpenLineage API. For APAC teams starting their data governance journey without an existing catalog, Marquez provides an immediately operational lineage store with minimal infrastructure requirements (Docker Compose deployment, PostgreSQL backend).
