APAC ML Reproducibility: Versioning Data, Code, and Experiments Together
APAC data science teams face a reproducibility crisis: a model trained on dataset version 3 at code commit A7F produces different results than the same commit run on version 4, and most teams have no systematic way to track which combination of data, code, and hyperparameters produced which model. This guide covers three open-source-first tools APAC teams use to establish ML reproducibility through data versioning, experiment tracking, and integrated project collaboration.
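The core of the fix is making the (data, code, hyperparameters) triple identifiable. As a minimal illustration of the idea the tools below automate, a run fingerprint can be derived by hashing the three together; all names and hash values here are hypothetical:

```python
import hashlib
import json

def run_fingerprint(data_digest: str, code_commit: str, params: dict) -> str:
    """Derive a stable ID for a training run from its data, code, and params.

    data_digest: content hash of the training dataset (e.g. from DVC)
    code_commit: Git commit SHA of the training code
    params: hyperparameters, serialized deterministically
    """
    payload = json.dumps(
        {"data": data_digest, "code": code_commit, "params": params},
        sort_keys=True,  # deterministic key order so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

# Same inputs always yield the same ID; any change yields a new one.
fp_a = run_fingerprint("d4f1", "a7f3c91", {"learning_rate": 0.05, "max_depth": 4})
fp_b = run_fingerprint("d4f1", "a7f3c91", {"max_depth": 4, "learning_rate": 0.05})
fp_c = run_fingerprint("e902", "a7f3c91", {"learning_rate": 0.05, "max_depth": 4})
print(fp_a == fp_b, fp_a == fp_c)  # key order ignored; data change detected
```

DVC, Aim, and DagsHub each record this linkage automatically rather than asking teams to maintain it by hand.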
DagsHub — ML project collaboration platform combining Git hosting, DVC data versioning, experiment tracking, and model registry in a GitHub-like interface for APAC data science teams.
Aim — open-source self-hosted ML experiment tracker for APAC research teams who need full data sovereignty, rich run comparison, and hyperparameter visualization without cloud vendor dependency.
DVC — Git-compatible data version control for APAC ML teams, versioning large datasets and model artifacts using familiar Git workflows stored in APAC cloud storage.
APAC ML Reproducibility Tool Selection
APAC Team Profile → Tool → Why
APAC DS team, git-native, DVC users (GitHub for ML; want one platform) → DagsHub → full ML collaboration: code + data + experiments + model registry
APAC research team, data sovereignty (gov, defense, regulated industries) → Aim → self-hosted; no cloud vendor; rich run comparison; free OSS
APAC team, data versioning priority (large datasets, reproducible pipelines) → DVC → Git extension for large files; pipeline caching; storage-agnostic
APAC team, cloud-native, rich UI (comfortable with cloud, budget OK) → W&B → best-in-class UI; real-time training viz; sweep automation
APAC team, enterprise MLOps platform (production model registry needed) → MLflow → battle-tested; model registry; Databricks integration
APAC ML Reproducibility Stack:
Code versioning: Git (universal)
Data versioning: DVC (open-source) or DagsHub (managed)
Experiment tracking: Aim (self-hosted) or DagsHub (integrated) or MLflow/W&B
Model registry: DagsHub model registry or MLflow model registry
Collaboration: DagsHub (all-in-one) or separate Git + experiment tracking
DVC: APAC Data Version Control Foundation
DVC APAC setup and dataset versioning
# APAC: DVC setup for ML project with S3 remote storage
# Initialize DVC in existing Git repo
git init apac-credit-model
cd apac-credit-model
dvc init
git add .dvc/
git commit -m "APAC: Initialize DVC for ML project"
# APAC: Configure remote storage (APAC S3 bucket for data sovereignty)
dvc remote add -d apac-s3-remote s3://apac-ml-data-singapore/credit-model/
dvc remote modify apac-s3-remote region ap-southeast-1
git add .dvc/config
git commit -m "APAC: Configure S3 remote storage (Singapore region)"
# APAC: Track first dataset version
dvc add data/apac_credit_training_v1.parquet
# Creates: data/apac_credit_training_v1.parquet.dvc (metadata file)
# .gitignore updated: data/apac_credit_training_v1.parquet (excluded from Git)
git add data/apac_credit_training_v1.parquet.dvc data/.gitignore
git commit -m "APAC: Add training dataset v1 (47,230 records, Singapore 2024)"
# APAC: Push data to S3 remote
dvc push
# Uploads: apac_credit_training_v1.parquet → S3 (any team member can dvc pull)
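For reference, the `.dvc` metadata file that `dvc add` writes is a small YAML pointer that Git versions in place of the data itself; the hash and size below are illustrative, not real values:

```
# data/apac_credit_training_v1.parquet.dvc (illustrative contents)
outs:
- md5: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d   # content hash of the data (illustrative)
  size: 18734592                          # bytes (illustrative)
  path: apac_credit_training_v1.parquet
```

Checking out any Git commit and running `dvc checkout` restores the exact dataset version that commit's `.dvc` files reference.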
DVC APAC pipeline definition
# APAC: DVC pipeline — define ML stages as reproducible DAG
# File: dvc.yaml
# APAC: Define pipeline stages
stages:
  preprocess:
    cmd: python src/preprocess.py --input data/raw/ --output data/processed/
    deps:
      - src/preprocess.py
      - data/raw/                          # APAC: tracked by DVC
    outs:
      - data/processed/features.parquet    # APAC: DVC caches this output
    params:
      - params.yaml:
          - preprocessing.outlier_threshold
          - preprocessing.encoding_method
  train:
    cmd: python src/train.py
    deps:
      - src/train.py
      - data/processed/features.parquet    # APAC: must complete after preprocess
    outs:
      - models/apac_credit_model_v1.pkl    # APAC: tracked as DVC artifact
    metrics:
      - metrics/train_results.json:
          cache: false                     # APAC: metrics tracked in Git, not DVC
    params:
      - params.yaml:
          - model.learning_rate
          - model.max_depth
          - model.n_estimators
# APAC: Run only changed pipeline stages (DVC caches unchanged outputs)
dvc repro
# APAC: Output:
# Stage 'preprocess': cached (inputs unchanged — skipping)
# Stage 'train': running (hyperparameters changed in params.yaml)
# → Only the training stage reruns; preprocessing output reused from cache
# APAC: 40-minute preprocessing step skipped → 3-minute experiment iteration
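The params.yaml file that both stages reference might look like this; the values are illustrative:

```
# params.yaml — single source of truth for pipeline parameters (illustrative values)
preprocessing:
  outlier_threshold: 3.0
  encoding_method: target
model:
  learning_rate: 0.05
  max_depth: 4
  n_estimators: 500
```

Because DVC tracks these keys as stage dependencies, editing `model.learning_rate` invalidates only the train stage while the preprocess cache stays valid.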
DVC APAC experiment management
# APAC: DVC experiments — run and compare hyperparameter configurations
# APAC: Run experiment with modified hyperparameters (without changing params.yaml)
dvc exp run --set-param model.learning_rate=0.01 --set-param model.max_depth=6
dvc exp run --set-param model.learning_rate=0.05 --set-param model.max_depth=4
dvc exp run --set-param model.learning_rate=0.10 --set-param model.max_depth=8
# APAC: Compare all experiments
dvc exp show
# APAC: Output table:
# Experiment | model.lr | model.depth | val_auc | val_precision
# workspace | 0.05 | 4 | 0.847 | 0.891
# exp-abc123 | 0.01 | 6 | 0.831 | 0.876
# exp-def456 | 0.10 | 8 | 0.852 | 0.883 ← best AUC
# main | 0.05 | 4 | 0.847 | 0.891
# APAC: Promote best experiment to production branch
dvc exp apply exp-def456
git add .
git commit -m "APAC: Apply best hyperparams from DVC experiment (lr=0.10, depth=8, AUC=0.852)"
dvc push # Push winning model artifact to S3
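Picking the winner can also be scripted; `dvc exp show` supports machine-readable output (`--json` in recent versions). A sketch of selecting the best run from already-parsed rows, where the row shape is a simplified, hypothetical stand-in for the parsed output:

```python
# Pick the experiment with the highest validation AUC from parsed rows.
# Row shape is a simplified, hypothetical stand-in for parsed `dvc exp show` output.
rows = [
    {"name": "workspace",  "learning_rate": 0.05, "max_depth": 4, "val_auc": 0.847},
    {"name": "exp-abc123", "learning_rate": 0.01, "max_depth": 6, "val_auc": 0.831},
    {"name": "exp-def456", "learning_rate": 0.10, "max_depth": 8, "val_auc": 0.852},
]

best = max(rows, key=lambda r: r["val_auc"])
print(f"dvc exp apply {best['name']}")  # command to promote the winning run
```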
Aim: APAC Self-Hosted Experiment Tracking
Aim APAC Python SDK integration
# APAC: Aim — log training metrics with full data sovereignty (self-hosted)
from aim import Run
import torch
import torch.nn as nn
# APAC: Initialize Aim run (logs to local Aim repository, no cloud)
apac_run = Run(
    repo="/opt/apac-ml/aim-repo",  # APAC: self-hosted Aim repository path
    experiment="apac-nlp-intent-classification-v3",
)

# APAC: Log hyperparameters
apac_run["hyperparameters"] = {
    "model_type": "bert-base-multilingual",
    "learning_rate": 2e-5,
    "batch_size": 32,
    "epochs": 10,
    "apac_languages": ["en", "zh", "ja", "ko"],
    "max_sequence_length": 128,
}

# APAC: Log APAC dataset metadata
apac_run["dataset"] = {
    "name": "apac-customer-service-intents-v4",
    "train_samples": 45_230,
    "val_samples": 5_650,
    "num_classes": 18,
    "languages": ["en", "zh", "ja", "ko"],
}

# APAC: Training loop with metric logging
for apac_epoch in range(10):
    apac_train_loss = apac_train_one_epoch(apac_epoch)
    apac_val_loss, apac_val_accuracy = apac_validate(apac_epoch)

    # APAC: Log scalar metrics per epoch
    apac_run.track(apac_train_loss, name="train_loss", epoch=apac_epoch)
    apac_run.track(apac_val_loss, name="val_loss", epoch=apac_epoch)
    apac_run.track(apac_val_accuracy, name="val_accuracy", epoch=apac_epoch)

    # APAC: Log per-language validation accuracy
    for apac_lang in ["en", "zh", "ja", "ko"]:
        apac_lang_acc = apac_validate_language(apac_epoch, apac_lang)
        apac_run.track(apac_lang_acc, name=f"val_accuracy_{apac_lang}", epoch=apac_epoch)

    print(f"APAC Epoch {apac_epoch}: train_loss={apac_train_loss:.4f}, val_acc={apac_val_accuracy:.4f}")

apac_run.close()
print("APAC: Run logged to self-hosted Aim repository — no data sent externally")
Aim APAC run comparison query
# APAC: Aim — query and compare experiment runs programmatically
from aim import Repo
apac_repo = Repo("/opt/apac-ml/aim-repo")
# APAC: Query all runs from the APAC NLP experiment
apac_runs = apac_repo.query_runs(
    "run.experiment == 'apac-nlp-intent-classification-v3'"
)

# APAC: Find best runs by validation accuracy
apac_results = []
for apac_run in apac_runs.iter_runs():
    apac_metrics = apac_run.get_metric("val_accuracy")
    if apac_metrics:
        apac_best_acc = max(apac_metrics.values.tolist())
        apac_results.append({
            "run_hash": apac_run.hash,
            "learning_rate": apac_run["hyperparameters"]["learning_rate"],
            "batch_size": apac_run["hyperparameters"]["batch_size"],
            "best_val_accuracy": apac_best_acc,
        })

# APAC: Sort by validation accuracy — identify top experiments
apac_results.sort(key=lambda x: x["best_val_accuracy"], reverse=True)

print("APAC: Top 5 runs by validation accuracy:")
for i, apac_result in enumerate(apac_results[:5]):
    print(
        f"  {i + 1}. Run {apac_result['run_hash'][:8]}: "
        f"lr={apac_result['learning_rate']}, "
        f"batch={apac_result['batch_size']}, "
        f"acc={apac_result['best_val_accuracy']:.4f}"
    )
DagsHub: APAC Integrated ML Project Collaboration
DagsHub APAC MLflow integration
# APAC: DagsHub — log experiments with MLflow SDK to DagsHub backend
import mlflow
import os
# APAC: Point MLflow to DagsHub tracking server
os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/apac-team/credit-model.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "apac-ml-user"
os.environ["MLFLOW_TRACKING_PASSWORD"] = os.environ["DAGSHUB_TOKEN"]
# APAC: Standard MLflow experiment logging — all data goes to DagsHub
mlflow.set_experiment("apac-credit-risk-v3")  # set_experiment takes a name; start_run's experiment_id expects a numeric ID
with mlflow.start_run():
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_param("learning_rate", 0.05)
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("n_estimators", 500)
    mlflow.log_param("apac_markets", "SG,HK,MY,ID")

    # APAC: Train model
    apac_model = apac_train_credit_model(apac_params)

    # APAC: Log metrics
    mlflow.log_metric("val_auc", apac_val_auc)
    mlflow.log_metric("val_precision", apac_val_precision)
    mlflow.log_metric("val_recall", apac_val_recall)
    mlflow.log_metric("apac_sg_auc", apac_sg_auc)  # APAC: per-market breakdown
    mlflow.log_metric("apac_hk_auc", apac_hk_auc)
    mlflow.log_metric("apac_my_auc", apac_my_auc)

    # APAC: Log model artifact
    mlflow.xgboost.log_model(apac_model, "credit-risk-model")
# APAC: DagsHub shows:
# - Experiment in MLflow-compatible UI
# - Linked to Git commit (current code state)
# - Linked to DVC data version (dataset v4 used for this run)
# - Team members can view, comment, and compare in DagsHub UI
print("APAC: Run logged to DagsHub — code + data + experiment linked")
DagsHub APAC model registry promotion
# APAC: DagsHub model registry — promote trained model to production
# APAC: Register model in DagsHub model registry
dvc push # Push model artifact to DagsHub remote storage
# APAC: Tag model version in Git (DagsHub creates registry entry automatically)
git tag -a "model-v3.2-production" -m "APAC: Credit risk model v3.2 — AUC 0.852, all APAC markets"
git push origin "model-v3.2-production"
# APAC: DagsHub model registry now shows:
# Model: apac-credit-risk-model
# Version: v3.2
# Stage: production
# Linked commit: a7f3c91
# Linked DVC data version: data/processed/features_v4.parquet
# Metrics: val_auc=0.852, val_precision=0.883
# Promoted by: [email protected]
# Promoted at: 2026-06-03 14:23 SGT
# APAC: MLOps team deploys from model registry (reproducible deployment)
# APAC: If production issues arise → DagsHub shows exactly which data + code to reproduce
APAC ML Reproducibility ROI
Problem: APAC bank's credit model unexpectedly degrades after Q4 retraining
Without ML reproducibility tools:
Investigation time: 3-4 weeks
- Engineers manually compare training logs (if they exist)
- Dataset version unknown — may have changed since Q3
- Code changes since Q3 model mixed with new changes
- Root cause: never definitively identified
Outcome: retrain with best guess; risk of repeating the issue
With DVC + Aim + DagsHub:
Investigation time: 2-3 hours
- git checkout <Q3-commit> → restore exact Q3 code and .dvc metadata
- dvc checkout → sync the working tree to the exact Q3 dataset version
- Aim comparison: Q3 vs Q4 training curves side-by-side
- Finding: Q4 training data included new demographic field
with 28% missing values → model learned noise pattern
- Fix: retrain with Q3 data quality standards applied to Q4 data
Outcome: root cause identified; fix applied; recurrence prevented
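The data quality regression in this scenario is cheap to catch before training. A minimal sketch of the missingness gate that would have flagged the Q4 field, with hypothetical column data and `None` standing in for missing entries:

```python
# Flag columns whose missing-value fraction exceeds a threshold before training.
def high_missingness(columns: dict[str, list], threshold: float = 0.20) -> dict[str, float]:
    """Return {column: missing_fraction} for columns above the threshold."""
    flagged = {}
    for name, values in columns.items():
        missing = sum(1 for v in values if v is None) / len(values)
        if missing > threshold:
            flagged[name] = round(missing, 2)
    return flagged

# Hypothetical Q4 sample: the new demographic field is mostly missing.
q4_sample = {
    "income":          [52_000, 61_000, 48_500, 70_200, 55_000],
    "new_demographic": [None, "A", None, None, "B"],
}
print(high_missingness(q4_sample))  # only the sparse new field is flagged
```

Run as a pipeline stage before train (e.g. a DVC stage depending on the processed features), this check fails fast instead of letting the model learn a noise pattern.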
Related APAC ML Infrastructure Resources
For the ML experiment tracking platforms (MLflow, W&B, Comet ML) that complement DVC data versioning with richer UI, managed cloud infrastructure, and enterprise team features — the mature cloud-native alternatives when APAC teams need production-grade MLOps without self-hosted infrastructure management — see the APAC AI tools catalog.
For the ML feature store platforms (Feast, Tecton, Hopsworks) that manage the structured training features produced from DVC-versioned datasets — serving features to training pipelines tracked in Aim or DagsHub — see the APAC feature store guide.
For the data quality platforms (Encord, SuperAnnotate, Cleanlab) that detect label errors and annotation quality issues in DVC-versioned APAC training datasets — identifying when data problems explain poor model metrics visible in Aim or DagsHub experiment tracking — see the APAC data-centric AI guide.
Beyond this insight
Cross-reference our practice depth: if this article matches your stage of thinking, the underlying capabilities ship across all six pillars, ten verticals, and nine Asian markets.