AIMenta
whylogs

by WhyLabs

Lightweight ML data logging library that profiles datasets as compact statistical summaries, giving APAC ML teams privacy-preserving drift detection and data quality monitoring across their pipelines.

AIMenta verdict
Decent fit
4/5

"ML data logging: APAC ML teams use whylogs to profile training and production data distributions as statistical summaries, enabling drift detection and data quality monitoring without storing raw data."

Features: 6 | Use cases: 1 | Watch outs: 3
What it does

Key features

  • Statistical data profiling: compact, sketch-based summaries of data distributions
  • Privacy-preserving: only statistical summaries are produced, so raw data never leaves the team's environment
  • PySpark integration: distributed profile computation on large datasets
  • WhyLabs platform: managed drift alerting and dashboards built from whylogs profiles
  • Lightweight: profiles are computed in seconds and are typically under 1MB
  • Multi-framework: accepts pandas, Spark, and custom data inputs
When to reach for it

Best for

  • APAC ML teams with data privacy or sovereignty requirements that want drift detection from statistical summaries, particularly teams running large-scale batch inference on PySpark where storing the full data is impractical.
Don't get burned

Limitations to know

  • Alerting and dashboards require the managed WhyLabs platform; self-hosted teams must build their own visualization
  • Sketch-based approximations: some statistical tests are less precise than they would be with exact computation
  • Ground-truth comparison must be implemented separately: whylogs tracks data distributions, not model accuracy
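
As a rough illustration of the ground-truth comparison mentioned in the last point, the sketch below joins logged predictions with labels that arrive later and computes accuracy over the matched records. It is plain Python, not part of whylogs; the function name `accuracy_from_delayed_labels` and the dictionary layout (record id mapped to class) are assumptions made for the example.

```python
def accuracy_from_delayed_labels(predictions, labels):
    """Score only the predictions whose ground-truth label has arrived.

    predictions, labels: dicts mapping a record id to a class value
    (hypothetical layout chosen for this example).
    Returns (accuracy, number_of_scored_records); accuracy is None
    when no labels have arrived yet.
    """
    matched = [rid for rid in predictions if rid in labels]
    if not matched:
        return None, 0
    correct = sum(predictions[rid] == labels[rid] for rid in matched)
    return correct / len(matched), len(matched)


# Four predictions were logged; labels exist for only three of them so far.
predictions = {"a": 1, "b": 0, "c": 1, "d": 0}
labels = {"a": 1, "b": 1, "c": 1}
accuracy, scored = accuracy_from_delayed_labels(predictions, labels)
print(f"accuracy={accuracy:.2f} on {scored} labelled records")
```

Reporting the number of scored records next to the accuracy makes it visible when only a small share of predictions has been labelled.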
Context

About whylogs

whylogs is a lightweight Python library, developed by WhyLabs, for logging statistical summaries of data distributions in ML pipelines. Rather than storing raw data samples for drift analysis, whylogs computes compact sketch-based summaries (approximate quantiles, approximate cardinality, histograms) that can be compared across time periods to detect drift without exposing the underlying records.
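
To show why a summary can stand in for the raw data, and why it is only approximate, here is a minimal sketch in plain Python. It is not whylogs code: whylogs relies on more sophisticated sketch data structures, and the function names `build_histogram` and `approx_quantile`, the fixed bin range, and the assumption that every value lies in `[lo, hi]` are all choices made for illustration.

```python
from statistics import median


def build_histogram(values, lo, hi, bins=10):
    """One pass over the values; only the bin counts are kept.

    Assumes every value lies in [lo, hi] (illustrative simplification).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    return counts


def approx_quantile(counts, lo, hi, q):
    """Estimate the q-quantile from bin counts by linear interpolation."""
    total = sum(counts)
    target = q * total
    width = (hi - lo) / len(counts)
    seen = 0
    for i, c in enumerate(counts):
        if c and seen + c >= target:
            frac = (target - seen) / c  # position inside this bin
            return lo + (i + frac) * width
        seen += c
    return hi


values = list(range(100))                  # stand-in for one raw column
counts = build_histogram(values, 0, 100)   # the only thing that is kept
print("approximate median:", approx_quantile(counts, 0, 100, 0.5))
print("exact median:      ", median(values))
```

The estimate can be off by up to one bin width, which is the kind of trade-off behind the "less precise than exact computation" caveat for sketch-based statistics.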

The core whylogs concept is a `DatasetProfile`: a statistical summary of a dataset at a point in time. Profiles are computed in seconds on full datasets (millions of rows), stored as small binary files (typically <1MB), and compared using statistical tests to detect drift. A typical setup logs a profile at each batch inference run and compares it against a reference profile built from the training data.
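
The reference-versus-batch comparison can be sketched with a simple drift score computed from two sets of histogram counts. The example below uses the Population Stability Index (PSI), which is a swapped-in technique chosen for brevity and not necessarily the test whylogs or WhyLabs applies; the function name `psi`, the `eps` floor, and the 0.2 threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math


def psi(reference_counts, current_counts, eps=1e-6):
    """Population Stability Index between two histograms with identical bins.

    Only bin counts are needed, so no raw rows are involved.
    """
    if len(reference_counts) != len(current_counts):
        raise ValueError("histograms must use the same bins")
    ref_total = sum(reference_counts)
    cur_total = sum(current_counts)
    score = 0.0
    for r, c in zip(reference_counts, current_counts):
        p = max(r / ref_total, eps)  # floor avoids log(0) on empty bins
        q = max(c / cur_total, eps)
        score += (q - p) * math.log(q / p)
    return score


training_hist = [50, 30, 20]   # reference profile (hypothetical counts)
batch_hist = [20, 30, 50]      # profile from the latest inference batch

score = psi(training_hist, batch_hist)
print(f"PSI = {score:.3f}", "-> drift suspected" if score > 0.2 else "-> stable")
```

Because the score depends only on bin proportions, batches of different sizes can be compared against the same reference.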

whylogs integrates natively with the WhyLabs AI Observability Platform, its managed SaaS companion. Profiles are sent to WhyLabs for drift alerting, anomaly detection, and dashboard visualization, with no raw data stored on WhyLabs infrastructure. This makes whylogs attractive for APAC organizations with data sovereignty requirements: only statistical summaries leave the organization's environment.

For teams using PySpark for large-scale batch inference, whylogs provides a Spark integration that computes dataset profiles on Spark DataFrames in a distributed fashion, so datasets too large for single-node pandas processing can be profiled in full rather than sampled.
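
Distributed profiling works because summaries of this kind can be computed per partition and then merged. The sketch below shows the idea in plain Python with a running mean and variance that merge exactly; it does not use Spark or the whylogs API, and the `ColumnSummary` class with its fields is a hypothetical stand-in for a real profile.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class ColumnSummary:
    """Fixed-size summary of one numeric column (hypothetical stand-in)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x):
        """Fold one value into the summary (Welford's online update)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other):
        """Combine two partition summaries without revisiting any rows."""
        if other.count == 0:
            return replace(self)
        if self.count == 0:
            return replace(other)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return ColumnSummary(n, mean, m2,
                             min(self.minimum, other.minimum),
                             max(self.maximum, other.maximum))

    @property
    def variance(self):
        """Population variance; NaN for an empty summary."""
        return self.m2 / self.count if self.count else math.nan


def summarize(values):
    summary = ColumnSummary()
    for v in values:
        summary.update(v)
    return summary


# Two "partitions" are summarized independently, then merged.
merged = summarize([1, 2, 3]).merge(summarize([4, 5, 6, 7]))
print(merged.count, merged.mean, merged.variance)
```

The merged result matches a single pass over all rows, which is why each worker only needs to ship its small summary and never the underlying data.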
