What it does

Key features

Statistical data profiling: compact, sketch-based summaries of data distributions
Privacy-preserving: only statistical summaries are produced, so raw APAC data never leaves your environment
PySpark integration: distributed profile computation on large datasets
WhyLabs platform: managed drift alerting and dashboards built on whylogs profiles
Lightweight: profiles compute in seconds and are typically under 1 MB
Multi-framework: pandas, Spark, and custom data inputs

When to reach for it

Best for

APAC ML teams with data privacy or sovereignty requirements who want drift detection from statistical summaries, particularly teams running large-scale batch inference on PySpark, where storing full datasets is impractical.

Don't get burned

Limitations to know

! Alerting dashboards require the managed WhyLabs platform; self-hosted APAC teams must build their own visualization
! Sketch-based approximations: some statistical tests are less precise than exact computation on raw data
! Teams must implement ground-truth comparison separately: whylogs tracks data distributions, not model accuracy
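To make the precision trade-off concrete, here is a minimal pure-Python sketch, assuming a fixed-bin histogram as the summary: the median is estimated from bin counts alone and compared with the exact median of the raw values. The helper names are hypothetical, and the fixed-bin histogram is a deliberately simple stand-in; whylogs uses its own streaming sketches, not this code.

```python
import random
import statistics


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi]; out-of-range values are clamped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def approx_quantile(counts, lo, hi, q):
    """Estimate the q-quantile from bin counts by interpolating inside the bin that holds it."""
    width = (hi - lo) / len(counts)
    target = q * sum(counts)
    seen = 0
    for i, c in enumerate(counts):
        if c and seen + c >= target:
            return lo + width * (i + (target - seen) / c)
        seen += c
    return hi


random.seed(0)
data = [random.gauss(50, 10) for _ in range(10_000)]
counts = histogram(data, 0, 100, bins=20)  # bin width = 5.0
exact = statistics.median(data)            # needs every raw value
approx = approx_quantile(counts, 0, 100, 0.5)  # needs only the 20 counts
print(f"exact={exact:.2f} approx={approx:.2f}")
```

The estimate can be off by up to roughly one bin width; finer bins or better sketches narrow the gap, but a summary cannot recover every detail of the raw data.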

Context

About whylogs

whylogs is a lightweight Python library, developed by WhyLabs, for logging statistical summaries of data distributions in ML pipelines. Rather than storing raw data samples for drift analysis, whylogs computes compact sketch-based summaries (approximate quantiles, approximate cardinality, histograms) that can be compared across time periods to detect drift without exposing the underlying APAC records.
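Approximate cardinality is a good example of how a sketch keeps memory bounded. The following is a minimal sketch of a K-minimum-values (KMV) distinct-count estimator, one classic technique among several; it illustrates the idea under that assumption and is not the sketch whylogs implements internally. The class name and the choice of `k` are hypothetical.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values distinct-count sketch: keeps only the k smallest hashes seen."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []     # negated hashes, so the heap top is the largest kept hash
        self._kept = set()  # the same hashes, for O(1) duplicate checks

    @staticmethod
    def _hash(value):
        """Map a value to a pseudo-uniform float in [0, 1) using 53 bits of SHA-1."""
        digest = hashlib.sha1(repr(value).encode()).digest()
        return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

    def update(self, value):
        h = self._hash(value)
        if h in self._kept:
            return  # repeated value, already represented
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact count
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KMVSketch(k=256)
for i in range(10_000):
    sketch.update(i % 1_000)  # 1,000 distinct values, each repeated 10 times
print(round(sketch.estimate()))  # close to 1,000; the exact figure depends on the hashes
```

Memory stays at `k` hashes no matter how many rows are logged, and the relative error shrinks roughly as 1/sqrt(k).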

The core whylogs concept is the `DatasetProfile`: a statistical summary of a dataset at a point in time. Profiles are computed in seconds on full datasets (millions of rows), stored as small binary files (typically under 1 MB), and compared using statistical tests to detect drift. APAC teams log a profile at each batch inference run and compare it against a reference profile built from the training data.
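The compare-against-reference step can be sketched in plain Python. This is a minimal illustration, assuming each profile is a normalized histogram over shared bin edges and that drift is scored with Hellinger distance; the function names and the 0.1 alert threshold are hypothetical, and the drift tests and defaults whylogs applies may differ.

```python
import bisect
import math
import random


def make_profile(values, edges):
    """Summarize a batch: row count, mean, and a normalized histogram over shared bin edges."""
    bins = len(edges) - 1
    counts = [0] * bins
    total = 0.0
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        total += v
    n = len(values)
    return {"count": n, "mean": total / n, "hist": [c / n for c in counts]}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


DRIFT_THRESHOLD = 0.1  # hypothetical alert threshold; would be tuned per feature

edges = [float(e) for e in range(0, 101, 5)]  # 20 bins shared by every profile
random.seed(1)
reference = make_profile([random.gauss(50, 10) for _ in range(5_000)], edges)  # "training"
same = make_profile([random.gauss(50, 10) for _ in range(5_000)], edges)       # no drift
shifted = make_profile([random.gauss(65, 10) for _ in range(5_000)], edges)    # mean shift

for name, batch in [("same", same), ("shifted", shifted)]:
    d = hellinger(reference["hist"], batch["hist"])
    print(name, round(d, 3), "DRIFT" if d > DRIFT_THRESHOLD else "ok")
```

Sharing the bin edges between the reference and each batch is what makes the histograms comparable; only the profiles, not the raw batches, are needed at comparison time.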

whylogs integrates natively with the WhyLabs AI Observability Platform, its managed SaaS companion. Profiles are sent to WhyLabs for drift alerting, anomaly detection, and dashboard visualization, without raw data ever being stored on WhyLabs infrastructure. This makes whylogs attractive for APAC organizations with data sovereignty requirements: only statistical summaries leave the environment.
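The upload to WhyLabs is platform-specific and not shown here. What follows is only a minimal sketch, assuming a JSON-encoded summary as a hypothetical stand-in for a real profile payload, of why such a payload carries aggregates rather than rows and stays small as the data grows. The `summarize` helper is hypothetical.

```python
import json
import random


def summarize(values, lo=0.0, hi=100.0, bins=10):
    """Reduce a numeric column to aggregates: row count, mean, and histogram counts."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
    return {"count": len(values), "mean": sum(values) / len(values), "hist": counts}


random.seed(2)
sizes = {}
for n in (1_000, 100_000):
    rows = [random.uniform(0, 100) for _ in range(n)]
    payload = json.dumps(summarize(rows)).encode("utf-8")  # what would be shipped
    sizes[n] = len(payload)
    print(f"{n:>7} rows -> {sizes[n]} bytes")
```

A hundredfold increase in rows adds only a few digits to the counts, so the payload size is essentially independent of dataset size.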

For APAC teams using PySpark for large-scale batch inference, whylogs provides a Spark integration that computes dataset profiles on Spark DataFrames in a distributed fashion, so datasets too large for single-node pandas processing can be profiled without sampling.
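Distributed profiling depends on summaries that can be computed per partition and then combined. The sketch below illustrates that general idea in plain Python, assuming a profile made only of sums, minima, maxima, and histogram counts, which merge exactly. It is not the whylogs Spark integration; the `Profile` class and helper are hypothetical, and plain lists stand in for Spark partitions.

```python
import random
from dataclasses import dataclass, field
from functools import reduce

BINS, LO, HI = 10, 0, 100


@dataclass
class Profile:
    count: int = 0
    total: int = 0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    hist: list = field(default_factory=lambda: [0] * BINS)

    def update(self, v):
        self.count += 1
        self.total += v
        self.minimum = min(self.minimum, v)
        self.maximum = max(self.maximum, v)
        idx = (v - LO) * BINS // (HI - LO)
        self.hist[min(max(idx, 0), BINS - 1)] += 1  # v == HI lands in the last bin

    def merge(self, other):
        """Combine two profiles; every field is a sum, min, or max, so merging is exact."""
        return Profile(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
            hist=[a + b for a, b in zip(self.hist, other.hist)],
        )


def profile_partition(rows):
    """Profile one partition independently (in Spark this step would run on executors)."""
    p = Profile()
    for v in rows:
        p.update(v)
    return p


random.seed(3)
data = [random.randint(0, 100) for _ in range(10_000)]
partitions = [data[i::4] for i in range(4)]  # stand-in for 4 Spark partitions
merged = reduce(Profile.merge, (profile_partition(p) for p in partitions))
print(merged == profile_partition(data))  # True
```

Because merging is exact for these fields, the per-partition results combine into the same profile a single pass over all rows would produce, which is why no sampling is required.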
