What it does

Key features

Multi-label: per-text scores for toxicity, severe toxicity, obscenity, threat, insult, and identity attack
CPU inference: lightweight BERT-base models suited to batch platform moderation without GPUs
Three models: choice of Original, Unbiased, and Multilingual variants
Bias reduction: Unbiased model trained to reduce demographic skew in toxicity predictions
Single-call API: toxicity scores from one Python function call
HuggingFace: models available from the HuggingFace Hub for pipeline integration

When to reach for it

Best for

APAC engineering teams adding automated toxic content detection to UGC platforms, particularly e-commerce review systems, community forums, and customer support chat where English-language moderation is needed without GPU infrastructure. It also suits teams that want pre-trained multi-label toxicity detection without training custom classification models.

Don't get burned

Limitations to know

! No CJK language support: Chinese, Japanese, and Korean toxic content requires separate language-specific classifiers
! Trained on Jigsaw datasets: may not generalize to APAC-specific cultural toxicity patterns
! Limited context understanding: LLM-powered moderation (Llama Guard, NeMo Guardrails) handles context better

Context

About Detoxify

Detoxify is an open-source Python library from Unitary AI that provides pre-trained BERT-based models for multi-label toxic comment classification. A single inference call scores text for toxicity, severe toxicity, obscenity, threats, insults, and identity attacks (targeting gender, religion, race, or sexual orientation). APAC content moderation and platform engineering teams use Detoxify to add automated toxic content detection to user-generated content platforms, customer review systems, community chat, and social features without building or training custom content classifiers.
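
The single-call pattern can be sketched as follows. This is a minimal illustration that assumes the `detoxify` package is installed (`pip install detoxify`) and that `Detoxify(...).predict(...)` returns a mapping from label name to a score between 0 and 1 for a single string. The helper names `score_text` and `labels_over` and the 0.5 threshold are illustrative choices, not part of the library.

```python
from typing import Dict, List


def score_text(text: str, model_name: str = "original") -> Dict[str, float]:
    """Score one text with a pre-trained Detoxify checkpoint.

    The import is deferred so this module still loads on machines where
    detoxify (and its PyTorch dependency) is not installed.
    """
    from detoxify import Detoxify  # the first call downloads the checkpoint

    raw = Detoxify(model_name, device="cpu").predict(text)
    # Convert NumPy floats to plain floats for logging or JSON output.
    return {label: float(score) for label, score in raw.items()}


def labels_over(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score meets the threshold, highest first."""
    hits = [(label, s) for label, s in scores.items() if s >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in hits]
```

With the package installed, `labels_over(score_text("some comment"))` would list the labels that cross the threshold. The right cutoff depends on the platform's tolerance for false positives and should be tuned on real traffic.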

Detoxify offers three model variants: Original (trained on Jigsaw's Toxic Comment Classification Challenge), Unbiased (trained to reduce demographic skew in toxicity predictions), and Multilingual (extending detection to languages including Spanish, French, Portuguese, Turkish, Italian, and Russian, though not CJK languages). Teams processing English-language UGC from APAC users, such as English reviews or English chat on APAC platforms, use Original or Unbiased; teams processing content in the covered European languages use Multilingual.
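
The variant choice described above can be sketched as a small lookup. The strings `"original"`, `"unbiased"`, and `"multilingual"` are the model names Detoxify accepts. The function `pick_variant`, the use of ISO 639-1 codes, and the default preference for Unbiased on English text are assumptions made for illustration.

```python
from typing import Optional

# Non-English languages the text above lists for the Multilingual model.
MULTILINGUAL_LANGS = frozenset({"es", "fr", "pt", "tr", "it", "ru"})


def pick_variant(lang: str, prefer_unbiased: bool = True) -> Optional[str]:
    """Map a language tag to a Detoxify model name, or None if uncovered.

    Accepts tags such as "en", "pt-BR", or "zh_CN"; only the primary
    language subtag is used.
    """
    code = lang.strip().lower().replace("_", "-").split("-")[0]
    if code == "en":
        return "unbiased" if prefer_unbiased else "original"
    if code in MULTILINGUAL_LANGS:
        return "multilingual"
    # CJK and any other language: no Detoxify variant applies.
    return None
```

Returning `None` rather than silently falling back to an English model keeps uncovered languages visible to the caller, which can then route them elsewhere.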

Detoxify's inference is computationally lightweight: its BERT-base models run efficiently on CPU for batch moderation, which suits APAC platforms that process user-generated content without GPU inference infrastructure. E-commerce platforms, community forums, and customer support systems use Detoxify to flag potentially toxic content for human review rather than applying real-time LLM-powered moderation to every message.
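
A batch flag-for-review loop along these lines might look like the sketch below. It assumes a predictor that takes a list of texts and returns one list of scores per label, which is the shape `Detoxify.predict` produces for list input. The function name `flag_for_review`, the 0.8 threshold, and the batch size of 32 are illustrative choices.

```python
from typing import Callable, Dict, List, Sequence

# A predictor maps a batch of texts to {label: [score per text]}.
Predictor = Callable[[List[str]], Dict[str, List[float]]]


def flag_for_review(
    texts: Sequence[str],
    predict: Predictor,
    threshold: float = 0.8,
    batch_size: int = 32,
) -> List[dict]:
    """Score texts in fixed-size batches and collect those needing review."""
    flagged = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        scores = predict(batch)  # one forward pass per batch
        for i, text in enumerate(batch):
            # Keep only the labels that cross the threshold for this text.
            hits = {
                label: values[i]
                for label, values in scores.items()
                if values[i] >= threshold
            }
            if hits:
                flagged.append(
                    {"index": start + i, "text": text, "labels": hits}
                )
    return flagged
```

With the library installed, a loaded model's `predict` method can be passed directly, for example `flag_for_review(comments, Detoxify("unbiased", device="cpu").predict)`. Injecting the predictor also makes the loop easy to test with a stub.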

Detoxify's multilingual model does not cover Chinese, Japanese, or Korean. APAC teams processing CJK content pair Detoxify for English moderation with language-specific CJK classifiers fine-tuned on APAC annotation datasets. In a complete multilingual pipeline, Detoxify is one component of a language-routing moderation stack in which language detection precedes classifier selection.
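
One way to sketch such a language-routing stack is shown below. Everything here is hypothetical scaffolding: `ModerationRouter`, the `detect_language` callable, and the per-language classifier registry are illustrative names, and the language detector and any CJK classifiers would come from separate tools that this page does not specify.

```python
from typing import Callable, Dict, Optional

# A classifier maps one text to {label: score}.
Classifier = Callable[[str], Dict[str, float]]


class ModerationRouter:
    """Detect the language first, then dispatch to a matching classifier."""

    def __init__(
        self,
        detect_language: Callable[[str], str],
        classifiers: Dict[str, Classifier],
    ) -> None:
        self.detect_language = detect_language
        self.classifiers = dict(classifiers)

    def moderate(self, text: str) -> dict:
        lang = self.detect_language(text)
        classifier: Optional[Classifier] = self.classifiers.get(lang)
        if classifier is None:
            # No model covers this language: send to human review
            # instead of scoring with a mismatched classifier.
            return {"lang": lang, "scores": None, "route": "human_review"}
        return {"lang": lang, "scores": classifier(text), "route": "classifier"}
```

In this arrangement, a Detoxify model would be registered under `"en"` (and Multilingual under its covered languages), while `"zh"`, `"ja"`, and `"ko"` would map to separately trained classifiers.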
