
KServe

by CNCF / KServe Community

CNCF open-source, Kubernetes-native model inference platform providing serverless model serving through framework-agnostic InferenceService custom resources — ML platform teams use KServe to deploy TensorFlow, PyTorch, scikit-learn, XGBoost, and ONNX models to Kubernetes with autoscaling to zero, canary rollout support, and Knative Serving integration for serverless inference scaling.

AIMenta verdict
Recommended
5/5

"Kubernetes-native model inference platform from CNCF — ML platform teams use KServe to serve models from any major ML framework (TensorFlow, PyTorch, XGBoost, scikit-learn) on Kubernetes with serverless scaling, canary rollouts, and InferenceService custom resources."

Features
6
Use cases
3
Watch outs
3
What it does

Key features

  • InferenceService CRD — Kubernetes-native model serving manifest
  • Framework-agnostic — TensorFlow, PyTorch, scikit-learn, XGBoost, and ONNX support
  • Serverless scaling — scale-to-zero via Knative integration
  • Canary rollouts — traffic-split deployment of model versions
  • InferenceGraph — multi-model pipeline orchestration
  • Storage integration — model loading from S3, GCS, and Azure Blob
When to reach for it

Best for

  • ML platform teams running Kubernetes — KServe's Kubernetes-native InferenceService CRD integrates with existing Kubernetes tooling (Argo CD, Helm, platform engineering workflows) without requiring separate serving infrastructure
  • Organizations with multi-framework model portfolios — KServe's framework-agnostic serving handles scikit-learn, PyTorch, TensorFlow, and XGBoost models from a single serving platform, one serving infrastructure for all models
  • ML platforms with intermittent inference traffic — KServe's serverless scale-to-zero eliminates idle GPU costs for models that serve infrequent requests, without maintaining always-on pods
Don't get burned

Limitations to know

  • ! Knative complexity — KServe's serverless mode requires Knative Serving in the cluster; organizations without Knative can run KServe in RawDeployment mode but lose scale-to-zero capability
  • ! Istio/Knative operational overhead — the full KServe stack (KServe + Knative + Istio) is operationally complex; platform teams new to these components face a significant initial setup investment
  • ! Frontier LLM serving is limited — KServe's serving containers for Hugging Face models work well for medium-sized models, but very large LLMs (70B+ parameters) benefit from vLLM's continuous batching and PagedAttention optimizations
Context

About KServe

KServe is a CNCF open-source, Kubernetes-native model inference platform that gives ML platform teams a standardized InferenceService custom resource for serving models from any major ML framework (TensorFlow, PyTorch, scikit-learn, XGBoost, ONNX, Hugging Face Transformers) on Kubernetes — ML platform engineers define InferenceService manifests specifying the model format, storage location (S3, GCS, Azure Blob), and resource requirements, and KServe handles model download, serving container initialization, and endpoint creation without framework-specific serving configuration.
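A minimal InferenceService manifest along those lines might look like the following sketch (the resource name and bucket path are illustrative, not from the source):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris               # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn              # selects the serving runtime
      storageUri: s3://models/sklearn/iris   # hypothetical bucket path
```

Applying this with `kubectl apply -f` is enough for KServe to pull the model from storage, start a matching serving container, and expose an inference endpoint.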

KServe's serverless inference — teams deploy KServe on top of Knative Serving, allowing InferenceServices to scale from zero replicas (no inference cost when idle) up to target replica counts within seconds as requests arrive — gives ML platforms cost-efficient serving for models with bursty or intermittent traffic, without maintaining always-on inference pods for every deployed model.
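In serverless mode, scale-to-zero can be enabled per predictor via `minReplicas` (a sketch; the model reference is illustrative):

```yaml
spec:
  predictor:
    minReplicas: 0    # allow Knative to scale the predictor to zero when idle
    maxReplicas: 5    # cap replicas during traffic bursts
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://models/sklearn/iris   # hypothetical path
```

With `minReplicas: 0`, the first request after an idle period incurs a cold start while the pod is recreated, which is the usual trade-off against idle GPU cost.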

KServe's canary rollout — ML engineering teams deploy a new model version alongside the existing production version by specifying a traffic weight split in the InferenceService (`canaryTrafficPercent: 10` routes 10% of requests to the new version), monitor canary metrics, and progressively shift traffic to the new version — gives organizations safe model updates that mirror the progressive delivery patterns used for application deployments.
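Concretely, the canary split is declared on the predictor while pointing `storageUri` at the new model artifact; KServe keeps the previous ready revision serving the remaining traffic (paths are illustrative):

```yaml
spec:
  predictor:
    canaryTrafficPercent: 10       # 10% of requests go to the new revision
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://models/sklearn/iris-v2   # hypothetical new version
```

Promoting the canary is then an edit to `canaryTrafficPercent` (e.g. 10 → 50 → 100) rather than a separate deployment.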

KServe's InferenceGraph — ML platform teams compose inference pipelines from multiple models (request pre-processing model → primary classifier → post-processing model) or ensemble strategies (route requests to different models based on input characteristics) using KServe's InferenceGraph resource — gives organizations multi-model inference pipeline orchestration on Kubernetes without custom API gateway logic.
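A sequential two-step pipeline of the kind described above can be sketched with a `Sequence` router node; the service names are illustrative and assume matching InferenceServices already exist:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: classify-pipeline          # illustrative name
spec:
  nodes:
    root:
      routerType: Sequence         # run steps one after another
      steps:
        - serviceName: preprocessor   # hypothetical InferenceService
          data: $request              # feed the original request in
        - serviceName: classifier     # hypothetical InferenceService
          data: $response             # feed the previous step's output in
```

Other router types (e.g. `Splitter` for traffic fan-out or `Switch` for input-based routing) cover the ensemble-style strategies mentioned above.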

Beyond this tool

Where this category meets practice.

A tool only matters in context. Browse the service pillars that operationalise it, the industries where it ships, and the Asian markets where AIMenta runs adoption programs.