Why APAC Platform Teams Need SLO-as-Code Tooling
APAC engineering teams operating Kubernetes-deployed microservices face a consistent alerting problem: Prometheus alert rules that fire on raw error rates produce too many false positives, while rules tuned to reduce noise miss real incidents until significant APAC user impact has occurred. The root cause is alerting on error rate rather than error budget burn rate.
Google's SRE Workbook describes multiwindow, multi-burn-rate alerting. Under this approach, Prometheus fires an alert only when the APAC error budget is being consumed faster than is sustainable, measured over several time windows (1h fast burn, 6h medium burn, 3d slow burn). This sharply reduces APAC alert noise while still catching incidents that consume a meaningful share of the error budget. The challenge: writing correct multi-burn-rate PromQL expressions by hand is complex and error-prone.
Three tools address the APAC SLO-as-code problem:
Pyrra — Kubernetes CRD-native SLO management. Define APAC SLOs as ServiceLevelObjective manifests; Pyrra generates the Prometheus rules automatically.
Sloth — Multi-mode SLO rule generator. Run it as a CLI, as a Kubernetes CRD controller, or in GitHub Actions, whichever integration model fits the APAC team's workflow.
OpenSLO — Vendor-neutral SLO specification. Write APAC SLO definitions once, then translate them for Prometheus, Datadog, or Dynatrace through converters instead of rewriting the specs (converter coverage varies by backend).
The Multi-Burn-Rate SLO Problem These Tools Solve
Why raw error rate alerting fails APAC teams
APAC SLO: 99.9% of payment API requests succeed over a 30-day window
Error budget: 0.1% × 30 days × 24h × 60m = 43.2 minutes of downtime
Raw error rate alerting problem:
Alert: "error rate > 1% for 5 minutes"
→ Fires on every transient APAC spike (→ alert fatigue)
→ Silent when a 0.5% error rate (a 5× burn rate) persists: ~4% of the budget is gone after 6 hours and all of it after 6 days, with no alert
Multi-burn-rate alerting (Google SRE Workbook; burn rate = error ratio ÷ (1 − SLO target), so 1× spends exactly the budget in 30 days):
Fast page: burn rate > 14.4× over 1h, confirmed over 5m (→ page, 2% of budget burned)
Slow page: burn rate > 6× over 6h, confirmed over 30m (→ page, 5% of budget burned)
Ticket: burn rate > 1× over 3d, confirmed over 6h (→ ticket, 10% of budget burned)
Writing these PromQL expressions by hand for 30+ APAC services:
→ Complex and error-prone: every service needs a recording rule per window plus the paired alert expressions
→ Pyrra/Sloth generate this correctly from a 15-line APAC SLO definition
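All of the Workbook thresholds follow from one relationship: a burn rate b sustained over a window w consumes b × w ÷ period of the budget. The minimal Python sketch below assumes the 30-day period and 99.9% objective above and reproduces those numbers. The helper names are illustrative and do not come from Pyrra, Sloth, or OpenSLO.

```python
# Burn-rate arithmetic for a 99.9% SLO over a 30-day window.
# Helper names are illustrative; none of them come from Pyrra, Sloth, or OpenSLO.

PERIOD_HOURS = 30 * 24  # 720-hour SLO window


def error_budget_minutes(objective: float, period_hours: float = PERIOD_HOURS) -> float:
    """Error budget expressed as minutes of total outage."""
    return (1 - objective) * period_hours * 60


def burn_rate(error_ratio: float, objective: float) -> float:
    """How many times faster than 'sustainable' the budget is being spent."""
    return error_ratio / (1 - objective)


def budget_consumed(rate: float, window_hours: float, period_hours: float = PERIOD_HOURS) -> float:
    """Fraction of the whole budget consumed by `rate` sustained over `window_hours`."""
    return rate * window_hours / period_hours


def hours_to_exhaustion(rate: float, period_hours: float = PERIOD_HOURS) -> float:
    """Time until the full budget is gone at a constant burn rate."""
    return period_hours / rate


if __name__ == "__main__":
    objective = 0.999
    print(f"budget: {error_budget_minutes(objective):.1f} min")  # 43.2 min
    # The Workbook's three alert tiers: (burn rate, long window in hours)
    for rate, window in [(14.4, 1), (6, 6), (1, 72)]:
        print(f"{rate:>4}x over {window}h -> {budget_consumed(rate, window):.0%} of budget")
    # A steady 0.5% error ratio is a 5x burn: invisible to a 1% threshold
    slow = burn_rate(0.005, objective)
    print(f"0.5% errors = {slow:.1f}x burn, budget gone in {hours_to_exhaustion(slow) / 24:.0f} days")
```

The 3-day ticket tier uses a 1× threshold because 1 × 72h ÷ 720h is exactly the 10% budget share the Workbook assigns to it.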
Pyrra: Kubernetes CRD-Native SLO Management
ServiceLevelObjective CRD definition
# pyrra-apac-payments-slo.yaml: define the APAC SLO as a Kubernetes CRD
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: apac-payments-api-availability
  namespace: apac-payments
  labels:
    pyrra.dev/team: platform
    apac.region: sea
spec:
  target: "99.9"   # 99.9% availability SLO
  window: 30d      # 30-day rolling window
  # SLI: APAC payment API request success rate (errors / total)
  indicator:
    ratio:
      errors:
        metric: http_requests_total{job="apac-payments-api",status=~"5.."}
      total:
        metric: http_requests_total{job="apac-payments-api"}
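Conceptually, Pyrra turns the `errors` and `total` selectors into one error-ratio query per alerting window. The Python sketch below shows that idea. `ratio_query` is a hypothetical helper, and Pyrra's real generated expressions (which aggregate and name recording rules differently) will not match it character for character.

```python
# Hypothetical helper: build an error-ratio PromQL query from the two selectors
# in a Pyrra-style ServiceLevelObjective. Illustrates the idea only.

def ratio_query(errors_selector: str, total_selector: str, window: str) -> str:
    """Return 'errors / total' over `window` as a PromQL string."""
    return (
        f"sum(rate({errors_selector}[{window}]))\n"
        f"/\n"
        f"sum(rate({total_selector}[{window}]))"
    )


errors = 'http_requests_total{job="apac-payments-api",status=~"5.."}'
total = 'http_requests_total{job="apac-payments-api"}'

# One query per alerting window; the list mirrors the short/long window pairs.
for window in ["5m", "30m", "1h", "6h"]:
    print(f"# window {window}")
    print(ratio_query(errors, total, window))
```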
Once the CRD is applied, Pyrra generates a PrometheusRule along these lines (simplified for readability; Pyrra's actual recording-rule and alert names differ):
# Simplified sketch of the Pyrra-generated PrometheusRule (never edit generated rules by hand)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pyrra-apac-payments-api-availability
spec:
  groups:
    - name: pyrra-apac-payments-api-availability
      rules:
        # Recording rules for APAC error ratios, one per alerting window
        - record: pyrra:apac_payments_api:error_ratio:5m
          expr: |
            sum(rate(http_requests_total{job="apac-payments-api",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="apac-payments-api"}[5m]))
        # ...equivalent recording rules for the 30m, 1h and 6h windows
        # Fast APAC burn rate alert (5m and 1h windows, 14.4× burn rate)
        - alert: ApacPaymentsApiFastBurn
          expr: |
            (
              pyrra:apac_payments_api:error_ratio:5m > (14.4 * (1 - 0.999))
              and
              pyrra:apac_payments_api:error_ratio:1h > (14.4 * (1 - 0.999))
            )
          labels:
            severity: page
            apac_team: platform
          annotations:
            summary: "APAC payments API fast burn: error budget depleting rapidly"
        # Slow APAC burn rate alert (30m and 6h windows, 6× burn rate)
        - alert: ApacPaymentsApiSlowBurn
          expr: |
            (
              pyrra:apac_payments_api:error_ratio:30m > (6 * (1 - 0.999))
              and
              pyrra:apac_payments_api:error_ratio:6h > (6 * (1 - 0.999))
            )
          labels:
            severity: page
            apac_team: platform
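Each alert pairs a long window with a short one. The long window confirms that the budget is really burning, and the short window lets the alert resolve soon after the errors stop. Below is a hedged Python model of the `and` in these expressions, assuming the same 99.9% objective. The function name is illustrative and does not come from Pyrra.

```python
# Model of a multiwindow burn-rate condition: fire only when BOTH the short and
# the long window exceed the threshold. Illustrative, not Pyrra's code.

def multiwindow_fires(short_ratio: float, long_ratio: float,
                      burn_rate: float, objective: float = 0.999) -> bool:
    threshold = burn_rate * (1 - objective)  # e.g. 14.4 * 0.001 = 1.44% errors
    return short_ratio > threshold and long_ratio > threshold


# Sustained incident: both windows are hot, so the fast-burn page fires.
print(multiwindow_fires(short_ratio=0.05, long_ratio=0.03, burn_rate=14.4))   # True
# Incident already over: 1h average still high, last 5m clean, so no page.
print(multiwindow_fires(short_ratio=0.0, long_ratio=0.03, burn_rate=14.4))    # False
# Brief spike: 5m window hot, but the 1h average stays under 1.44%.
print(multiwindow_fires(short_ratio=0.05, long_ratio=0.004, burn_rate=14.4))  # False
```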
Pyrra GitOps integration
# Pyrra Helm install: APAC Kubernetes cluster
# Chart repository and value names vary by chart version; confirm them against the chart's values.yaml
helm repo add pyrra https://pyrra-dev.github.io/pyrra
helm repo update
helm install pyrra pyrra/pyrra \
  --namespace monitoring \
  --set apiServer.enabled=true \
  --set prometheusUrl=http://kube-prometheus-stack-prometheus.monitoring:9090
# Commit APAC SLO CRDs to git; Argo CD/Flux syncs them to the cluster
git add k8s/apac-slos/
git commit -m "feat: add APAC payments API 99.9% availability SLO"
git push
# Argo CD/Flux reconciles → Pyrra watches the CRD → generates PrometheusRules automatically
Pyrra dashboard
Pyrra's web UI (kubectl port-forward svc/pyrra-api 9099:9099 -n monitoring; the service name depends on how Pyrra was installed) shows per-SLO status like the following (illustrative values):
APAC Service SLO Status Dashboard:
Service                      SLO      Compliance   Budget Remaining   Burn Rate
apac-payments-api            99.9%    99.987%      87.3%              0.6×
apac-kyc-service             99.5%    99.96%       92.1%              0.3×
apac-notification-service    99.0%    99.31%       31.2% ⚠            2.1× (slow burn)
apac-fraud-detection-api     99.9%    100%         100%               0.0×
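Budget remaining is derived from measured compliance and the target: it is the share of allowed errors not yet spent. Below is a minimal sketch of that formula. The dashboard's own computation may differ in rounding and window handling.

```python
def budget_remaining(target: float, compliance: float) -> float:
    """Fraction of the error budget left; negative once the SLO is breached."""
    allowed = 1 - target      # e.g. 0.001 for a 99.9% target
    spent = 1 - compliance    # observed error ratio over the window
    return 1 - spent / allowed


# Inputs are ratios, not percentages.
print(f"{budget_remaining(0.999, 0.9995):.1%}")  # half the budget used
print(f"{budget_remaining(0.99, 0.987):.1%}")    # below target: budget overdrawn
```

For example, a 99.9% SLO measured at 99.95% has spent half of its 0.1% allowance, leaving 50% of the budget.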
Sloth: Multi-Mode SLO Rule Generation
Sloth YAML SLO definition
# apac-slos.yaml: Sloth SLO specification (simpler than raw PromQL)
version: "prometheus/v1"
service: apac-payments-api
labels:
  apac_team: platform
  apac_region: sea
slos:
  - name: apac-payments-availability
    description: "APAC payment API request availability SLO"
    objective: 99.9  # 99.9% target
    sli:
      events:
        error_query: sum(rate(http_requests_total{job="apac-payments-api",status=~"5.."}[{{.window}}]))
        total_query: sum(rate(http_requests_total{job="apac-payments-api"}[{{.window}}]))
    alerting:
      name: ApacPaymentsApiAvailability
      labels:
        apac_severity: platform
      annotations:
        runbook: https://runbooks.apac-company.internal/apac-payments-availability
      page_alert:
        labels:
          severity: page
      ticket_alert:
        labels:
          severity: warning
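Sloth renders the `{{.window}}` placeholder (Go template syntax) once for each window it needs, then records each error ratio as its own series. The Python sketch below mimics that substitution. The window list is an assumption modeled on the Workbook's short/long pairs rather than Sloth's internal list, and Sloth's real recording-rule names differ.

```python
# Sketch of what Sloth does with the {{.window}} placeholder: render one query per
# window so each can become a recording rule. The window list is an assumption
# modeled on the Workbook's short/long pairs, not Sloth's internal list.

ERROR_QUERY = 'sum(rate(http_requests_total{job="apac-payments-api",status=~"5.."}[{{.window}}]))'
TOTAL_QUERY = 'sum(rate(http_requests_total{job="apac-payments-api"}[{{.window}}]))'
WINDOWS = ["5m", "30m", "1h", "6h", "3d"]


def render(template: str, window: str) -> str:
    """Substitute the Go-template placeholder with a concrete PromQL range."""
    return template.replace("{{.window}}", window)


def error_ratio(window: str) -> str:
    """Error-ratio expression for one window: errors / total."""
    return f"({render(ERROR_QUERY, window)})\n/\n({render(TOTAL_QUERY, window)})"


for w in WINDOWS:
    print(f"# error ratio over {w}")
    print(error_ratio(w))
```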
Sloth deployment modes — choose what fits your APAC workflow
CLI mode (APAC GitOps: generate-and-commit):
# Install Sloth CLI
go install github.com/slok/sloth/cmd/sloth@latest
# Generate Prometheus rules from APAC SLO YAML
sloth generate -i apac-slos.yaml -o prometheus-rules/apac-payments-slos.yaml
# Commit generated APAC Prometheus rules to git
git add prometheus-rules/apac-payments-slos.yaml
git commit -m "feat: regenerate APAC payments SLO Prometheus rules"
# Argo CD/Flux picks up APAC generated rules automatically
GitHub Actions mode (APAC CI/CD pipeline generation):
# .github/workflows/apac-slo-generate.yml
name: Generate APAC SLO Prometheus Rules
on:
  push:
    paths:
      - 'apac-slos/**'
permissions:
  contents: write   # required so the job can push the regenerated rules
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Sloth
        run: |
          sudo curl -fsSL https://github.com/slok/sloth/releases/download/v0.11.0/sloth-linux-amd64 \
            -o /usr/local/bin/sloth
          sudo chmod +x /usr/local/bin/sloth
      - name: Generate APAC Prometheus rules from SLO YAML
        run: |
          mkdir -p prometheus-rules
          for f in apac-slos/*.yaml; do
            sloth generate -i "$f" -o "prometheus-rules/$(basename "$f")"
          done
      - name: Commit generated APAC SLO rules
        run: |
          git config user.name "APAC SLO Generator"
          git config user.email "apac-slo-generator@users.noreply.github.com"  # placeholder identity
          git add prometheus-rules/
          git diff --staged --quiet || git commit -m "chore: regenerate APAC SLO Prometheus rules"
          git push
Kubernetes CRD mode (APAC controller-based):
# PrometheusServiceLevel CRD: the Sloth controller watches it and generates PrometheusRules
apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
  name: apac-payments-slos
  namespace: monitoring
spec:
  service: apac-payments-api
  slos:
    - name: apac-payments-availability
      objective: 99.9
      sli:
        events:
          errorQuery: sum(rate(http_requests_total{job="apac-payments-api",status=~"5.."}[{{.window}}]))
          totalQuery: sum(rate(http_requests_total{job="apac-payments-api"}[{{.window}}]))
      alerting:
        name: ApacPaymentsApiSLO
        pageAlert:
          labels:
            severity: page
        ticketAlert:
          labels:
            severity: warning
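The CRD carries the same fields as the CLI spec, but its keys are camelCase (`error_query` becomes `errorQuery`, `page_alert` becomes `pageAlert`) and the spec is nested under `spec`. Below is a hypothetical Python helper, not part of Sloth, that performs this translation on an already-parsed CLI spec. User-defined label and annotation keys are left untouched.

```python
# Hypothetical helper: translate a parsed Sloth "prometheus/v1" CLI spec (snake_case)
# into a PrometheusServiceLevel CRD (camelCase). Keys under `labels` and
# `annotations` are user data, so they are copied verbatim.

from typing import Any

VERBATIM = {"labels", "annotations"}


def camel(key: str) -> str:
    """snake_case -> camelCase, e.g. error_ratio_query -> errorRatioQuery."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_camel(node: Any) -> Any:
    """Recursively rename dict keys, skipping the contents of VERBATIM keys."""
    if isinstance(node, list):
        return [to_camel(item) for item in node]
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            out[camel(key)] = value if key in VERBATIM else to_camel(value)
        return out
    return node


def cli_spec_to_crd(spec: dict, name: str, namespace: str) -> dict:
    """Wrap the converted spec in CRD metadata; the CLI `version` field is dropped."""
    body = {k: v for k, v in spec.items() if k != "version"}
    return {
        "apiVersion": "sloth.slok.dev/v1",
        "kind": "PrometheusServiceLevel",
        "metadata": {"name": name, "namespace": namespace},
        "spec": to_camel(body),
    }
```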
VictoriaMetrics support (APAC enterprise deployments)
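# Sloth emits standard Prometheus rule files, which vmalert loads without conversion
sloth generate \
  -i apac-slos.yaml \
  --extra-labels apac_region=sea \
  -o apac-vm-rules.yaml
# Load the rules into vmalert (hostnames are placeholders); -remoteWrite.url persists
# recording-rule results so the burn-rate alerts can query them
vmalert \
  -rule=apac-vm-rules.yaml \
  -datasource.url=http://victoriametrics:8428 \
  -remoteWrite.url=http://victoriametrics:8428 \
  -notifier.url=http://alertmanager:9093
# On Kubernetes, the VictoriaMetrics operator can instead convert PrometheusRule objects into VMRule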
OpenSLO: Vendor-Neutral SLO Portability
The APAC observability migration problem OpenSLO solves
APAC Scenario: Platform team migrates from Prometheus to Datadog
Without OpenSLO:
Year 1: Write Sloth/Pyrra SLO definitions for 40 APAC services
Year 2: Migrate to Datadog (corporate mandate)
→ Rewrite all 40 APAC SLO definitions in Datadog SLO API format
→ 3-4 weeks of APAC SLO migration work
→ Risk: APAC SLO targets drift during rewrite
With OpenSLO:
Year 1: Write OpenSLO SLO definitions for 40 APAC services
Use Sloth (or another OpenSLO-aware generator) for the Prometheus implementation
Year 2: Migrate to Datadog
→ Run an OpenSLO→Datadog converter on the existing APAC SLO YAML
→ APAC SLO targets stay identical; only the implementation changes
→ 1-2 days of APAC converter configuration
OpenSLO YAML specification
# apac-payments-slo.openslo.yaml: vendor-neutral SLO definition
apiVersion: openslo/v1
kind: SLO
metadata:
  name: apac-payments-api-availability
  displayName: "APAC Payments API Availability"
spec:
  service: apac-payments-api
  description: "99.9% of APAC payment API requests succeed over a rolling 30-day window"
  budgetingMethod: Occurrences  # or Timeslices
  objectives:
    - displayName: Availability
      target: 0.999  # 99.9%
  indicator:
    metadata:
      name: apac-payments-request-availability
    spec:
      ratioMetric:
        counter: true
        good:
          metricSource:
            type: Prometheus
            spec:
              # {{.window}} is Sloth-style templating, not OpenSLO syntax; the converter must fill it in
              query: sum(rate(http_requests_total{job="apac-payments-api",status!~"5.."}[{{.window}}]))
        total:
          metricSource:
            type: Prometheus
            spec:
              query: sum(rate(http_requests_total{job="apac-payments-api"}[{{.window}}]))
  timeWindow:
    - duration: 30d
      isRolling: true
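A converter is mostly a field mapping. Below is a hypothetical Python sketch, not an official OpenSLO or Sloth tool, that maps the parsed OpenSLO ratio SLO onto a Sloth `prometheus/v1` spec. Sloth's events SLI expects an error query, so the sketch derives errors as total minus good. It converts the 0.999 target into Sloth's percentage form and ignores `timeWindow`, `budgetingMethod`, and alerting.

```python
# Hypothetical converter: map an OpenSLO v1 ratio SLO (parsed into a dict) onto a
# Sloth "prometheus/v1" spec. Sloth's events SLI wants an error query, so errors
# are derived as total - good. A sketch, not an official OpenSLO tool.

def openslo_to_sloth(doc: dict) -> dict:
    spec = doc["spec"]
    ratio = spec["indicator"]["spec"]["ratioMetric"]
    good = ratio["good"]["metricSource"]["spec"]["query"]
    total = ratio["total"]["metricSource"]["spec"]["query"]
    objective = spec["objectives"][0]
    return {
        "version": "prometheus/v1",
        "service": spec["service"],
        "slos": [{
            "name": doc["metadata"]["name"],
            "description": spec.get("description", ""),
            # OpenSLO targets are ratios (0.999); Sloth objectives are percentages (99.9)
            "objective": round(objective["target"] * 100, 6),
            "sli": {"events": {
                "error_query": f"({total}) - ({good})",
                "total_query": total,
            }},
        }],
    }
```

Deriving errors by subtraction keeps the good/total semantics of the OpenSLO spec. The trade-off is that the error query becomes a compound expression rather than a single selector.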
OpenSLO converter ecosystem
# Validate the vendor-neutral APAC spec with oslo, the OpenSLO project's CLI
oslo validate -f apac-payments-slo.openslo.yaml
# Prometheus path: Sloth can read OpenSLO documents directly; its OpenSLO support has
# targeted openslo/v1alpha, so confirm your release accepts the apiVersion you use
sloth generate -i apac-payments-slo.openslo.yaml -o prometheus-rules/apac-payments-slos.yaml
# Datadog path (when migrating platforms): converter support varies; map the spec onto
# Datadog's SLO API with a third-party or in-house converter and verify the targets match
APAC SLO Tool Selection
APAC SLO Problem → Tool → Why
APAC Kubernetes-native teams using Prometheus Operator → Pyrra → CRD model: SLO definitions live in git alongside K8s manifests
APAC SLO-as-code across multiple deployment models → Sloth → CLI, CRD, and GitHub Actions modes: APAC teams choose the integration style
APAC teams not yet on Kubernetes (bare Prometheus deployment) → Sloth CLI → Standalone binary; no Kubernetes controller required
APAC VictoriaMetrics deployments (common in enterprise APAC FinServ) → Sloth → Generated rule files load directly into vmalert
APAC teams anticipating platform migration (Prometheus → Datadog) → OpenSLO → SLO definitions portable across backends via converters
APAC multi-backend observability (teams on different backends) → OpenSLO → Common SLO language across mixed Prometheus + Datadog stacks
APAC teams new to SLO alerting wanting SLO status visibility → Pyrra → Dashboard shows cross-service SLO status without custom dashboards
Related APAC Platform Engineering Resources
For the observability backends that store the Prometheus metrics these SLO tools measure against, see the APAC AIOps guide covering Dynatrace, PagerDuty, and Datadog.
For the feature flag tools that gate rollouts when SLO error budgets are low, see the APAC feature flag guide covering OpenFeature, Flagsmith, and Unleash.
For the API testing tools that validate services before they go live under an SLO, see the APAC API testing guide covering Hoppscotch, Bruno, and k6.