APAC Continuous Profiling Guide 2026: Pyroscope, Parca, and Speedscope for Production Performance

A practitioner guide for APAC SRE and performance engineering teams implementing continuous profiling as the fourth observability pillar in 2026. It covers Grafana Pyroscope for always-on CPU and memory flamegraph collection with zero-instrumentation eBPF profiling, Grafana-native correlation of profiles with metrics and traces, and self-hosted or Grafana Cloud managed deployment. It covers Parca for self-hosted eBPF continuous profiling with Prometheus-compatible label querying, diff flamegraphs that compare before/after profiles to catch deployment regressions, and CNCF open-source governance. And it covers Speedscope for browser-based interactive flamegraph visualization supporting pprof, Linux perf, Chrome DevTools, and Python profile formats, enabling collaborative APAC performance investigation via URL sharing without any server deployment.

By AIMenta Editorial Team

The Missing Pillar in APAC Observability

APAC engineering teams that have implemented metrics (Prometheus), logs (Loki/ELK), and traces (Jaeger/Tempo) still have a blind spot when diagnosing performance problems: they know that an APAC service is consuming 80% CPU but not which function is responsible. Continuous profiling fills this gap — the fourth observability pillar that connects high-level resource signals to specific code paths.

Without continuous profiling, APAC performance investigations require manually triggering profilers during incidents — often too late, as the condition has already resolved — or running profiling in staging environments that do not replicate APAC production load patterns.

Three tools cover the APAC continuous profiling spectrum:

Grafana Pyroscope — always-on profiling integrated with Grafana LGTM stack for correlated APAC observability.

Parca — self-hosted eBPF continuous profiling with Prometheus-native label model and diff flamegraphs.

Speedscope — browser-based flamegraph visualizer for on-demand APAC profile investigation and team sharing.


APAC Continuous Profiling Fundamentals

The four APAC observability pillars

APAC Observability Stack (complete):

Pillar 1: METRICS (Prometheus + Grafana)
  Question answered: "Is my APAC service unhealthy?"
  Signal: CPU 84%, memory 6.2GB, HTTP error rate 0.3%
  Limitation: tells you WHAT is high, not WHY

Pillar 2: LOGS (Loki / ELK)
  Question answered: "What happened in my APAC service?"
  Signal: error stacktrace at 14:32:41 SGT
  Limitation: captures events, not resource consumption

Pillar 3: TRACES (Tempo / Jaeger)
  Question answered: "Where is my APAC request slow?"
  Signal: /api/orders spans: DB 847ms, cache 3ms, auth 12ms
  Limitation: attributes latency to service, not code function

Pillar 4: PROFILES (Pyroscope / Parca) ← the gap
  Question answered: "WHICH FUNCTION is consuming APAC CPU?"
  Signal: flamegraph shows apacOrderService.buildReport() = 67% CPU
  Closes the loop: metrics identify WHAT, profiles explain WHY

APAC flamegraph interpretation basics

APAC CPU Flamegraph (reading bottom to top):

main() ──────────────────────────────────────────── 100%
└── http.HandleFunc()  ────────────────────────────  98%
    └── apacOrdersHandler() ────────────────────────  96%
        ├── apacDBQuery() ──────────────────────────  62%  ← 62% of CPU here
        │   ├── buildApacSQL() ─────────────────────  8%
        │   └── json.Marshal()  ────────────────────  54%  ← WHY is SQL slow?
        │       └── reflect.ValueOf() ──────────────  54%  ← reflection in hot path
        └── apacCacheCheck() ───────────────────────  34%
            └── redis.Get() ────────────────────────  34%

APAC Analysis:
→ json.Marshal using reflection = slow for high-volume APAC DB rows
→ Fix: pre-generate JSON or use struct tags with faster APAC JSON library
→ Expected improvement: 54% CPU reduction for apacOrdersHandler()
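Flamegraph bar widths are just aggregated stack samples. A minimal sketch of that aggregation, using Brendan Gregg's "folded stacks" text format with hypothetical sample counts loosely mirroring the flamegraph above:

```python
from collections import defaultdict

def aggregate_folded_stacks(lines):
    """Sum sample counts per frame from folded stack lines.

    Each line is 'frame;frame;frame count' -- the text format most
    flamegraph tools consume. Returns each frame's share of all
    samples, which is what a flamegraph bar's width represents.
    """
    totals = defaultdict(int)
    grand_total = 0
    for line in lines:
        stack, count = line.rsplit(" ", 1)
        count = int(count)
        grand_total += count
        # A sample counts toward every frame on its stack (cumulative)
        for frame in stack.split(";"):
            totals[frame] += count
    return {f: 100 * c / grand_total for f, c in totals.items()}

# Hypothetical samples (100 total) echoing the flamegraph above
samples = [
    "main;http.HandleFunc;apacOrdersHandler;apacDBQuery;json.Marshal 54",
    "main;http.HandleFunc;apacOrdersHandler;apacDBQuery;buildApacSQL 8",
    "main;http.HandleFunc;apacOrdersHandler;apacCacheCheck;redis.Get 34",
    "main;http.HandleFunc;apacOrdersHandler 2",
    "main;http.HandleFunc 1",
    "main 1",
]
pct = aggregate_folded_stacks(samples)
print(round(pct["apacDBQuery"]))   # 62
print(round(pct["json.Marshal"]))  # 54
```

This is why a flamegraph reads bottom-up: every sample containing `apacDBQuery` anywhere on the stack widens its bar, so parent bars are always at least as wide as their children.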

Grafana Pyroscope: APAC Always-On Flamegraphs

Pyroscope APAC Go service integration

// APAC: Instrument Go service with Pyroscope SDK

import (
    "log"
    "os"

    "github.com/grafana/pyroscope-go"
)

func apacInitProfiling() {
    // APAC: Start returns a *Profiler and an error -- check it
    _, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "apac-orders-service",

        // APAC: Pyroscope server (self-hosted or Grafana Cloud)
        ServerAddress: "http://apac-pyroscope:4040",

        // APAC: Labels for Prometheus-compatible querying
        Tags: map[string]string{
            "apac_env":     "production",
            "apac_region":  "sg",
            "apac_version": os.Getenv("APP_VERSION"),
            "apac_pod":     os.Getenv("POD_NAME"),
        },

        // APAC: Profile types to collect
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
        },
    })
    if err != nil {
        log.Fatalf("pyroscope start failed: %v", err)
    }
}
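The equivalent instrumentation for a Python APAC service is a few lines with the pyroscope-io pip package. This is a configuration sketch, not a definitive integration: the server address and tag values are illustrative, mirroring the Go example above, and assume the `pyroscope.configure` API of pyroscope-io.

```python
# APAC: sketch — instrument a Python service with the pyroscope-io
# SDK (pip install pyroscope-io); needs a reachable Pyroscope server
import os

import pyroscope

pyroscope.configure(
    application_name="apac-orders-service-py",
    server_address="http://apac-pyroscope:4040",  # illustrative address
    tags={
        "apac_env": "production",
        "apac_region": "sg",
        "apac_version": os.getenv("APP_VERSION", "dev"),
    },
)
```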

Pyroscope APAC Kubernetes deployment

# APAC: values.yaml for Pyroscope Helm chart
# helm install pyroscope grafana/pyroscope -f values.yaml

pyroscope:
  replicaCount: 2

  # APAC: Storage for profile data
  persistence:
    enabled: true
    size: 50Gi
    storageClass: "apac-standard"

  # APAC: Retention for profile data
  config: |
    storage:
      backend: filesystem
      filesystem:
        dir: /data/pyroscope

  # APAC: bundled agent for shipping profiles into Grafana
  # (sub-chart key names vary by chart version; verify with
  #  `helm show values grafana/pyroscope`)
  grafana-agent:
    enabled: true

# APAC: eBPF profiler (profiles all pods without SDK instrumentation)
pyroscope-ebpf:
  enabled: true
  # Profiles every process on every APAC K8s node automatically
  # No application code changes required

Pyroscope APAC Grafana correlation

APAC Grafana dashboard: correlate CPU spike with profile

14:32 SGT: Prometheus alert: apac-orders-service CPU > 80% for 5 min

APAC Investigation in Grafana:
  1. Open Grafana Explore → switch to Pyroscope data source
  2. Query: {apac_env="production", apac_region="sg"}
     → Renders flamegraph for the 14:32-14:37 SGT window
  3. Switch to "Explore Metrics" split view:
     → Left pane: Pyroscope flamegraph (WHICH function)
     → Right pane: Prometheus CPU metric (HOW HIGH)
  4. Flamegraph reveals: apacReportBuilder.generatePDF() = 73% CPU
     → APAC root cause: new report feature released 14:28 SGT
     → APAC action: add async queue for report generation
  5. Time to root cause: 4 minutes (vs 45 min without profiles)

Parca: APAC Self-Hosted eBPF Profiling

Parca APAC Kubernetes deployment

# APAC: Deploy Parca server and agent via kubectl

# Parca server (stores and serves APAC profiles)
kubectl apply -f https://github.com/parca-dev/parca/releases/latest/download/kubernetes-manifest.yaml

# Parca Agent (eBPF profiler on each APAC node)
kubectl apply -f https://github.com/parca-dev/parca-agent/releases/latest/download/kubernetes-manifest.yaml

# APAC: Parca Agent automatically profiles ALL processes on each node
# No application changes needed for APAC services
# Profiles annotated with Kubernetes labels (namespace, pod, container)

Parca APAC diff flamegraph — deployment regression

APAC Parca Diff Flamegraph — Before vs After Release 4.2.1

Selection A: 14:00-14:28 SGT (before release 4.2.1)
Selection B: 14:30-15:00 SGT (after release 4.2.1)

APAC Diff visualization (red = more CPU, green = less):
  Red (worse after release):
    apacProductService.listProducts()  ──── +34% CPU
      └── apacPriceEngine.calculate()  ──── +34% CPU
          └── apacTaxRule.applyRules()  ─── +34% CPU  ← new tax rules in release
  Green (better after release):
    apacCacheService.get()             ──── -12% CPU
      (APAC cache hit rate improved in release 4.2.1)

APAC Conclusion:
→ New APAC tax calculation rules (release 4.2.1) added 34% CPU per product list
→ Unintended APAC performance regression from tax rule expansion
→ Fix: memoize APAC tax calculations per product category (not per product)
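The memoization fix suggested above can be sketched as follows. The tax table, rates, and function names are hypothetical stand-ins for the real rule engine; the point is the cache key granularity, (category, region) rather than product.

```python
from functools import lru_cache

# Hypothetical rate table — the real system evaluates rule chains here
TAX_RATES = {("electronics", "sg"): 0.09, ("books", "sg"): 0.0}

@lru_cache(maxsize=4096)
def apac_tax_rate(category: str, region: str) -> float:
    # Expensive rule evaluation now runs once per (category, region),
    # not once per product in the list
    return TAX_RATES.get((category, region), 0.10)

def price_with_tax(price: float, category: str, region: str) -> float:
    return price * (1 + apac_tax_rate(category, region))

print(round(price_with_tax(100.0, "electronics", "sg"), 2))  # 109.0
```

After the change, a fresh Parca diff over the next release window should show `applyRules()` shrinking back toward its pre-4.2.1 width.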

Speedscope: APAC Collaborative Profile Visualization

Speedscope APAC profile sharing workflow

# APAC: Capture pprof from Go service and share via Speedscope

# Step 1: APAC — Capture CPU profile from production service
# (safe for production — 30-second CPU profile at low overhead)
curl -s "http://apac-orders-service:6060/debug/pprof/profile?seconds=30" \
    -o /tmp/apac-orders-cpu-$(date +%Y%m%d-%H%M).prof

# Step 2: APAC — Open in browser (local visualization, no upload)
# drag-and-drop .prof file to https://speedscope.app
# OR use self-hosted speedscope for sensitive APAC profiles

# Alternative: open local file directly
open https://speedscope.app  # then drag .prof file

# Step 3: APAC — Share with team
# speedscope parses profiles entirely in the browser; nothing is uploaded
# To share, host the .prof file internally, then open
#   https://speedscope.app/#profileURL=<link-to-profile>
# APAC team members opening the same link see the identical flamegraph
# No profiling infrastructure needed

# APAC: Speedscope view modes
# ① Time Order:   left-to-right = chronological APAC call order
# ② Left Heavy:   tallest left bars = most APAC CPU time cumulative
# ③ Sandwich:     shows callers+callees of selected APAC function
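Beyond importing existing formats, speedscope publishes a JSON file format of its own, so custom APAC tooling can emit shareable flamegraphs directly. A minimal sketch of an "evented" profile writer, assuming speedscope's published file-format schema; the frame names, timings, and output path are invented:

```python
import json

def write_speedscope_profile(path, frames, events, end_ms):
    """Emit a minimal speedscope-format JSON file.

    'O'/'C' events open and close frames on an evented timeline,
    per speedscope's file-format schema.
    """
    doc = {
        "$schema": "https://www.speedscope.app/file-format-schema.json",
        "shared": {"frames": [{"name": f} for f in frames]},
        "profiles": [{
            "type": "evented",
            "name": "apac-custom-trace",
            "unit": "milliseconds",
            "startValue": 0,
            "endValue": end_ms,
            "events": events,
        }],
    }
    with open(path, "w") as fh:
        json.dump(doc, fh)

# Hypothetical 100ms trace: main() spends 62ms inside apacDBQuery()
frames = ["main", "apacDBQuery"]
events = [
    {"type": "O", "frame": 0, "at": 0},
    {"type": "O", "frame": 1, "at": 10},
    {"type": "C", "frame": 1, "at": 72},
    {"type": "C", "frame": 0, "at": 100},
]
write_speedscope_profile("/tmp/apac-trace.speedscope.json", frames, events, 100)
# Drag the resulting file onto speedscope.app to view
```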

Speedscope APAC Python profile investigation

# APAC: Profile Python service and visualize in Speedscope

import cProfile
import pstats

# APAC: Profile specific function during incident
apac_profiler = cProfile.Profile()
apac_profiler.enable()

# APAC: Run the slow function (the application's own code)
apac_generate_report(apac_customer_id, apac_date_range)

apac_profiler.disable()

# APAC: Export in pstats binary format
apac_stats = pstats.Stats(apac_profiler)
apac_stats.dump_stats('/tmp/apac-report-profile.prof')

# Open /tmp/apac-report-profile.prof in speedscope.app
# (if your speedscope version cannot import pstats files, capture with
#  py-spy instead: py-spy record -f speedscope -o apac.json --pid <PID>)
# → Reveals: apacPDFRenderer.renderTable() = 78% of total time
# → APAC fix: pre-paginate table data before PDF rendering
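When a browser is not at hand, the same pstats data can be triaged in the terminal with the standard library alone. `apac_slow_report` below is a hypothetical stand-in for the real report generator:

```python
import cProfile
import pstats

def apac_slow_report():
    # Stand-in for the real report code: one dominant hot path
    return sum(i * i for i in range(200_000))

prof = cProfile.Profile()
prof.enable()
apac_slow_report()
prof.disable()

stats = pstats.Stats(prof)
# Rank by cumulative time — the same ordering speedscope's
# "Left Heavy" view gives visually
stats.sort_stats("cumulative")
stats.print_stats(3)  # top 3 entries, with calls and cumtime columns
```

`print_stats` shows per-function call counts, total time, and cumulative time, enough to decide whether the incident warrants a full flamegraph investigation.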

APAC Profiling Tool Selection

APAC Profiling Need                     → Tool            → Why

APAC always-on production profiling     → Pyroscope       → Grafana LGTM native;
(Grafana stack, continuous)                                 eBPF + SDK support;
                                                            correlated APAC view

APAC self-hosted, Prometheus-native     → Parca           → eBPF, no instrumentation;
(Prometheus labels, diff flamegraphs)                       CNCF governed;
                                                            APAC diff view

APAC incident investigation             → Speedscope      → no infra required;
(on-demand, share with APAC team)                           URL sharing;
                                                            zero APAC cost

APAC Java / JVM profiling               → async-profiler  → low overhead;
(OpenJDK, GraalVM APAC services)                            JVM-specific;
                                                            allocation profiling

APAC Python profiling                   → py-spy          → low-overhead sampling;
(Django, FastAPI, ML inference)                             no code changes;
                                                            speedscope-compatible

Related APAC Observability Resources

For the AIOps and observability platforms (Grafana, Datadog, Prometheus, Dynatrace) that continuous profiling data integrates with for correlated APAC root cause analysis, see the APAC AIOps and observability guide.

For the tracing tools (Jaeger, Tempo, OpenTelemetry) that capture APAC request latency at the span level, complementary to profiling's function-level CPU analysis, see the APAC GitOps and IaC observability guide.

For the SLO management tools (Pyrra, Sloth, OpenSLO) that define the APAC service-level objectives that continuous profiling helps maintain by catching regressions before SLO burn rates spike, see the APAC SLO management guide.

Beyond this insight

Cross-reference our practice depth: if this article matches your stage of thinking, the underlying capabilities ship across all six pillars, ten verticals, and nine Asian markets.

Want this applied to your firm?

We use these frameworks daily in client engagements. Let's see what they look like for your stage and market.