
APAC Continuous Profiling Guide 2026: Pyroscope, Parca, and Speedscope for Production Performance

A practitioner guide for APAC SRE and performance engineering teams implementing continuous profiling as the fourth observability pillar in 2026. It covers three tools. Grafana Pyroscope provides always-on CPU and memory flamegraph collection, eBPF profiling that needs no application instrumentation, Grafana-native correlation of profiles with metrics and traces, and either self-hosted or Grafana Cloud managed deployment. Parca provides self-hosted eBPF continuous profiling with Prometheus-compatible label querying, diff flamegraphs that compare before and after profiles to detect deployment regressions, and an open-source codebase. Speedscope provides browser-based interactive flamegraph visualization for pprof, Linux perf, Chrome DevTools, and Python cProfile formats, so APAC teams can investigate performance collaboratively without deploying any server.

By AIMenta Editorial Team

The Missing Pillar in APAC Observability

APAC engineering teams that have implemented metrics (Prometheus), logs (Loki/ELK), and traces (Jaeger/Tempo) still have a blind spot when diagnosing performance problems: they know that an APAC service is consuming 80% CPU but not which function is responsible. Continuous profiling fills this gap — the fourth observability pillar that connects high-level resource signals to specific code paths.

Without continuous profiling, APAC performance investigations require manually triggering profilers during incidents — often too late, as the condition has already resolved — or running profiling in staging environments that do not replicate APAC production load patterns.

Three tools cover the APAC continuous profiling spectrum:

Grafana Pyroscope — always-on profiling integrated with Grafana LGTM stack for correlated APAC observability.

Parca — self-hosted eBPF continuous profiling with Prometheus-native label model and diff flamegraphs.

Speedscope — browser-based flamegraph visualizer for on-demand APAC profile investigation and team sharing.


APAC Continuous Profiling Fundamentals

The four APAC observability pillars

APAC Observability Stack (complete):

Pillar 1: METRICS (Prometheus + Grafana)
  Question answered: "Is my APAC service unhealthy?"
  Signal: CPU 84%, memory 6.2GB, HTTP error rate 0.3%
  Limitation: tells you WHAT is high, not WHY

Pillar 2: LOGS (Loki / ELK)
  Question answered: "What happened in my APAC service?"
  Signal: error stacktrace at 14:32:41 SGT
  Limitation: captures events, not resource consumption

Pillar 3: TRACES (Tempo / Jaeger)
  Question answered: "Where is my APAC request slow?"
  Signal: /api/orders spans: DB 847ms, cache 3ms, auth 12ms
  Limitation: attributes latency to service, not code function

Pillar 4: PROFILES (Pyroscope / Parca) ← the gap
  Question answered: "WHICH FUNCTION is consuming APAC CPU?"
  Signal: flamegraph shows apacOrderService.buildReport() = 67% CPU
  Closes the loop: metrics identify WHAT, profiles explain WHY

APAC flamegraph interpretation basics

APAC CPU Flamegraph (drawn as a call tree with the root at the top; percentages are inclusive CPU share):

main() ──────────────────────────────────────────── 100%
└── http.HandleFunc()  ────────────────────────────  98%
    └── apacOrdersHandler() ────────────────────────  96%
        ├── apacDBQuery() ──────────────────────────  62%  ← 62% of CPU here
        │   ├── buildApacSQL() ─────────────────────  8%
        │   └── json.Marshal()  ────────────────────  54%  ← serialization, not SQL, dominates
        │       └── reflect.ValueOf() ──────────────  54%  ← reflection in hot path
        └── apacCacheCheck() ───────────────────────  34%
            └── redis.Get() ────────────────────────  34%

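APAC Analysis:
→ json.Marshal relies on reflection, which is slow when serializing high-volume APAC DB rows
→ Fix: cache pre-serialized JSON or switch to a code-generated encoder that avoids reflection
→ Expected improvement: up to 54 percentage points of apacOrdersHandler() CPU is in play; the actual saving depends on the replacement encoder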
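
The percentages in a flamegraph can be reproduced from raw stack samples. The following is a minimal sketch, assuming the profile has been exported in the common "folded stacks" text format (one `frame;frame;frame count` line per unique stack). The sample counts are hypothetical and chosen to mirror the example tree above; `summarize` is an illustrative helper, not part of any profiler's API.

```python
from collections import Counter

# Hypothetical folded-stack samples: "root;child;leaf count".
# Counts mirror the example tree (2 samples each are self time
# in main and http.HandleFunc).
FOLDED = """\
main 2
main;http.HandleFunc 2
main;http.HandleFunc;apacOrdersHandler;apacDBQuery;buildApacSQL 8
main;http.HandleFunc;apacOrdersHandler;apacDBQuery;json.Marshal;reflect.ValueOf 54
main;http.HandleFunc;apacOrdersHandler;apacCacheCheck;redis.Get 34
"""


def summarize(folded: str):
    """Return (sample count, inclusive counts, self counts) per frame."""
    total_time = Counter()  # inclusive: frame appears anywhere in the stack
    self_time = Counter()   # exclusive: frame is the leaf of the stack
    samples = 0
    for line in folded.strip().splitlines():
        stack, count = line.rsplit(" ", 1)
        frames = stack.split(";")
        n = int(count)
        samples += n
        for frame in set(frames):  # set() avoids double counting recursion
            total_time[frame] += n
        self_time[frames[-1]] += n
    return samples, total_time, self_time


samples, total_time, self_time = summarize(FOLDED)
for frame, n in total_time.most_common():
    total_pct = 100 * n / samples
    self_pct = 100 * self_time[frame] / samples
    print(f"{frame:<22} total {total_pct:5.1f}%  self {self_pct:5.1f}%")
```

A frame with a high inclusive share but near-zero self share (json.Marshal here) is a pass-through: the cost sits in what it calls, which is why the analysis points at reflection rather than at the marshal call itself.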

Grafana Pyroscope: APAC Always-On Flamegraphs

Pyroscope APAC Go service integration

// APAC: Instrument Go service with the Pyroscope SDK

import (
    "log"
    "os"

    "github.com/grafana/pyroscope-go"
)

func apacInitProfiling() {
    // Start returns a profiler handle and an error. The handle is discarded
    // here because profiling runs for the lifetime of the process.
    _, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "apac-orders-service",

        // APAC: Pyroscope server (self-hosted or Grafana Cloud)
        ServerAddress: "http://apac-pyroscope:4040",

        // APAC: Labels for Prometheus-compatible querying
        Tags: map[string]string{
            "apac_env":     "production",
            "apac_region":  "sg",
            "apac_version": os.Getenv("APP_VERSION"),
            "apac_pod":     os.Getenv("POD_NAME"),
        },

        // APAC: Profile types to collect
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
        },
    })
    if err != nil {
        // Profiling is optional: log the failure and keep serving traffic.
        log.Printf("pyroscope: profiling disabled: %v", err)
    }
}

Pyroscope APAC Kubernetes deployment

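# APAC: values.yaml for Pyroscope Helm chart
# helm repo add grafana https://grafana.github.io/helm-charts
# helm install pyroscope grafana/pyroscope -f values.yaml
# Key names differ between chart versions; compare with: helm show values grafana/pyroscope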

pyroscope:
  replicaCount: 2

  # APAC: Storage for profile data
  persistence:
    enabled: true
    size: 50Gi
    storageClass: "apac-standard"

  # APAC: Storage backend for profile data (no retention period is set here)
  config: |
    storage:
      backend: filesystem
      filesystem:
        dir: /data/pyroscope

  # APAC: Bundled agent that collects profiles and forwards them to Pyroscope
  grafana-agent:
    enabled: true

# APAC: eBPF profiler (profiles all pods without SDK instrumentation)
pyroscope-ebpf:
  enabled: true
  # Profiles every process on every APAC K8s node automatically
  # No application code changes required

Pyroscope APAC Grafana correlation

APAC Grafana dashboard: correlate CPU spike with profile

14:33 SGT: Prometheus alert: apac-orders-service CPU > 80% for 5 min

APAC Investigation in Grafana:
  1. Open Grafana Explore → switch to Pyroscope data source
  2. Select the CPU profile type and query: {service_name="apac-orders-service", apac_env="production", apac_region="sg"}
     → Renders flamegraph for the 14:28-14:33 SGT window
  3. Click "Split" to open a second Explore pane side by side:
     → Left pane: Pyroscope flamegraph (WHICH function)
     → Right pane: Prometheus CPU metric (HOW HIGH)
  4. Flamegraph reveals: apacReportBuilder.generatePDF() = 73% CPU
     → APAC root cause: new report feature released 14:28 SGT
     → APAC action: add async queue for report generation
  5. Time to root cause: 4 minutes (vs 45 min without profiles)

Parca: APAC Self-Hosted eBPF Profiling

Parca APAC Kubernetes deployment

# APAC: Deploy Parca server and agent via kubectl

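# Parca server (stores and serves APAC profiles)
# The manifests target the "parca" namespace, so create it first.
# For repeatable APAC rollouts, pin a release tag instead of "latest".
kubectl create namespace parca
kubectl apply -f https://github.com/parca-dev/parca/releases/latest/download/kubernetes-manifest.yaml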

# Parca Agent (eBPF profiler on each APAC node)
kubectl apply -f https://github.com/parca-dev/parca-agent/releases/latest/download/kubernetes-manifest.yaml

# APAC: Parca Agent automatically profiles ALL processes on each node
# No application changes needed for APAC services
# Profiles annotated with Kubernetes labels (namespace, pod, container)
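
Because profiles carry Kubernetes labels, they can be filtered with Prometheus-style label matchers. The sketch below illustrates the matching semantics only; it is not Parca's implementation. The `matches` helper and the label values are hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus uses, so complex patterns may behave differently.

```python
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]  # (label name, operator, value)


def matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True if the label set satisfies every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            # Prometheus regex matchers are fully anchored.
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical profile series as labelled by the agent.
series = [
    {"namespace": "orders", "pod": "apac-orders-7d9f", "container": "api"},
    {"namespace": "orders", "pod": "apac-orders-7d9f", "container": "sidecar"},
    {"namespace": "billing", "pod": "apac-billing-1", "container": "api"},
]

# Equivalent to {namespace="orders", pod=~"apac-orders-.*", container!="sidecar"}
selector = [
    ("namespace", "=", "orders"),
    ("pod", "=~", "apac-orders-.*"),
    ("container", "!=", "sidecar"),
]
selected = [s for s in series if matches(s, selector)]
print(selected)
```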

Parca APAC diff flamegraph — deployment regression

APAC Parca Diff Flamegraph — Before vs After Release 4.2.1

Selection A: 14:00-14:28 SGT (before release 4.2.1)
Selection B: 14:30-15:00 SGT (after release 4.2.1)

APAC Diff visualization (red = more CPU, green = less):
  Red (worse after release):
    apacProductService.listProducts()  ──── +34% CPU
      └── apacPriceEngine.calculate()  ──── +34% CPU
          └── apacTaxRule.applyRules()  ─── +34% CPU  ← new tax rules in release
  Green (better after release):
    apacCacheService.get()             ──── -12% CPU
      (APAC cache hit rate improved in release 4.2.1)

APAC Conclusion:
→ New APAC tax calculation rules (release 4.2.1) added 34% CPU per product list
→ Unintended APAC performance regression from tax rule expansion
→ Fix: memoize APAC tax calculations per product category (not per product)
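
The arithmetic behind a diff flamegraph can be sketched in a few lines: compute each function's inclusive share of samples in both windows, then subtract. This is an illustration under stated assumptions, not Parca's algorithm. The folded-stack inputs are hypothetical, the helper names are invented, and the sketch reads the "+34%" above as a change in percentage points of total CPU.

```python
from collections import Counter
from typing import Dict, List, Tuple


def inclusive_share(folded: str) -> Dict[str, float]:
    """Return each frame's inclusive share of samples (0.0 to 1.0)."""
    totals, samples = Counter(), 0
    for line in folded.strip().splitlines():
        stack, count = line.rsplit(" ", 1)
        n = int(count)
        samples += n
        for frame in set(stack.split(";")):
            totals[frame] += n
    return {frame: n / samples for frame, n in totals.items()}


def diff_profiles(before: str, after: str) -> List[Tuple[str, float]]:
    """Return (frame, change in percentage points), worst regression first."""
    a, b = inclusive_share(before), inclusive_share(after)
    deltas = [
        (frame, (b.get(frame, 0.0) - a.get(frame, 0.0)) * 100)
        for frame in set(a) | set(b)
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Hypothetical samples for Selection A (before) and Selection B (after).
BEFORE = """\
apacProductService.listProducts;apacPriceEngine.calculate;apacTaxRule.applyRules 10
apacCacheService.get 30
other 60
"""
AFTER = """\
apacProductService.listProducts;apacPriceEngine.calculate;apacTaxRule.applyRules 44
apacCacheService.get 18
other 38
"""

deltas = diff_profiles(BEFORE, AFTER)
for frame, delta in deltas:
    print(f"{frame:<36} {delta:+.1f} pp")
```

Comparing shares rather than raw sample counts keeps the diff meaningful when the two windows have different lengths or traffic volumes.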

Speedscope: APAC Collaborative Profile Visualization

Speedscope APAC profile sharing workflow

# APAC: Capture pprof from Go service and share via Speedscope

# Step 1: APAC: Capture CPU profile from production service
# (requires the service to expose net/http/pprof on :6060; a 30-second CPU profile is low overhead)
curl -s "http://apac-orders-service:6060/debug/pprof/profile?seconds=30" \
    -o /tmp/apac-orders-cpu-$(date +%Y%m%d-%H%M).prof

# Step 2: APAC — Open in browser (local visualization, no upload)
# drag-and-drop .prof file to https://speedscope.app
# OR use self-hosted speedscope for sensitive APAC profiles

# Alternative: install the speedscope CLI and open the local file directly
npm install -g speedscope
speedscope /path/to/apac-orders-cpu.prof  # path of the file captured in Step 1

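# Step 3: APAC: Share with team
# speedscope.app parses the profile locally and does not upload it, so share the .prof file itself,
# or host it at a CORS-enabled URL and share https://www.speedscope.app/#profileURL=<URL-encoded profile URL>
# APAC team members then see the identical flamegraph without any profiling infrastructure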

# APAC: Speedscope view modes
# ① Time Order:   left-to-right = chronological APAC call order
# ② Left Heavy:   identical stacks merged, heaviest on the left = most cumulative APAC CPU time
# ③ Sandwich:     shows callers+callees of selected APAC function
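
Speedscope can load a profile from a URL passed in the `profileURL` hash parameter, with an optional `title`. The helper below is a small sketch for building such a link; `speedscope_link` is a hypothetical name and the profile host is a placeholder. The profile must be served with CORS headers that allow the speedscope origin, otherwise the browser blocks the fetch.

```python
from urllib.parse import quote


def speedscope_link(profile_url: str, title: str = "") -> str:
    """Build a speedscope.app link that loads a hosted profile."""
    # safe="" percent-encodes ":" and "/" so the URL survives inside the fragment.
    link = "https://www.speedscope.app/#profileURL=" + quote(profile_url, safe="")
    if title:
        link += "&title=" + quote(title, safe="")
    return link


# Placeholder host: replace with wherever the team stores captured profiles.
print(speedscope_link(
    "https://profiles.example.internal/apac-orders-cpu.prof",
    "apac-orders CPU",
))
```

For sensitive APAC profiles, the same link shape can point at a self-hosted speedscope build instead of the public site.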

Speedscope APAC Python profile investigation

# APAC: Profile Python service and visualize in Speedscope

import cProfile
import pstats

# APAC: Profile specific function during incident
apac_profiler = cProfile.Profile()
apac_profiler.enable()
try:
    # APAC: Run the slow function (apac_generate_report and its arguments
    # are defined elsewhere in the service)
    apac_generate_report(apac_customer_id, apac_date_range)
finally:
    # Always stop profiling, even if the report raises
    apac_profiler.disable()

# APAC: Export in pstats format for Speedscope
apac_stats = pstats.Stats(apac_profiler)
apac_stats.dump_stats('/tmp/apac-report-profile.prof')

# Open /tmp/apac-report-profile.prof in speedscope.app
# → Reveals: apacPDFRenderer.renderTable() = 78% of total time
# → APAC fix: pre-paginate table data before PDF rendering
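
Before opening the dump in Speedscope, the same data can be ranked in the terminal with `pstats`. The sketch below is self-contained so it can run anywhere: `slow_table_render` and `generate_report` are hypothetical stand-ins for the real report code, and the workload is arbitrary.

```python
import cProfile
import io
import pstats


def slow_table_render(rows: int) -> int:
    """Hypothetical stand-in for an expensive rendering step."""
    return sum(len(str(i) * 20) for i in range(rows))


def generate_report() -> int:
    """Hypothetical stand-in for the slow report function."""
    return slow_table_render(20_000)


profiler = cProfile.Profile()
profiler.enable()
try:
    generate_report()
finally:
    profiler.disable()

# Rank functions by cumulative time and keep the top five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time surfaces the call path that dominates the run, which is the same question the Left Heavy view answers visually.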

APAC Profiling Tool Selection

APAC Profiling Need                    → Tool            → Why

APAC always-on production profiling    → Pyroscope       → Grafana LGTM native;
(Grafana stack, continuous)                                eBPF + SDK support;
                                                           APAC correlated view

APAC self-hosted Prometheus native     → Parca           → eBPF, no instrumentation;
(Prometheus labels, diff flamegraphs)                      open source;
                                                           APAC diff view

APAC incident investigation            → Speedscope      → No infra required;
(on-demand, share with APAC team)                          file or profileURL sharing;
                                                           zero APAC cost

APAC Java / JVM profiling              → async-profiler  → Low overhead;
(OpenJDK, GraalVM APAC services)                           JVM-specific;
                                                           APAC allocation profiling

APAC Python profiling                  → py-spy          → Low overhead;
(Django, FastAPI, ML inference)                            no code changes;
                                                           speedscope compatible

Related APAC Observability Resources

For the AIOps and observability platforms (Grafana, Datadog, Prometheus, Dynatrace) that continuous profiling data integrates with for correlated APAC root cause analysis, see the APAC AIOps and observability guide.

For the tracing tools (Jaeger, Tempo, OpenTelemetry) that capture APAC request latency at the span level — complementary to profiling's function-level CPU analysis, see the APAC GitOps and IaC observability guide.

For the SLO management tools (Pyrra, Sloth, OpenSLO) that define the APAC service-level objectives that continuous profiling helps maintain by catching regressions before SLO burn rates spike, see the APAC SLO management guide.
