
DuckDB

by DuckDB Labs

Open-source in-process analytical database running SQL queries directly on Parquet, CSV, JSON, and cloud storage files without a server — designed for APAC data engineers doing analytics in Python notebooks, data lake exploration, and ETL scripting.

AIMenta verdict
Recommended
5/5

"DuckDB is the open-source in-process OLAP database for APAC analytics engineers — SQL analytics directly on Parquet, CSV, and JSON files without a server. Best for APAC data teams wanting fast local analytics on data lake files and Python notebooks without infrastructure."

What it does

Key features

  • In-process execution — no server required; runs inside Python, R, and other processes for APAC notebook analytics
  • Direct Parquet/CSV/JSON query — SQL queries on local and S3 files without data loading or ETL
  • Vectorised execution — SIMD-accelerated columnar processing, significantly faster than pandas for typical analytical aggregations
  • Python/pandas integration — zero-copy exchange with pandas DataFrames and Arrow Tables
  • Full SQL support — window functions, CTEs, lateral joins, and nested data types for APAC analytical queries
  • MotherDuck — managed DuckDB cloud service for APAC teams wanting shared analytical databases
  • Extension ecosystem — JSON, spatial, httpfs, delta, and iceberg extensions for APAC data lake formats
When to reach for it

Best for

  • APAC data engineers and analysts doing exploratory analysis on Parquet/CSV files in Python notebooks without a database cluster
  • Engineering teams building lightweight APAC ETL scripts that process data lake files with SQL without Spark overhead
  • APAC data science teams wanting SQL query performance on local data without pandas memory constraints
  • Data engineers testing and developing APAC analytical queries locally before running them in production data warehouses
Don't get burned

Limitations to know

  • DuckDB is single-node — it does not distribute queries across multiple machines; APAC datasets that exceed single-machine memory require Spark or distributed OLAP systems
  • DuckDB is not designed for concurrent writes — multi-user write workloads have limited concurrency; APAC applications requiring many concurrent writers should use a traditional database
  • DuckDB's in-process model means an engine crash takes down the host process, and query errors surface as in-process exceptions — applications embedding DuckDB must handle these at the application layer
  • MotherDuck (managed DuckDB cloud) is relatively early stage — APAC enterprises with production SLA requirements should evaluate its maturity and APAC region availability
Context

About DuckDB

DuckDB is an open-source in-process OLAP database that provides APAC analytics engineers and data scientists with SQL analytical query capability directly within Python, R, and other application processes — without a separate database server, network connection, or cluster setup — enabling fast analytical queries on local files (Parquet, CSV, JSON, Arrow), in-memory DataFrames, and cloud storage (S3, GCS, Azure Blob) with the full expressiveness of SQL.

DuckDB's in-process architecture — where the DuckDB engine runs inside the Python process (or any process) as a library, with no client-server communication overhead — makes it uniquely suited for APAC data analysis workflows where the data and computation coexist: a Jupyter notebook analysis of a 10GB Parquet file on local disk or S3 runs DuckDB queries directly without loading the full dataset into pandas memory or connecting to a remote database.

DuckDB's columnar vectorised query engine — which processes data in batches of column vectors (SIMD-accelerated operations on fixed-size blocks of roughly 2,048 rows) rather than row-by-row — delivers analytical query performance that dramatically exceeds equivalent pandas operations on the same data. An APAC data analyst computing a GROUP BY aggregation over 100M rows of Parquet data with DuckDB completes in seconds; the equivalent pandas groupby must load the full dataset into memory and may take minutes or run out of memory.

DuckDB's direct Parquet query capability — where `SELECT * FROM 'path/to/file.parquet'` or `SELECT * FROM 's3://bucket/prefix/*.parquet'` queries Parquet files without data loading — makes DuckDB the interactive query layer for APAC data lake exploration. APAC data engineers exploring Hive-partitioned Parquet datasets in S3 use DuckDB to query specific partitions, sample data, and compute summaries without copying data to a database or running a Spark cluster.

DuckDB's deep Python integration — where DuckDB queries can directly read from and write to pandas DataFrames and Arrow Tables without data copying, and where DuckDB can register Python objects as queryable tables — enables APAC data engineering teams to use SQL and DataFrame APIs interchangeably in the same analysis workflow, choosing whichever is more expressive for each operation.

Beyond this tool

Where this category meets practice depth.

A tool only matters in context. Browse the service pillars that operationalise it, the industries where it ships, and the Asian markets where AIMenta runs adoption programs.