
Arc on ClickBench: DuckDB. The Engine Inside the Engine.

#Arc #DuckDB #ClickBench #benchmark #OLAP #analytical-database #Parquet #columnar-storage #database-performance #SQL-analytics #open-source-database #DuckDB-benchmark #Arc-vs-DuckDB #query-engine

We've been running Arc against every major analytical database on ClickBench. First ClickHouse, then TimescaleDB, InfluxDB/DataFusion, Elasticsearch, CrateDB, and StarRocks.

This is the last one. And it's the strangest one.

Arc uses DuckDB as its query engine. That means benchmarking DuckDB directly is benchmarking the analytical core that Arc runs on top of. If Arc is slower than raw DuckDB, that's overhead. If Arc is faster, that requires an explanation — and we have one.

The result: Arc beats DuckDB (Parquet, partitioned) on combined score across all three machines. On cold runs, Arc leads everywhere, cleanly. On hot runs, DuckDB partitioned edges Arc on the two high-end machines by a margin that is effectively noise. On c6a.4xlarge, Arc leads on every metric. Below the results, we explain exactly why.

What DuckDB Is

DuckDB is an in-process OLAP engine. It runs embedded — no server process, no HTTP endpoint, no ingestion API, no authentication layer. It reads Parquet files natively and executes analytical SQL with a vectorized engine.

The ClickBench benchmark tests DuckDB in two modes: a single large Parquet file and partitioned Parquet (Hive-style, split by date). Both contain the same data. The partitioned variant lets DuckDB skip files at query time — an optimization that matters analytically, and one Arc applies by design at every write.

This is not a fair operational comparison. DuckDB is not a database server. You can't point Telegraf at it. You can't run retention policies on it. It doesn't have a query API. But it is the right technical comparison to understand whether Arc's architectural layer adds overhead or not, because they share the same analytical execution engine.

ClickBench Results

Same methodology as every other post in this series. Arc runs true cold runs: service restart and OS page cache flush before every query. DuckDB's ClickBench submissions are also listed as true cold runs, so both sets of numbers are directly comparable.

Combined Score

| System | Machine | Score |
|---|---|---|
| Arc | c8g.metal-48xl | ×1.09 |
| DuckDB (Parquet, partitioned) | c8g.metal-48xl | ×1.15 |
| Arc | c7a.metal-48xl | ×1.19 |
| DuckDB (Parquet, partitioned) | c7a.metal-48xl | ×1.24 |
| DuckDB (Parquet, single) | c8g.metal-48xl | ×1.27 |
| DuckDB (Parquet, single) | c7a.metal-48xl | ×1.45 |
| Arc | c6a.4xlarge | ×2.03 |
| DuckDB (Parquet, partitioned) | c6a.4xlarge | ×2.10 |
| DuckDB (Parquet, single) | c6a.4xlarge | ×2.35 |

Arc leads on combined score on all three machines against DuckDB partitioned.

Cold Run

| System | Machine | Score |
|---|---|---|
| Arc | c8g.metal-48xl | ×1.22 |
| Arc | c6a.4xlarge | ×1.28 |
| Arc | c7a.metal-48xl | ×1.28 |
| DuckDB (Parquet, partitioned) | c6a.4xlarge | ×1.36 |
| DuckDB (Parquet, partitioned) | c7a.metal-48xl | ×1.44 |
| DuckDB (Parquet, single) | c6a.4xlarge | ×1.52 |
| DuckDB (Parquet, single) | c8g.metal-48xl | ×1.52 |
| DuckDB (Parquet, partitioned) | c8g.metal-48xl | ×1.65 |
| DuckDB (Parquet, single) | c7a.metal-48xl | ×1.79 |

Arc takes the top three spots on cold runs, across all machines. DuckDB's best cold result (×1.36 on c6a.4xlarge partitioned) comes in behind Arc's worst (×1.28 on c6a.4xlarge and c7a.metal-48xl). On c8g.metal-48xl, DuckDB partitioned records its worst cold run in the entire table at ×1.65.

Hot Run

| System | Machine | Score |
|---|---|---|
| DuckDB (Parquet, partitioned) | c8g.metal-48xl | ×1.07 |
| Arc | c8g.metal-48xl | ×1.08 |
| Arc | c7a.metal-48xl | ×1.23 |
| DuckDB (Parquet, partitioned) | c7a.metal-48xl | ×1.27 |
| DuckDB (Parquet, single) | c8g.metal-48xl | ×1.30 |
| DuckDB (Parquet, single) | c7a.metal-48xl | ×1.52 |
| Arc | c6a.4xlarge | ×2.99 |
| DuckDB (Parquet, partitioned) | c6a.4xlarge | ×3.11 |
| DuckDB (Parquet, single) | c6a.4xlarge | ×3.61 |

On c8g.metal-48xl, DuckDB partitioned records ×1.07 vs Arc's ×1.08. That is a 1% margin. It's not a meaningful result in either direction.

On c7a.metal-48xl, Arc leads at ×1.23 vs DuckDB partitioned at ×1.27. On c6a.4xlarge, Arc leads at ×2.99 vs DuckDB partitioned at ×3.11 and DuckDB single at ×3.61.

What the Numbers Actually Say

Three findings stand out.

Cold runs favor Arc everywhere. The cold run is the honest run — no cached pages, no warm query plans. On every machine, Arc's cold score is better than DuckDB's best cold score. That gap is real and consistent.

Hot run gap on high-end hardware is noise. On c8g.metal-48xl, ×1.07 vs ×1.08 is within the variance of any single benchmark run. Arc and DuckDB partitioned are effectively tied on that machine's hot run. On everything else, Arc leads.

Partitioned beats single, consistently. Across both Arc and DuckDB, time-based partitioning matters. DuckDB partitioned beats DuckDB single by 10–20% on most machines. This is the same principle Arc applies at ingestion: write data into time-bucketed Parquet files so queries can skip entire partitions rather than scan everything.
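The file-skipping mechanism behind that finding can be sketched in a few lines of Python. The layout and file names below are illustrative, not Arc's actual on-disk format:

```python
from datetime import date

# Hypothetical Hive-style layout: one Parquet file per day.
# Paths are illustrative assumptions, not Arc's real directory scheme.
partitions = {
    date(2013, 7, d): f"hits/EventDate=2013-07-{d:02d}/part-0.parquet"
    for d in range(1, 32)
}

def prune(partitions, lo, hi):
    """Return only the files whose partition key falls in [lo, hi].

    This is the whole trick behind time-based partitioning: the date
    predicate is answered from file names alone, so out-of-range files
    are never opened, let alone scanned.
    """
    return [path for day, path in partitions.items() if lo <= day <= hi]

# A query filtered to one week touches 7 of 31 files.
week = prune(partitions, date(2013, 7, 8), date(2013, 7, 14))
```

A single-file layout has nothing to prune: every query pays for the full scan, which is exactly the 10–20% gap in the tables above.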

The Question These Results Raise

Arc uses DuckDB as its query engine. The natural expectation is that a layer on top of DuckDB should be slower than DuckDB running directly.

That's not what the data shows. Here is the mechanism, backed by profiler data.

What Arc does not do in this benchmark

First, be precise about the setup. For ClickBench, the hits.parquet file is placed directly into Arc's storage directory. Arc does not re-ingest or rewrite it — no compaction, no re-encoding, no change in compression or row group layout. The file DuckDB scans inside Arc is the same file DuckDB scans in the baseline. The data is identical. The Parquet encoding is identical.

Arc does not improve DuckDB's execution engine. It cannot — DuckDB is embedded, and Arc delegates all query execution to it unchanged.

Arc's advantage comes from two mechanisms, plus a third architectural difference that turns out to be performance-neutral on this workload. They are different in kind, and honest attribution requires treating each separately.

Source 1: No view layer — direct read_parquet vs SELECT * REPLACE

The DuckDB ClickBench baseline creates a view over the Parquet file:

```sql
CREATE VIEW hits AS
SELECT * REPLACE (make_date(EventDate) AS EventDate)
FROM read_parquet('hits.parquet', binary_as_string=True);
```

Every query in the benchmark runs against this view. SELECT * REPLACE (…) forces DuckDB's planner to expand all 105 columns through the view definition before applying the query's own projections. Even for a query that only touches one column, the plan includes an extra PROJECTION node that materializes the make_date() conversion across all 100M rows.

Arc does not use a view. It rewrites FROM hits to FROM read_parquet('/path/hits.parquet', union_by_name=true) directly. DuckDB sees the exact columns the query needs and pushes the projection straight to the Parquet scan. No intermediate materialization.
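A minimal sketch of that kind of rewrite in Python — the table map, path, and regex here are illustrative assumptions, not Arc's implementation:

```python
import re

# Hypothetical mapping from logical table names to Parquet paths.
TABLES = {"hits": "/data/hits.parquet"}

def rewrite_from(sql: str) -> str:
    """Replace a bare table reference with a direct read_parquet() call,
    so the planner sees the scan (and its projection pushdown) with no
    view definition in between."""
    def sub(m):
        path = TABLES.get(m.group(1).lower())
        if path is None:
            return m.group(0)  # unknown table: leave the SQL untouched
        return f"FROM read_parquet('{path}', union_by_name=true)"
    return re.sub(r"FROM\s+(\w+)", sub, sql, flags=re.IGNORECASE)

q = rewrite_from("SELECT MIN(EventDate), MAX(EventDate) FROM hits")
```

A production rewriter would need a real SQL parser to avoid touching subqueries, aliases, and string literals; the point here is only the shape of the transformation.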

The EXPLAIN plans make this concrete. For SELECT MIN(EventDate), MAX(EventDate) FROM hits:

| Path | Plan nodes (root → scan) |
|---|---|
| DuckDB baseline (via view) | AGGREGATE → PROJECTION → PROJECTION → READ_PARQUET |
| Arc (direct read_parquet) | AGGREGATE → PROJECTION → READ_PARQUET |

The extra PROJECTION node in the baseline plan runs make_date() on 100M integer values before the aggregate can consume them. Measured on the same hardware:

| Query | Via view (baseline) | Direct read_parquet (Arc) | Ratio |
|---|---|---|---|
| Q02 COUNT(*) WHERE AdvEngineID != 0 | 55ms | 43ms | 1.28× |
| Q04 AVG(UserID) | 79ms | 60ms | 1.32× |
| Q07 MIN/MAX EventDate | 65ms | 41ms | 1.59× |

On the c8g.metal-48xl leaderboard, Arc wins 30 of 43 queries against duckdb-parquet (single file, same data). The view overhead is the primary driver — it taxes every query in the baseline set, with the largest impact on queries that scan date or wide-projection columns.

Source 2: SQL rewrites

The second source is specific to Arc: query rewriting before DuckDB execution.

Q29 — REGEXP_REPLACE rewrite, up to 2.4× speedup. ClickBench Q29 extracts URL domains using a regex:

```sql
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, ...
FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25
```

Arc detects the URL domain capture group pattern and rewrites it to a CASE expression before passing the query to DuckDB:

```sql
CASE
  WHEN Referer LIKE 'https://www.%' THEN split_part(substr(Referer, 13), '/', 1)
  WHEN Referer LIKE 'http://www.%'  THEN split_part(substr(Referer, 12), '/', 1)
  WHEN Referer LIKE 'https://%'     THEN split_part(substr(Referer, 9),  '/', 1)
  WHEN Referer LIKE 'http://%'      THEN split_part(substr(Referer, 8),  '/', 1)
  ELSE split_part(Referer, '/', 1)
END
```

The NFA-based regex engine evaluates the full capture group against every non-empty Referer in 100M rows. The CASE expression evaluates LIKE comparisons (byte prefix matching) and split_part (byte offset arithmetic) — no state machine, no backtracking, no capture group extraction. DuckDB executes both paths; Arc controls which one runs.
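The two paths can be compared directly in stdlib Python; split_part and substr in the CASE expression map onto slicing and str.split here:

```python
import re

PATTERN = re.compile(r'^https?://(?:www\.)?([^/]+)/.*$')

def domain_regex(referer: str) -> str:
    # Regex path: NFA evaluation plus capture-group extraction per row.
    m = PATTERN.match(referer)
    return m.group(1) if m else referer

def domain_case(referer: str) -> str:
    # CASE-expression path: prefix checks and byte-offset splits only.
    for prefix in ("https://www.", "http://www.", "https://", "http://"):
        if referer.startswith(prefix):
            return referer[len(prefix):].split("/", 1)[0]
    return referer.split("/", 1)[0]
```

For referers matching the URL-with-path shape Q29 targets, the two functions agree; the second never touches a state machine.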

Measured with and without the rewrite on the same hardware, from the actual leaderboard results:

| Machine | DuckDB (REGEXP_REPLACE) | Arc (CASE rewrite) | Speedup |
|---|---|---|---|
| c8g.metal-48xl | 0.932s | 0.632s | 1.47× |
| c7a.metal-48xl | 1.104s | 1.104s | 1.00× |
| c6a.4xlarge | 9.486s | 3.904s | 2.43× |

The c6a.4xlarge gap is large because it has only 32 GB RAM — the regex engine's working memory pressure causes intermediate spilling that the string function path avoids. On c8g.metal-48xl the gain is smaller but real; on c7a.metal-48xl the two paths tie.

This rewrite fires whenever Arc detects REGEXP_REPLACE or REGEXP_EXTRACT with a URL domain capture group pattern. It does not fire on arbitrary regex patterns.

Source 3: Arrow-native result path

Arc executes queries via DuckDB's native Arrow API rather than the standard database/sql interface.

The standard path: rows.Next()Scan()interface{} boxing per column per row → JSON marshal. Every value allocates. For a query returning 10M rows across 10 columns, that's 100M allocations before a byte of JSON is written.

Arc's path: conn.Raw() calls DuckDB's Arrow C Data Interface directly, returning typed array.RecordReader chunks from DuckDB's internal columnar representation. No row-by-row scanning. The serialization loop reads typed arrays directly — *array.Int64, *array.String — and writes to a bufio.Writer using strconv.AppendInt/AppendFloat. Zero per-value allocations.
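The shape of the two loops can be sketched in Python. This is purely illustrative — Arc's actual path is Go against DuckDB's Arrow C Data Interface; the lists below stand in for typed Arrow chunks:

```python
import io

# Stand-ins for one result chunk: an *array.Int64 and an *array.String.
ids = list(range(5))
names = [f"u{i}" for i in range(5)]

def serialize_rowwise(ids, names) -> bytes:
    """Row-oriented path: one tuple per row, one boxed value per cell,
    generic formatting for every cell."""
    out = io.BytesIO()
    for row in zip(ids, names):
        for value in row:
            out.write(str(value).encode())
            out.write(b",")
        out.write(b"\n")
    return out.getvalue()

def serialize_columnar(ids, names) -> bytes:
    """Column-oriented path: walk the typed chunks directly and append
    into one buffer — the shape of strconv.AppendInt into bufio.Writer."""
    out = io.BytesIO()
    for i, n in zip(ids, names):
        out.write(b"%d,%s,\n" % (i, n.encode()))
    return out.getvalue()
```

Both produce identical bytes; the difference that matters in Go is the per-cell interface{} allocation the first shape forces and the second avoids.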

The practical impact on ClickBench's analytical query mix is small. Most queries return aggregated results — a few rows, not millions. An internal A/B across all 43 queries measures the Arrow path at +0.3% total overhead vs the non-Arrow path (13.373s vs 13.408s). The serialization path is simply not the bottleneck when queries are CPU and I/O bound.

Where it matters: high-row-count result queries. The Arrow path has the right asymptotic behavior for wide result sets even if ClickBench's 43-query mix doesn't exercise that heavily.

What Arc configures

The DuckDB ClickBench run.sh scripts set parquet_metadata_cache=true before each query — same as Arc. Thread count is also equivalent: DuckDB CLI auto-detects to nproc, Arc explicitly sets runtime.NumCPU(), both arrive at 192 on the large machines.

The one concrete configuration difference: Arc sets DuckDB's memory limit to 90–95% of available RAM vs DuckDB's default of 80%. On a 384 GiB machine that's ~38–58 GiB of additional headroom for hash aggregation and sort buffers. The practical impact depends on which queries hit that boundary — it's real but unquantified.

Why Arc leads on cold runs at scale

The cold-run leaderboard gap — Arc ×1.28 vs DuckDB partitioned ×1.36, Arc ×1.22 vs DuckDB partitioned ×1.65 on c8g — is explained by the same two principal sources above.

On a true cold run, every query pays Parquet footer read cost plus column decompression from cold storage. The view-layer overhead is present on every cold query just as on warm ones. Cold, the margin is larger because decompression dominates query time and the view's extra projection materializes on top of that. The Q29 rewrite compounds this for the regex query.

The methodologies are otherwise equivalent: both restart before each query, both clear the OS page cache, both use parquet_metadata_cache=true, both use all available cores. The cold run is a fair comparison; the gap reflects the view-layer overhead on every baseline query plus the Q29 SQL rewrite.

Why hot runs converge on large machines

On c8g.metal-48xl with 384 GiB RAM, the entire working set fits in page cache after the first query. Runs 2 and 3 are pure vectorized execution — DuckDB's aggregation, GROUP BY hash tables, SIMD column scans. The result: ×1.07 vs ×1.08. That is noise — the view-layer overhead and the Q29 rewrite are both present, but when the dataset fits entirely in 384 GiB of RAM the margin falls within run-to-run variance.

The c6a.4xlarge hot anomaly (Arc ×2.99 vs c8g ×1.08) is a memory capacity effect. The c6a has 32 GB RAM. With that budget split between OS, Arc's process, and DuckDB's working memory, the OS evicts pages between queries. The "hot" run on c6a.4xlarge is a partially cold run in practice. On c8g.metal-48xl, nothing is ever evicted.

On Ingestion

DuckDB is not a database server. It has no ingestion API, no HTTP endpoint, and no wire protocol for receiving live data. The ClickBench setup populates the Parquet files offline before queries run. There is no ingestion test to run.

Arc's sustained ingestion benchmark — 18M+ records/sec via MessagePack columnar protocol, tested across dozens of workers and batch sizes — has no DuckDB counterpart to compare against. DuckDB doesn't have an equivalent concept.

The Honest Caveat

DuckDB is an excellent embedded analytics engine. If you're building a data pipeline, a local analytics tool, or an application that needs in-process SQL over Parquet — DuckDB is outstanding at that. It's also the foundation Arc's query layer is built on, which is not a criticism; it's a compliment to how good DuckDB is as a query engine.

The ClickBench comparison is valid because both systems are reading the same Parquet files and running the same 43 queries. What it is not is an operational comparison. DuckDB doesn't run as a server. It doesn't have authentication, HTTP APIs, retention policies, or continuous ingestion. The benchmark measures query execution quality, not system capability.

If you want an embedded analytics engine: DuckDB.

If you want an always-on analytical database with ingestion, APIs, partitioned storage, and retention management: that's Arc.

End of the Series

This is the seventh post in the Arc ClickBench series.

We've now run Arc against ClickHouse, TimescaleDB, InfluxDB/DataFusion, Elasticsearch, CrateDB, StarRocks, and now DuckDB — seven systems, three hardware configurations, true cold runs throughout.

The series started as a way to establish where Arc actually stands against real competition on a reproducible benchmark. Not every post was a clean win. StarRocks leads Arc on high-end hardware. DuckDB and Arc are essentially tied on hot runs with high core counts. That's the honest record.

What the series does show: Arc is competitive with purpose-built analytical systems, leads on cold runs across nearly every comparison, and runs on hardware most teams already have.

Reproduce It Yourself

Every claim in this post comes from public benchmark data.

Bottom Line

| Machine | Run type | Winner | Margin |
|---|---|---|---|
| c8g.metal-48xl | Combined | Arc | ~1.06x faster |
| c8g.metal-48xl | Cold | Arc | ~1.35x faster |
| c8g.metal-48xl | Hot | Tie | ~1% difference |
| c7a.metal-48xl | Combined | Arc | ~1.04x faster |
| c7a.metal-48xl | Cold | Arc | ~1.13x faster |
| c7a.metal-48xl | Hot | Arc | ~1.03x faster |
| c6a.4xlarge | Combined | Arc | ~1.03x faster |
| c6a.4xlarge | Cold | Arc | ~1.06x faster |
| c6a.4xlarge | Hot | Arc | ~1.04x faster |
| — | Sustained ingestion | Arc | DuckDB has no ingestion |

That's the series. Seven systems, one benchmark, reproducible results throughout.



Questions or challenges to the methodology? Find us on Discord or open an issue on https://github.com/Basekick-Labs/arc/issues.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.
