Arc 26.06.1: Security Fixes, Memory Leaks Closed, Faster Ingest

26.06.1 is the tenth Arc release in eight months. It is also the first one that's mostly not features.
Five security fixes, three memory leaks closed, an ingest-path optimization that recovered about 12% throughput, and one operator-facing change worth knowing up front: pprof is no longer reachable on Arc's API port by default.
If you run Arc in production, update.
Security fixes, update now
External security researcher Alex Manson (https://github.com/NeuroWinter) reported four of these against the 26.05.1 main branch. Thank you for the detailed reports.
DuckDB I/O sandbox. Any authenticated token, including tokens scoped with an empty permissions: [], could read arbitrary local files through DuckDB's I/O function family (read_csv_auto, read_json, read_text, read_blob, glob, parquet_metadata, and friends). On a deployment with auth on but RBAC not subscribed, a single line to /api/v1/query returned auth.db (bcrypt hashes), arc.toml (S3 secrets), TLS private keys, /proc/self/environ, cross-tenant Parquet files, and, when httpfs was loaded, cloud-instance-metadata IPs via SSRF. The fix is structural: at startup Arc now sets DuckDB's allowed_directories to a fixed allowlist (local storage root, temp directory, upload directory, the configured S3/Azure buckets) and flips enable_external_access = false. After that flip DuckDB refuses to open any file outside the allowlist and rejects further INSTALL/LOAD. Already-loaded extensions stay fully callable.
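As a rough illustration of the lock-down order described above, the sketch below builds the two DuckDB settings as SQL strings, with the allowlist set before the flip. The helper name `sandboxStatements` and the example paths are hypothetical, and how Arc actually issues these statements is not shown in these notes.

```go
package main

import (
	"fmt"
	"strings"
)

// sandboxStatements returns the two settings in the order the text gives:
// the directory allowlist first, then the external-access lock-down.
// The helper name and the paths used in main are hypothetical.
func sandboxStatements(allowed []string) []string {
	quoted := make([]string, 0, len(allowed))
	for _, dir := range allowed {
		// Double any single quote so a path cannot break out of the SQL literal.
		quoted = append(quoted, "'"+strings.ReplaceAll(dir, "'", "''")+"'")
	}
	return []string{
		"SET allowed_directories = [" + strings.Join(quoted, ", ") + "]",
		"SET enable_external_access = false",
	}
}

func main() {
	for _, stmt := range sandboxStatements([]string{"/var/lib/arc/data", "/var/lib/arc/.tmp"}) {
		fmt.Println(stmt + ";")
	}
}
```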
pprof on the public API port. Any network-reachable caller, no token required, could fetch /debug/pprof/heap (leaks live SQL strings, decoded msgpack records, decompressed bodies, cached token info derived from plaintext hashes), enumerate every goroutine's call stack, or pin a CPU core for arbitrary duration via /debug/pprof/profile?seconds=N. Trivial DoS amplification. pprof is now removed from the public API entirely. It's opt-in via ARC_DEBUG_PPROF=1, binds to 127.0.0.1:6060 by default (override with ARC_DEBUG_PPROF_ADDR), and refuses a non-loopback bind unless you also set ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1. Operator-facing change: curl http://arc:8000/debug/pprof/heap returns 404 starting in 26.06.1.
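The opt-in rules above can be sketched as a small decision function. This is a minimal sketch, not Arc's code: the function name `pprofListenAddr` and the injected `getenv` parameter are assumptions made so the logic can be run in isolation. Only the three environment variable names and the default address come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// pprofListenAddr decides whether and where a debug listener may bind.
// getenv is injected so the rules can be exercised without touching the
// real process environment. The function is a hypothetical illustration.
func pprofListenAddr(getenv func(string) string) (addr string, enabled bool, err error) {
	if getenv("ARC_DEBUG_PPROF") != "1" {
		return "", false, nil // opt-in only
	}
	addr = getenv("ARC_DEBUG_PPROF_ADDR")
	if addr == "" {
		addr = "127.0.0.1:6060"
	}
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return "", false, fmt.Errorf("invalid pprof address %q: %w", addr, err)
	}
	ip := net.ParseIP(host)
	// An empty host (":6060") binds every interface, so it is not loopback.
	loopback := host == "localhost" || (ip != nil && ip.IsLoopback())
	if !loopback && getenv("ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK") != "1" {
		return "", false, errors.New("refusing non-loopback pprof bind without ARC_DEBUG_PPROF_ALLOW_NON_LOOPBACK=1")
	}
	return addr, true, nil
}

func main() {
	env := map[string]string{"ARC_DEBUG_PPROF": "1"}
	addr, enabled, err := pprofListenAddr(func(k string) string { return env[k] })
	fmt.Println(addr, enabled, err)
}
```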
Cluster cache-invalidate HMAC. The post-compaction cache-invalidate endpoint was gated by a static header (X-Arc-Internal: cache-invalidate). Any network-reachable caller could spam it, forcing DuckDB's cache_httpfs to repopulate. On S3-backed deployments that's a cost-amplification surface (ListObjectsV2 calls multiply) and a latency-amplification surface (p99 spikes during repopulation). Now HMAC-SHA256 over {nonce, sender, cluster, timestamp} keyed by cluster.shared_secret, with a ±5 minute freshness window and replay protection.
Enterprise critical (X1 + X2): the WAL replication stream is now HMAC-authenticated end-to-end (https://github.com/Basekick-Labs/arc/security/advisories/GHSA-wfgr-8x84-22q7), closing a cluster-wide worm primitive a single compromised cluster credential could've used. The cluster FSM also now rejects malicious paths in manifest-registration proposals (https://github.com/Basekick-Labs/arc/security/advisories/GHSA-f85q-mvg8-qf37); pre-fix a peer could register /etc/passwd or s3://attacker/poisoned.parquet into the authoritative cluster manifest, and other nodes would fetch and serve it.
A separate finding alongside the auth work: Enterprise WAL replication was also effectively broken pre-26.06.1 due to two unrelated bugs, a sync-ack wire-format mismatch and a 30-second idle disconnect cycle. 26.06.1 is the first release where WAL replication actually works end-to-end between nodes.
Bug fixes operators will feel
A few of these have been on customer wishlists for a while.
EXTRACT(YEAR FROM time) no longer breaks. The query rewriter was matching the FROM keyword inside EXTRACT, SUBSTRING, TRIM, and OVERLAY argument lists, then rewriting the time column as a measurement table. Every InfluxDB and Telegraf migrator hit this, since time is the canonical column name across those schemas. Workarounds were YEAR(time), date_trunc('year', time), or quoting the column. 26.06.1 adds a regex pre-pass that masks FROM keywords inside these function bodies before the table-rewrite pass runs, and restores them afterward. Paren depth is tracked, so nested calls like EXTRACT(YEAR FROM CAST(t AS DATE)) are handled correctly.
Orphaned DuckDB spill files cleaned at startup. DuckDB writes query spill files (duckdb_temp_storage_*.tmp) when intermediate state for a hash group-by, sort, or join exceeds the memory limit. On graceful close DuckDB unlinks them itself. On kill -9, OOM-kill, container restart, or crash, the unlink never runs. One development machine accumulated 40 GB of orphaned spill files across 17 days; the largest single file was 8.83 GB. 26.06.1 adds a new database.temp_directory config (default ./.tmp), pins DuckDB to it, and walks the directory at startup to remove regular files matching the spill pattern that are older than 60 seconds.
Partition pruner cache leak. The partition pruner has two TTL caches (globCache at 30s, partitionCache at 60s). Their get() returned expired entries as a cache miss but did not evict them, and neither cache had a max-size cap. Under high-cardinality glob patterns or distinct (path, sql) keys (typical for satellite-telemetry dashboards), both maps grew monotonically over process lifetime. This was one observed component of a 24-hour RSS climb on a demo container that required daily restarts. 26.06.1 wires a janitor goroutine that sweeps both caches every 30 seconds. Worst-case retention is now bounded at ~2× TTL.
S3 retention and delete: RSS recovery. Customers running Arc on S3 reported container RSS climbing many GB during overnight retention and delete operations and staying there until restart. The leaked bytes lived outside Go's heap, so the existing debug.FreeOSMemory() couldn't reclaim them: DuckDB's httpfs extension caches data blocks in libduckdb's native heap, and the AWS SDK Go transport accumulated idle keep-alive connections holding HTTP/2 frame buffers. 26.06.1 bounds the AWS SDK transport (MaxIdleConns=100, MaxIdleConnsPerHost=16, IdleConnTimeout=90s) and calls glibc's malloc_trim(0) (throttled to once per 30 seconds) after every ClearHTTPCache. Measured on a controlled harness: net residue after retention dropped from +120 MB to +47 MB, and the +72 MB-of-RSS-while-idle symptom (the part customers actually noticed) is gone.
Ingest throughput: about 12% faster on line protocol
The line-protocol tokenizer was allocating a fresh []byte for every space and every comma split in every line. Replaced with sub-slice indexing: track a start index, emit data[start:i] sub-slices on delimiter hits, and let the returned slice alias the input buffer (safe because every downstream caller copies into string before storing).
| Benchmark | Before | After |
|---|---|---|
| ParseLine | 1547 ns / 50 allocs | 915 ns / 24 allocs |
| ParseBatch (10 lines) | 9215 ns / 282 allocs | 6326 ns / 142 allocs |
End-to-end HTTP-path ingest (30-second run):
| Metric | Before | After |
|---|---|---|
| Throughput | 4.10M rec/s | 4.63M rec/s |
| p50 latency | 2.41 ms | 2.14 ms |
| p99 latency | 8.85 ms | 8.46 ms |
About 12.9% more records per second on the line-protocol path, with lower latency at the median and tail. No config change required.
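The sub-slice technique behind the tokenizer change looks roughly like this. It is a minimal sketch, not Arc's parser: the function name `splitFields` is hypothetical, and the quoting and escaping that real line protocol needs are deliberately left out.

```go
package main

import "fmt"

// splitFields splits data on sep without allocating a new []byte per field:
// every returned slice aliases data. Callers must copy (for example with
// string(f)) before data is reused. out is a scratch slice that can be
// passed back in to avoid regrowing it for each line.
func splitFields(data []byte, sep byte, out [][]byte) [][]byte {
	out = out[:0]
	start := 0
	for i := 0; i < len(data); i++ {
		if data[i] == sep {
			out = append(out, data[start:i])
			start = i + 1
		}
	}
	return append(out, data[start:])
}

func main() {
	line := []byte("cpu,host=a,region=eu usage=0.5 1700000000")
	var scratch [][]byte
	scratch = splitFields(line, ' ', scratch)
	for _, p := range scratch {
		fmt.Println(string(p)) // string() copies, so storing it is safe
	}
}
```

The aliasing is the whole point and also the hazard: mutate or recycle the input buffer before copying and every field changes underneath you.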
Now on Docker Hub
The release image is now published to Docker Hub alongside GitHub Container Registry, so you can pull from whichever you prefer:
# Docker Hub (new)
docker pull basekicklabs/arc:latest
# GitHub Container Registry (existing)
docker pull ghcr.io/basekick-labs/arc:latest
Both registries get the same multi-arch image (linux/amd64 + linux/arm64) on every release, tagged with the full version, the short YY.MM form, and latest.
One experiment: columnar MessagePack query response
A new endpoint, POST /api/v1/query/msgpack, streams query results as columnar MessagePack: data is an array of per-column arrays, not the row-oriented [[v,v,v], [v,v,v]] shape clients usually expect. Arc's storage, ingest, and DuckDB execution are all columnar; the query response now matches.
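To show the difference in shape, the sketch below pivots row-oriented results into per-column arrays. The container type and its field names are hypothetical, since the endpoint's exact envelope is not given here, and JSON is used only to print the result because MessagePack is not in Go's standard library. Rows are assumed to be rectangular.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// columnar is a hypothetical container for a columnar result set.
type columnar struct {
	Columns []string `json:"columns"`
	Data    [][]any  `json:"data"` // Data[c][r]: one inner array per column
}

// toColumnar pivots rows ([[v,v,v], [v,v,v]]) into per-column arrays.
// It assumes every row has len(columns) values.
func toColumnar(columns []string, rows [][]any) columnar {
	data := make([][]any, len(columns))
	for c := range data {
		data[c] = make([]any, 0, len(rows))
	}
	for _, row := range rows {
		for c := range columns {
			data[c] = append(data[c], row[c])
		}
	}
	return columnar{Columns: columns, Data: data}
}

func main() {
	rows := [][]any{{1700000000, "a", 0.5}, {1700000010, "b", 0.7}}
	out, _ := json.Marshal(toColumnar([]string{"time", "host", "usage"}, rows))
	fmt.Println(string(out))
}
```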
Head-to-head on a 393M-row measurement, p50 latency:
| Query | JSON | msgpack (columnar) | Arrow IPC |
|---|---|---|---|
| LIMIT 100K | 48.1 ms | 33.2 ms | 31.0 ms |
| LIMIT 500K | 173.2 ms | 81.1 ms | 61.1 ms |
| LIMIT 1M | 334.2 ms | 133.6 ms | 105.4 ms |
About 2.5× faster than JSON on LIMIT 1M (334.2 ms down to 133.6 ms), which closes roughly 88% of the latency gap between JSON and Arrow IPC for clients that can't or don't want to take an Arrow dependency. Experimental. The shape may move or change based on production feedback; if it graduates we'll re-introduce a configurable row cap.
Enterprise
A few load-bearing items if you run Arc Enterprise.
- WAL replication is HMAC-authenticated end-to-end (per the security section above), and now actually works end-to-end after two unrelated bugs (a sync-ack wire-format mismatch and a 30-second idle disconnect cycle) were fixed alongside the auth work.
- Cluster auth replication via Raft (Phase A + Phase A.1). Pre-26.06.1, every cluster node carried its own SQLite auth DB. A token created on the writer was not valid on the reader. A revocation on the writer left the same token valid on every reader until the affected reader process exited. 26.06.1 routes the api_tokens and RBAC writes (organizations, teams, roles, measurement permissions, token memberships) through the cluster's Raft FSM. End-to-end: a token or revocation propagates across every node within one Raft commit (typically under 50 ms over loopback). API-call reads stay local-cache-fast.
- Pattern 2 shared-storage multi-writer. Clusters using shared object storage (S3, Azure Blob, MinIO) behind a load balancer can now run multiple RoleWriter nodes concurrently. Set cluster.shared_storage_mode = true and writer.replicas: 3. The chart refuses replicas=2 because Raft quorum of 2 stalls on any single-pod loss; use 1 for dev or 3+ for HA. Singleton background tasks (retention, continuous queries, delete coordinator, reconciliation) now gate on the Raft leader instead of a "primary writer," and writer failover is suppressed: the load balancer handles writer-crash recovery via its own /ready poll.
The full per-PR detail on Phase A.1's secondary indices, cascade behaviour, snapshot quarantine, and the read-back retry helper is in the long-form notes.
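The replica-count rule for multi-writer clusters follows from ordinary majority-quorum arithmetic, sketched below. The helper names are hypothetical. The point is that two nodes tolerate no more failures than one does.

```go
package main

import "fmt"

// quorum returns the majority size for an n-node Raft group.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many nodes can fail while a majority remains.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("replicas=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
	// replicas=1 quorum=1 tolerated failures=0
	// replicas=2 quorum=2 tolerated failures=0
	// replicas=3 quorum=2 tolerated failures=1
	// replicas=5 quorum=3 tolerated failures=2
}
```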
Operator action items
- Update.
- If you scrape /debug/pprof/* on port 8000, switch to ARC_DEBUG_PPROF=1 and pull from 127.0.0.1:6060.
- If you set storage.local_path, database.temp_directory, or compaction.temp_directory to a relative path in arc.toml, check the startup log line "DuckDB external access locked down (sandbox active)" for the resolved absolute path.
- Multipart upload files moved from os.TempDir() (typically /tmp) to ${database.temp_directory}/arc-uploads. If you scrape /tmp for monitoring, you'll stop seeing Arc upload files; if you size temp storage separately, plan for the move.
- Enterprise: cluster.shared_secret is now required for WAL replication. Pre-existing clusters with an empty secret will start logging "replication sender refuses connection: cluster.shared_secret not configured" until the secret is populated on every node.
How to update
# Docker Hub
docker pull basekicklabs/arc:latest
# or GitHub Container Registry
docker pull ghcr.io/basekick-labs/arc:latest
For binary installations, download 26.06.1 from the releases page (https://github.com/Basekick-Labs/arc/releases).
For Kubernetes:
helm install arc https://github.com/basekick-labs/arc/releases/download/v26.06.1/arc-26.06.1.tgz
The full long-form release notes (with per-fix detail, threat-model notes, the test matrix for the path-validation sandbox, and the wire-format pin tests across all four cluster HMAC endpoints) are in https://github.com/Basekick-Labs/arc/blob/release/26.06.1/RELEASE_NOTES_2026.06.1.md on the release branch.
Questions? Discord or https://github.com/Basekick-Labs/arc/issues.