Migrating from TimescaleDB to Arc: Your Hypertables Deserve Better

#Arc#TimescaleDB#migration#CSV#tutorial#Telegraf#SQL#DuckDB#PostgreSQL
Cover image for Migrating from TimescaleDB to Arc: Your Hypertables Deserve Better

I have a lot of respect for TimescaleDB. Taking PostgreSQL — the most trusted database in the world — and bolting on time-series capabilities was a smart move. Hypertables, columnar compression, continuous aggregates — they built real features on top of a real foundation.

But here's the thing: it's still PostgreSQL underneath. Every write goes through the heap, the WAL, the chunk management layer. Every query goes through the PostgreSQL planner. And when you're ingesting millions of records per second, that foundation becomes a ceiling.

We benchmarked Arc against TimescaleDB on ClickBench last week. 4.6x faster on combined score. 6.5x faster on hot runs. 9.6x faster on ingestion. Same hardware.

If those numbers caught your attention and you're wondering what a migration looks like — it's a \COPY and a curl. This tutorial walks through the whole thing.

The Migration Strategy

Same approach as any production migration: don't flip a switch. Run both systems in parallel.

  1. Start writing to Arc now — point Telegraf at Arc using our dedicated output plugin. Keep writing to TimescaleDB too. Telegraf supports multiple outputs — use both.
  2. Migrate historical data — export hypertables as CSV, import into Arc via the CSV endpoint. Do this at your own pace. One table at a time, one night at a time.
  3. Verify — compare row counts, time ranges, query results between both systems.
  4. Cut over — once you trust Arc, remove the TimescaleDB pipeline. Swap the Grafana datasource. Done.

TimescaleDB stays live the entire time. If something doesn't look right, you haven't lost anything.

Before You Start

You'll need:

  • A TimescaleDB instance with data you want to move
  • Arc running (v26.02.2 or later — that's when the CSV import endpoint shipped)
  • psql and curl

Start Arc if you haven't:

docker run -d -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

No PostgreSQL to configure. No extensions to install. No shared_buffers to tune.

Step 1: Export from TimescaleDB

TimescaleDB is PostgreSQL, so you export with \COPY. One hypertable at a time:

psql -h localhost -U postgres -d mydb -c \
  "\COPY (SELECT * FROM cpu ORDER BY time) TO 'cpu.csv' WITH CSV HEADER"

Do this for each hypertable you want to migrate:

psql -h localhost -U postgres -d mydb -c \
  "\COPY (SELECT * FROM memory ORDER BY time) TO 'memory.csv' WITH CSV HEADER"
 
psql -h localhost -U postgres -d mydb -c \
  "\COPY (SELECT * FROM disk ORDER BY time) TO 'disk.csv' WITH CSV HEADER"

Export a Specific Time Range

Don't need the full history? Filter it:

psql -h localhost -U postgres -d mydb -c \
  "\COPY (SELECT * FROM cpu WHERE time > '2025-01-01' ORDER BY time) TO 'cpu_2025.csv' WITH CSV HEADER"

Compress for Faster Transfers

Arc auto-detects gzip. No flags needed:

gzip cpu.csv

How Big Are My Hypertables?

Before you start exporting, check what you're dealing with:

SELECT hypertable_name,
       num_chunks,
       pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)) as size
FROM timescaledb_information.hypertables;

This tells you how many chunks each hypertable has and how much disk space it uses. Plan your export accordingly.

Step 2: Import into Arc

One command per table:

curl -X POST "http://localhost:8000/api/v1/import/csv?measurement=cpu&time_column=time&time_format=epoch_s" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "X-Arc-Database: mydb" \
  -F "file=@cpu.csv"

Or with a gzipped file:

curl -X POST "http://localhost:8000/api/v1/import/csv?measurement=cpu&time_column=time&time_format=epoch_s" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "X-Arc-Database: mydb" \
  -F "file=@cpu.csv.gz"

Each hypertable becomes a measurement (table) in Arc. The CSV headers become columns. Arc handles the schema automatically.

Time Format Options

TimescaleDB timestamps can be in different formats depending on your schema. Tell Arc what to expect:

ParameterFormat
epoch_sUnix timestamp in seconds
epoch_msUnix timestamp in milliseconds
epoch_usUnix timestamp in microseconds
epoch_nsUnix timestamp in nanoseconds

If your TimescaleDB column is a TIMESTAMPTZ and exports as something like 2025-06-15 14:30:00+00, Arc's auto-detection should handle it. If not, convert to epoch in your export query:

psql -h localhost -U postgres -d mydb -c \
  "\COPY (SELECT EXTRACT(EPOCH FROM time)::bigint AS time, host, usage_idle, usage_system FROM cpu ORDER BY time) TO 'cpu.csv' WITH CSV HEADER"

Batch It

Got a lot of hypertables? Script it:

for table in cpu memory disk network; do
  echo "Exporting $table..."
  psql -h localhost -U postgres -d mydb -c \
    "\COPY (SELECT * FROM $table ORDER BY time) TO '${table}.csv' WITH CSV HEADER"
 
  echo "Importing $table into Arc..."
  curl -X POST "http://localhost:8000/api/v1/import/csv?measurement=${table}&time_column=time&time_format=epoch_s" \
    -H "Authorization: Bearer $ARC_TOKEN" \
    -H "X-Arc-Database: mydb" \
    -F "file=@${table}.csv"
done

Export, import, next table. Let it run overnight if you need to.

Prefer Python?

If you'd rather script the migration in Python — or if you need to transform data on the way in — the Arc Python SDK lets you read from TimescaleDB with psycopg2 or SQLAlchemy, transform with pandas or polars, and write directly to Arc at millions of records per second via MessagePack. No CSV files, no curl.

Step 3: Verify

Check that everything landed:

curl -X POST "http://localhost:8000/api/v1/query" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "database": "mydb",
    "sql": "SELECT COUNT(*) as total_rows, MIN(time) as earliest, MAX(time) as latest FROM cpu"
  }'

Compare with your TimescaleDB source:

SELECT COUNT(*), MIN(time), MAX(time) FROM cpu;

Numbers match? Time range looks right? Historical data is done.

Step 4: Point Telegraf at Arc

If you followed the migration strategy, you've been dual-writing. Now drop the TimescaleDB output.

Arc has a dedicated Telegraf output plugin — [[outputs.arc]] — available since Telegraf 1.33. MessagePack columnar format with gzip compression.

During migration — write to both:

# Keep TimescaleDB while migrating historical data
[[outputs.postgresql]]
  connection = "host=localhost user=postgres dbname=mydb sslmode=disable"
 
# New data goes to Arc too
[[outputs.arc]]
  url = "http://arc:8000/api/v1/write/msgpack"
  api_key = "$ARC_TOKEN"
  content_encoding = "gzip"
  database = "mydb"

After migration — Arc only:

[[outputs.arc]]
  url = "http://arc:8000/api/v1/write/msgpack"
  api_key = "$ARC_TOKEN"
  content_encoding = "gzip"
  database = "mydb"

Four lines. No connection strings, no SSL modes, no database users.

See our Telegraf output plugin tutorial for the full setup.

Step 5: Translate Your Queries

Here's the good news: TimescaleDB is SQL. Arc is SQL. DuckDB SQL is PostgreSQL-compatible. Most of your queries work with minimal changes — or no changes at all.

What Just Works

These TimescaleDB patterns port directly to Arc:

-- This works in both TimescaleDB and Arc
SELECT
  time_bucket(INTERVAL '5 minutes', time) AS bucket,
  host,
  AVG(usage_idle) AS avg_idle
FROM cpu
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, host
ORDER BY bucket

time_bucket() is available in DuckDB SQL. Same syntax, same behavior. If your dashboards are built on time_bucket(), they just work.

Standard aggregations, WHERE clauses, GROUP BY, ORDER BY, LIMIT, window functions, CTEs, subqueries, CASE expressions — all the same.

What Needs Minor Changes

first() and last() (TimescaleDB extensions):

-- TimescaleDB
SELECT device, last(value, time) FROM sensors GROUP BY device
 
-- Arc (DuckDB SQL)
SELECT DISTINCT ON (device)
  device,
  value,
  time
FROM sensors
ORDER BY device, time DESC

Continuous aggregates → Continuous queries:

TimescaleDB continuous aggregates are materialized views that auto-refresh. Arc has continuous queries that do the same thing — run on a schedule, materialize results into a target table.

The SQL inside is nearly identical. The wrapper changes from CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) to Arc's continuous query API.

Hypertable-specific functions:

TimescaleDBArc equivalent
time_bucket()time_bucket() (same)
first(value, time)DISTINCT ON + ORDER BY
last(value, time)DISTINCT ON + ORDER BY DESC
time_bucket_gapfill()time_bucket() + generate series
interpolate()Window functions
hypertable_size()Not needed — check Parquet files directly
chunks_detailed_size()Not needed — standard file system

The bottom line: if your SQL is standard PostgreSQL, it works. If it uses TimescaleDB-specific functions, there's a DuckDB SQL equivalent.

Percentiles

-- TimescaleDB
SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time) FROM requests
 
-- Arc (same syntax!)
SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time) FROM requests

Yeah. Same thing.

Step 6: Switch Grafana

Arc has its own Grafana datasource plugin — https://github.com/Basekick-Labs/grafana-arc-datasource. If you're coming from the PostgreSQL datasource in Grafana (which is how most people connect to TimescaleDB), the switch is straightforward: install the Arc plugin, point it at Arc, and your SQL panel queries carry over with minimal changes.

We recently shipped v1.1.0 with 850x data reduction and query splitting. See the Grafana integration tutorial for the full setup.

What You Get After Migration

TimescaleDBArc
StoragePostgreSQL heap + WALParquet files
CompressionTimescaleDB columnstore (~2x)Parquet columnar (3-5x)
Query enginePostgreSQL plannerDuckDB (analytical-optimized)
SQLPostgreSQL SQLDuckDB SQL (PostgreSQL-compatible)
PartitioningHypertables (automatic chunks)Time-based Parquet partitions
Data portabilitypg_dumpDirect Parquet file access
Operational overheadPostgreSQL + extensionSingle binary

Your data moves from PostgreSQL heap storage to portable Parquet files. Query them with Arc, DuckDB, Spark, Polars, or anything that reads Parquet. No more pg_dump as your only export path.

And the performance difference isn't subtle. On the same hardware, Arc is 4.6x faster on ClickBench combined score and 9.6x faster on ingestion.

The Checklist

  1. Dual-write — Add [[outputs.arc]] to Telegraf alongside TimescaleDB
  2. Export\COPY hypertables as CSV files
  3. Importcurl -X POST to Arc's /api/v1/import/csv
  4. Verify — Row counts and time ranges match across both systems
  5. Queries — Most SQL ports directly, translate TimescaleDB-specific functions
  6. Grafana — Install Arc datasource, update panel queries
  7. Cut over — Remove TimescaleDB output from Telegraf
  8. Done — Shut down TimescaleDB when you're confident

The Honest Bit

TimescaleDB is a great product for teams that need PostgreSQL compatibility with transactional guarantees and row-level UPDATEs. If your workload depends on those, keep using it.

But if you're running append-heavy analytical workloads — ingestion pipelines, observability, IoT telemetry, product analytics — you're paying the PostgreSQL tax on every write and every query. Arc was built for exactly these workloads, and the numbers back it up.

Need Help?

Most migrations are a few \COPY commands and a config change. If you hit something unusual — complex continuous aggregates, custom compression policies, or multi-node setups — we'll help.

Get started with Arc →


Questions about the migration? Open an issue on GitHub or find us on Discord. We've done this before.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.

Get Started ->