Migrating from InfluxDB to Arc: One Endpoint, One Command

#Arc #InfluxDB #migration #LineProtocol #tutorial #Telegraf #SQL #DuckDB

I spent years at InfluxData. I know the product inside out — the good parts, the rough edges, the things that make you stare at a Flux query for twenty minutes wondering why it's doing what it's doing. I also know that a lot of teams are stuck. Stuck on InfluxDB 1.x with TSM files they can't move. Stuck on 2.x writing Flux queries against a language that InfluxData themselves deprecated. Stuck wondering if 3.x is the answer or just another migration they'll have to do again in two years.

So when people ask me how hard it is to migrate from InfluxDB to Arc, I try not to smile too hard. It's a curl command.

Arc has a dedicated Line Protocol import endpoint — POST /api/v1/import/lp. Export your data from InfluxDB, POST it to Arc, done. Measurements become tables. Tags become columns. Fields become columns. No schema definition, no ETL pipeline, no Python script held together with hope.
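
The mapping is mechanical enough to sketch. Here's a deliberately simplified parser showing how one Line Protocol line turns into a table row — measurement to table name, tags and fields to columns, timestamp to a time column. (This is an illustration only; Arc's actual importer handles escaping, quoted strings with spaces, booleans, and other edge cases this toy version skips.)

```python
def lp_to_row(line: str):
    """Map one Line Protocol line to (table_name, row_dict).

    Simplified for illustration: assumes no escaped/quoted spaces in values.
    """
    head, fields_part, ts = line.rsplit(" ", 2)
    measurement, *tag_pairs = head.split(",")
    row = {"time": int(ts)}
    for pair in tag_pairs:               # tags become columns
        key, value = pair.split("=", 1)
        row[key] = value
    for pair in fields_part.split(","):  # fields become columns too
        key, value = pair.split("=", 1)
        if value[0].isdigit() or value[0] == "-":
            row[key] = float(value.rstrip("i"))  # numeric field ("i" suffix = integer)
        else:
            row[key] = value.strip('"')          # string field
    return measurement, row

table, row = lp_to_row("cpu,host=server01,region=us-east usage_idle=97.5 1700000000000000000")
```

Here `table` is `"cpu"` and `row` carries `host`, `region`, `usage_idle`, and `time` as columns — exactly the shape the imported table takes.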

The smart way to do this: start writing new data to both InfluxDB and Arc at the same time. While new data flows into Arc, you migrate historical data at your own pace. Once everything is in Arc and you've verified it, you cut over. No downtime, no data gaps, no "hold your breath and flip the switch" moment.

This tutorial walks through the whole thing.

The Migration Strategy

Don't do a hard cutover. Run both systems in parallel.

  1. Start writing to Arc now — point Telegraf at Arc using our dedicated output plugin. Keep writing to InfluxDB too. Telegraf supports multiple outputs — use both.
  2. Migrate historical data — export from InfluxDB, import into Arc via the LP endpoint. Do this at your own pace. Nights, weekends, measurement by measurement — doesn't matter.
  3. Verify — compare row counts, time ranges, query results between both systems.
  4. Cut over — once you trust Arc, remove the InfluxDB output from Telegraf. Swap the Grafana datasource. Done.

This way, if something doesn't look right, InfluxDB is still there with all your data. No risk.

Before You Start

You'll need:

  • An InfluxDB instance (1.x, 2.x, or 3.x) with data you want to move
  • Arc running (v26.02.1 or later — that's when the LP import endpoint shipped)
  • curl and the influx CLI

Start Arc if you haven't:

docker run -d -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

One container. No config files. No extensions. No tuning postgresql.conf for an hour.

Step 1: Export from InfluxDB

Get your data out as Line Protocol files. The method depends on which version of InfluxDB you're running.

InfluxDB 1.x

Use influx_inspect to dump everything:

influx_inspect export -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal \
  -out influxdb_export.lp \
  -database mydb

Or export a specific measurement:

influx -database mydb -execute "SELECT * FROM cpu" -precision ns > cpu.lp

InfluxDB 2.x

Flux. I know. Bear with me — this is the last time you'll have to write it.

influx query 'from(bucket: "my-bucket")
  |> range(start: -365d)
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")' \
  --raw > influxdb_export.lp

Or narrow it down to one measurement:

influx query 'from(bucket: "my-bucket")
  |> range(start: -365d)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")' \
  --raw > cpu_export.lp

InfluxDB 3.x (Core / Enterprise)

InfluxDB 3 speaks SQL now (thanks to DataFusion). Export is cleaner:

influxdb3 query --database mydb \
  "SELECT * FROM cpu" \
  --format lp > cpu_export.lp

Compress It

Arc auto-detects gzip. No flags, no headers — just send the .gz file:

gzip influxdb_export.lp

Faster upload, same result.
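
If you're doing this from a script rather than the shell, the stdlib equivalent is a few lines (assumes the export file already exists on disk):

```python
import gzip
import shutil

def gzip_export(path: str) -> str:
    """Compress a Line Protocol export in place; Arc auto-detects gzip on import."""
    out_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large exports don't load into memory
    return out_path
```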

Step 2: Import into Arc

Here's the migration command everyone keeps asking about:

curl -X POST "http://localhost:8000/api/v1/import/lp" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "X-Arc-Database: mydb" \
  -F "file=@influxdb_export.lp"

Gzipped? Same thing:

curl -X POST "http://localhost:8000/api/v1/import/lp" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "X-Arc-Database: mydb" \
  -F "file=@influxdb_export.lp.gz"

Arc responds with what it did:

{
  "database": "mydb",
  "measurements": ["cpu", "mem", "disk"],
  "rows_imported": 4523891,
  "precision": "ns",
  "duration_ms": 1293
}

4.5 million rows in 1.3 seconds. Your data is in Arc, stored as Parquet. That's it.

Options Worth Knowing

Precision — If your timestamps aren't nanoseconds, tell Arc:

curl -X POST "http://localhost:8000/api/v1/import/lp?precision=ms" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "X-Arc-Database: mydb" \
  -F "file=@influxdb_export.lp"

Supported: ns (default), us, ms, s.
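
If you'd rather normalize timestamps yourself before upload, the conversion is just a multiplier per precision. A quick sketch:

```python
# Multipliers to bring each supported precision up to nanoseconds (Arc's default).
NS_PER = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

def to_ns(timestamp: int, precision: str) -> int:
    """Convert a timestamp in the given precision to nanoseconds."""
    return timestamp * NS_PER[precision]
```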

Large exports — The endpoint takes files up to 500MB. Got more? Split and loop:

split -l 10000000 influxdb_export.lp chunk_
for f in chunk_*; do
  curl -X POST "http://localhost:8000/api/v1/import/lp" \
    -H "Authorization: Bearer $ARC_TOKEN" \
    -H "X-Arc-Database: mydb" \
    -F "file=@$f"
done
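
`split -l` splits by line count, which only approximates file size. If you want chunks guaranteed to stay under the limit, here's a sketch that splits by byte budget while never breaking a Line Protocol line in half (450MB default leaves headroom under the 500MB cap):

```python
def split_lp(path: str, max_bytes: int = 450 * 1024 ** 2):
    """Split a Line Protocol file into chunks under max_bytes, on line boundaries.

    Returns the list of chunk file paths.
    """
    part, size, out, paths = 0, 0, None, []
    with open(path, "rb") as src:
        for line in src:
            # Start a new chunk when the next line would push us over budget.
            if out is None or size + len(line) > max_bytes:
                if out:
                    out.close()
                paths.append(f"{path}.part{part:04d}")
                out = open(paths[-1], "wb")
                part, size = part + 1, 0
            out.write(line)
            size += len(line)
    if out:
        out.close()
    return paths
```

Loop over the returned paths with the same curl command as above.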

Multi-measurement files — Arc handles them. Each measurement becomes its own table. No need to split by measurement — just throw the whole export at it.

Prefer Python? — The Arc Python SDK lets you query InfluxDB, transform with pandas or polars, and write directly to Arc at millions of records per second via MessagePack. No LP files, no curl.

Step 3: Verify

Sanity check. Make sure everything landed:

curl -X POST "http://localhost:8000/api/v1/query" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "database": "mydb",
    "sql": "SELECT COUNT(*) as total_rows, MIN(time) as earliest, MAX(time) as latest FROM cpu"
  }'

Row count matches? Time range looks right? Good. Historical data is done.
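
If you're verifying many measurements, script the comparison. A sketch of the check itself, assuming you've already fetched the same COUNT/MIN/MAX stats from InfluxDB and from Arc (the key names here mirror the query above):

```python
def stats_match(influx_stats: dict, arc_stats: dict,
                keys=("total_rows", "earliest", "latest")) -> list:
    """Compare per-measurement stats from both systems.

    Returns a list of (key, influx_value, arc_value) mismatches;
    an empty list means the measurement migrated cleanly.
    """
    return [
        (k, influx_stats.get(k), arc_stats.get(k))
        for k in keys
        if influx_stats.get(k) != arc_stats.get(k)
    ]
```

Run it per measurement and alert on any non-empty result before cutting over.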

Step 4: Point Telegraf at Arc

If you followed the migration strategy above, you've already been dual-writing. Now it's time to drop the InfluxDB output.

Arc has a dedicated Telegraf output plugin — [[outputs.arc]] — available since Telegraf 1.33. It uses MessagePack columnar format with gzip compression, which is faster than Line Protocol over HTTP.

During migration — write to both:

# Keep InfluxDB while migrating historical data
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "my-bucket"
 
# New data goes to Arc too
[[outputs.arc]]
  url = "http://arc:8000/api/v1/write/msgpack"
  api_key = "$ARC_TOKEN"
  content_encoding = "gzip"
  database = "mydb"

After migration — Arc only:

[[outputs.arc]]
  url = "http://arc:8000/api/v1/write/msgpack"
  api_key = "$ARC_TOKEN"
  content_encoding = "gzip"
  database = "mydb"

Four lines. No more organization fields, no more token-bucket-org dance. Just a URL, a key, and a database name.

See our Telegraf output plugin tutorial for the full setup — including Docker, Systemd, and 300+ input plugin examples.

Step 5: Translate Your Queries

This is the part that takes actual work. InfluxDB uses InfluxQL (1.x) or Flux (2.x, now deprecated). Arc uses DuckDB SQL — standard analytical SQL with PostgreSQL-compatible syntax.

The good news: if you know SQL at all, you already know DuckDB SQL. And if you've been writing Flux, you're about to feel a wave of relief.

InfluxQL → DuckDB SQL

Time-bucketed aggregation:

-- InfluxQL
SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m)
 
-- DuckDB SQL
SELECT
  time_bucket(INTERVAL '5 minutes', time) AS bucket,
  AVG(usage_idle) AS mean_usage_idle
FROM cpu
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket

Filter by tag:

-- InfluxQL
SELECT * FROM "cpu" WHERE "host" = 'server01' AND time > now() - 24h
 
-- DuckDB SQL
SELECT * FROM cpu
WHERE host = 'server01'
  AND time > NOW() - INTERVAL '24 hours'
ORDER BY time DESC

Group by tag:

-- InfluxQL
SELECT mean("usage_idle") FROM "cpu" WHERE time > now() - 1h GROUP BY "host"
 
-- DuckDB SQL
SELECT
  host,
  AVG(usage_idle) AS mean_usage_idle
FROM cpu
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY host

Last value per group:

-- InfluxQL
SELECT last("value") FROM "temperature" GROUP BY "device"
 
-- DuckDB SQL
SELECT DISTINCT ON (device)
  device,
  value,
  time
FROM temperature
ORDER BY device, time DESC

Flux → DuckDB SQL

If you've been writing Flux, I'm sorry. Here's the exit.

Basic query:

// Flux
from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_idle")
  |> mean()

-- DuckDB SQL
SELECT AVG(usage_idle) AS mean_usage_idle
FROM cpu
WHERE time > NOW() - INTERVAL '1 hour'

Five lines of pipe operators become three lines of SQL. Standard SQL that every developer on your team can read.

Windowed aggregation:

// Flux
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: 1h, fn: mean)

-- DuckDB SQL
SELECT
  time_bucket(INTERVAL '1 hour', time) AS bucket,
  AVG(usage_idle) AS mean_usage_idle
FROM cpu
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY bucket
ORDER BY bucket

Percentiles:

// Flux
from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "response_time")
  |> quantile(q: 0.95)

-- DuckDB SQL
SELECT
  percentile_cont(0.95) WITHIN GROUP (ORDER BY value) AS p95
FROM response_time
WHERE time > NOW() - INTERVAL '1 hour'

Step 6: Switch Grafana

Arc has its own Grafana datasource plugin — https://github.com/Basekick-Labs/grafana-arc-datasource. It connects directly to Arc, supports time-series macros ($__timeFilter(), $__timeFrom(), $__timeTo(), $__interval), has a query editor with syntax highlighting, and uses Apache Arrow for fast data transfer.

Swap your InfluxDB datasource for the Arc one, translate your panel queries using the SQL examples above, and your dashboards come back to life.

We recently shipped v1.1.0 with 850x data reduction and query splitting for large time ranges. See the Grafana integration tutorial for the full setup walkthrough.

What You Get After Migration

|                  | InfluxDB                                 | Arc                         |
|------------------|------------------------------------------|-----------------------------|
| Storage format   | TSM (1.x/2.x) or Parquet (3.x)           | Parquet                     |
| Query language   | InfluxQL / Flux (deprecated) / SQL (3.x) | DuckDB SQL                  |
| Data portability | Export via API                           | Direct Parquet file access  |
| Ingestion        | Line Protocol                            | Line Protocol + MessagePack |
| Compression      | LZ4/Snappy (TSM)                         | Parquet columnar (3-5x)     |

Your data moves from proprietary TSM (or managed Parquet) to standard Parquet files on storage you control. Query them with Arc, DuckDB, Spark, Polars, or any tool that reads Parquet. If Arc disappears tomorrow, your data is still there in a format the entire ecosystem understands.

No lock-in. No deprecated query languages. No wondering what happens next.

The Checklist

  1. Dual-write — Add [[outputs.arc]] to Telegraf alongside InfluxDB
  2. Export — Dump historical InfluxDB data as Line Protocol files
  3. Import — curl -X POST to Arc's /api/v1/import/lp
  4. Verify — Row counts and time ranges match across both systems
  5. Queries — InfluxQL/Flux → DuckDB SQL
  6. Grafana — Install Arc datasource, translate panel queries
  7. Cut over — Remove InfluxDB output from Telegraf
  8. Done — Shut down InfluxDB when you're confident

Need Help?

Most migrations are straightforward — export, import, swap the configs. If you hit something weird — deeply nested Flux pipelines, unusual retention setups, cardinality edge cases — we'll help.

Get started with Arc →


Running into issues during migration? Open an issue on GitHub or find us on Discord. We've done this before.

Ready to handle billion-record workloads?

Deploy Arc in minutes. Own your data in Parquet. Use for analytics, observability, AI, IoT, or data warehousing.

Get Started →