Continuous Queries by Industry: Finance, Manufacturing, Aerospace, IoT, Defense

Most people meet continuous queries as a downsampling trick: take per-second data, write hourly averages, query the small table. That's the first 10% of what they do.
A continuous query in Arc is a SQL query that runs on a schedule, reads from a source measurement, and writes pre-computed results to a destination measurement. The query is standard DuckDB SQL, the same engine that powers Arc's ad-hoc analytics. That means a CQ can pre-compute anything DuckDB can express: financial OHLC candles, statistical process-control limits, regression-based trend detection, ASOF-aligned multi-sensor streams, Shannon entropy, cross-signal correlation. On a schedule. Materialized to a cheap table your dashboards hit in milliseconds.
This post is five deep dives, one each for Finance, Manufacturing, Aerospace, IoT, and Defense, with a schema, an ingestion path, the typical query, and the crazy one. If you want the mechanics of how CQs track time windows and stay crash-safe, that's the mechanics deep-dive. This one is about what you can actually compute.
The 30-second recap
You create a CQ with one API call:
curl -X POST http://localhost:8000/api/v1/continuous_queries \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "trades_ohlc_1m",
"database": "markets",
"source_measurement": "trades",
"destination_measurement": "trades_ohlc_1m",
"query": "SELECT ... FROM {database}.{source_measurement} WHERE time >= {start_time} AND time < {end_time} GROUP BY 1, 2",
"interval": "1m",
"is_active": true
}'Four placeholders get filled in at execution time. {database} and {source_measurement} resolve to the actual Parquet paths (local disk, S3, or Azure). {start_time} and {end_time} define the window being processed. Arc tracks last_processed_time so each run picks up exactly where the last left off, with no gaps and no overlap.
In Arc OSS you trigger execution manually (great for backfills); automatic scheduling on an interval is an Enterprise feature. The minimum interval is 10 seconds, and each execution has a 10-minute timeout. That's the whole surface. Everything interesting lives in the query field, so let's fill it with something other than AVG.
Finance: OHLC candles and VWAP
Raw market data is a firehose of individual trades. Nobody charts individual trades; they chart candles: open, high, low, close per interval, plus volume and a volume-weighted average price (VWAP). Computing those on read, over millions of ticks, every time a chart loads, is the wrong place to spend CPU. Pre-compute them.
Schema and ingestion
A trade is a measurement with the symbol as a tag and price/size as fields:
curl -X POST http://localhost:8000/api/v1/write/line-protocol \
-H "Authorization: Bearer $TOKEN" \
-H "x-arc-database: markets" \
-H "Content-Type: text/plain" \
-d 'trades,symbol=AAPL,venue=XNAS price=187.42,size=100 1719300000000000000
trades,symbol=AAPL,venue=XNAS price=187.45,size=250 1719300000350000000
trades,symbol=MSFT,venue=XNAS price=412.10,size=80 1719300000900000000'The target database travels in the x-arc-database header, so the line-protocol body stays clean. For a real feed you'd use the MessagePack endpoint (/api/v1/write/msgpack) for throughput, but line protocol shows the shape clearly.
The typical query: 1-minute candles
This is the bread and butter. Open and close are the first and last trade in the bucket by time, exactly what ARG_MIN and ARG_MAX give you (the value of one column at the row where another is min/max):
SELECT
date_trunc('minute', epoch_us(time)) AS time,
symbol,
ARG_MIN(price, time) AS open,
MAX(price) AS high,
MIN(price) AS low,
ARG_MAX(price, time) AS close,
SUM(size) AS volume,
SUM(price * size) / SUM(size) AS vwap,
COUNT(*) AS trade_count
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2Drop that into the query field with "interval": "1m" and a destination of trades_ohlc_1m. Your charting layer now reads a table that's already candles. ARG_MIN/ARG_MAX are the trick: they pick open and close by timestamp in a single pass, no window function, no self-join.
The crazy one: realized volatility and tiered candles
Two moves take this past the basics.
First, realized volatility belongs in the same rollup. Once you have per-minute closes, the standard deviation of log-returns over a trailing window is a volatility estimate. You compute the log-returns first in a CTE, then take their rolling standard deviation. DuckDB won't let you nest one window function inside another, so the two steps are separated cleanly. Run this as a second-tier CQ whose source is trades_ohlc_1m:
WITH returns AS (
SELECT
time,
symbol,
close,
ln(close / LAG(close) OVER (PARTITION BY symbol ORDER BY time)) AS log_return
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
)
SELECT
time,
symbol,
close,
STDDEV(log_return) OVER (
PARTITION BY symbol ORDER BY time
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS realized_vol_30m
FROM returnsSecond, tier the candles. A CQ's source can be another CQ's destination, so you build a cascade: raw trades → trades_ohlc_1m → trades_ohlc_1h → trades_ohlc_1d. Each tier re-aggregates the one below it (open = first of the period, close = last, high/low = max/min, volume = sum). One-minute candles for intraday, daily candles for the multi-year chart, all maintained automatically, each on its own interval and retention.
Manufacturing: SPC control limits and Cpk
A production line emits a measured dimension thousands of times a shift: a bore diameter, a fill weight, a torque. Quality engineering doesn't want the raw stream; it wants statistical process control: the mean, the spread, and the control limits (mean ± 3σ) that say whether the process is in control. That's a textbook CQ.
Schema and ingestion
Tag by line and station; the measured value is a field. The MessagePack endpoint is the right call for a busy line. Here's the columnar shape via the Python SDK:
from arc_tsdb_client import ArcClient
client = ArcClient(url="http://localhost:8000", token=TOKEN, database="factory")
client.write_columnar(
measurement="process_metrics",
tags={"line": ["L1", "L1", "L1"], "station": ["torque", "torque", "torque"]},
fields={"value": [12.04, 11.98, 12.11], "temp_c": [41.2, 41.3, 41.1]},
timestamps=[1719300000000000000, 1719300001000000000, 1719300002000000000],
)The typical query: control limits per station
Bucket by 5 minutes, compute the mean and standard deviation, and derive the upper and lower control limits in the same SELECT:
SELECT
time_bucket(INTERVAL '5 minutes', epoch_us(time)) AS time,
line,
station,
AVG(value) AS mean,
STDDEV(value) AS sigma,
AVG(value) + 3 * STDDEV(value) AS ucl,
AVG(value) - 3 * STDDEV(value) AS lcl,
COUNT(*) AS samples
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2, 3The control chart your quality team stares at is now a direct SELECT * from spc_5m, limits already computed.
The crazy one: process capability and root-cause correlation
The number QA actually reports is Cpk, process capability against the engineering tolerance. Given a spec of, say, 12.00 ± 0.15, Cpk is the distance from the mean to the nearest spec limit, divided by 3σ. That's pure arithmetic over the same aggregates, so fold it into the rollup:
SELECT
date_trunc('hour', epoch_us(time)) AS time,
line,
station,
AVG(value) AS mean,
STDDEV(value) AS sigma,
LEAST(
(12.15 - AVG(value)) / (3 * STDDEV(value)),
(AVG(value) - 11.85) / (3 * STDDEV(value))
) AS cpk,
CORR(temp_c, value) AS temp_value_corr
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2, 3That last column, CORR(temp_c, value), is the one that earns its keep. It pre-computes the correlation between ambient temperature and the measured dimension per station, per hour. When a station starts drifting, you don't go hunting; the materialized table already tells you whether temperature is moving with it. HISTOGRAM(value) in the same query gives you the distribution shape (bimodal? skewed?) for free.
Aerospace: rate-of-change and ASOF stream alignment
Flight and platform telemetry is high-rate, multi-channel, and rarely sampled on the same clock. The engine bus, the airframe IMU, and the avionics each report on their own cadence. The hard part isn't volume; Arc's storage story for that is covered in the aerospace storage piece and the satellite telemetry post. The hard part is computing derived quantities across streams that don't line up. CQs do that work once, ahead of time.
Schema and ingestion
curl -X POST http://localhost:8000/api/v1/write/line-protocol \
-H "Authorization: Bearer $TOKEN" \
-H "x-arc-database: fleet" \
-H "Content-Type: text/plain" \
-d 'airframe,tail=N773BK altitude_ft=31000,airspeed_kt=448 1719300000000000000
engine,tail=N773BK n1_pct=92.4,egt_c=611 1719300000120000000'Two measurements, two cadences, same aircraft.
The typical query: rate of climb
Rate of climb is the derivative of altitude: the change between consecutive samples divided by the time between them. LAG reaches back one row; the arithmetic is the slope:
SELECT
date_trunc('second', epoch_us(time)) AS time,
tail,
altitude_ft,
(altitude_ft - LAG(altitude_ft) OVER w)
/ (epoch(time) - epoch(LAG(time) OVER w)) * 60 AS climb_rate_fpm
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
WINDOW w AS (PARTITION BY tail ORDER BY time)That materializes vertical speed in feet-per-minute without the avionics ever sending it.
The crazy one: ASOF join across cadences
Here's the operation that's hard to do cheaply on the read path. You want, for every airframe sample, the most recent engine reading as of that moment, even though the two streams tick on different clocks. That's an ASOF join, and because a CQ runs the full DuckDB dialect, you can express it and materialize the aligned result:
SELECT
a.time,
a.tail,
a.altitude_ft,
a.airspeed_kt,
e.n1_pct,
e.egt_c
FROM {database}.{source_measurement} a
ASOF JOIN fleet.engine e
ON a.tail = e.tail AND a.time >= e.time
WHERE a.time >= {start_time} AND a.time < {end_time}
AND e.time >= {start_time} - INTERVAL '5 minutes'A CQ has one source_measurement; here it's airframe, resolved via the placeholder to its Parquet path. The second stream (engine) is referenced explicitly by its qualified database.measurement name, which a CQ requires (the x-arc-database header that lets ad-hoc queries drop the prefix doesn't apply to scheduled CQ execution). Note the e.time predicate: without it, every run would rescan the entire engine history. Bounding it to the window (plus a small look-back so the first airframe sample can still find its preceding engine reading) keeps each execution incremental.
ASOF joins are a DuckDB feature, not an Arc-specific extension, but because Arc runs queries through DuckDB, anything DuckDB supports is fair game inside a CQ. The same is true for the
ENTROPYandREGR_*functions later in this post.
The destination measurement is now a single, time-aligned table with airframe and engine state on every row. Cross-stream analysis ("what was EGT doing at this altitude and airspeed?") becomes a flat scan instead of a join over two firehoses. You paid the alignment cost once, on a schedule.
IoT: predictive-maintenance bands and drift detection
Industrial IoT is where downsampling started, so it's the obvious case, which is exactly why it's worth showing what's past obvious. The goal isn't a smaller table; it's catching a bearing before it fails.
Schema and ingestion
Vibration and temperature per device, tagged by device and zone. Telegraf with the native Arc output writes straight to the MessagePack endpoint, but the shape is the same:
sensors,device=pump_07,zone=north vibration_mm_s=2.1,temp_c=58.4 1719300000000000000
The typical query: anomaly bands with worst-actor
A 5-minute rollup that carries its own anomaly band (current value against the trailing mean and standard deviation) and names the single worst device per zone via ARG_MAX:
SELECT
time_bucket(INTERVAL '5 minutes', epoch_us(time)) AS time,
zone,
AVG(vibration_mm_s) AS avg_vib,
STDDEV(vibration_mm_s) AS sigma_vib,
MAX(vibration_mm_s) AS peak_vib,
ARG_MAX(device, vibration_mm_s) AS worst_device,
COUNT(*) AS samples
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2worst_device is the column on-call actually wants: not "vibration is high in the north zone," but "pump_07 is the one."
The crazy one: regression-based drift detection
A failing bearing doesn't spike; it drifts. The signal is the slope of vibration over time trending upward across hours. REGR_SLOPE fits a line to (time, value) and hands you that slope directly. Run it as an hourly CQ:
SELECT
date_trunc('hour', epoch_us(time)) AS time,
device,
zone,
AVG(vibration_mm_s) AS avg_vib,
REGR_SLOPE(vibration_mm_s, epoch(time)) AS vib_trend_per_sec,
CASE
WHEN REGR_SLOPE(vibration_mm_s, epoch(time)) > 0.00002 THEN 'degrading'
ELSE 'stable'
END AS health
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2, 3The materialized health column turns a maintenance dashboard into an early-warning system. A positive, growing vib_trend_per_sec is the early shape of a failure: a signal that a device is drifting out of its normal band well before it trips a hard threshold, and the CQ flagged it without anyone writing alerting code. (When intervals are missing because a device went quiet, a generate_series left-join in the query fills the gaps so the trend line doesn't lie about continuity.)
Defense: track correlation and classification entropy
At the tactical edge, compute happens on the platform, often on a node that's disconnected for hours at a stretch. CQs fit that model precisely: they run autonomously on the node, materialize rollups locally, and those rollups survive the sync gap. See the tactical-edge reference architecture and Edge Sync over DIL links for how that two-tier picture fits together. In a multi-writer cluster the scheduler runs CQs on the Raft leader, so you don't get duplicate executions across nodes.
Schema and ingestion
Track reports from multiple sensors, tagged by sensor and platform; classification and identity as fields:
track_reports,sensor=radar_a,platform=ddg_61 track_id="T-4471",classification="surface",confidence=0.88 1719300000000000000
The typical query: track activity per window
SELECT
date_trunc('minute', epoch_us(time)) AS time,
sensor,
COUNT(*) AS reports,
COUNT(DISTINCT track_id) AS unique_tracks,
AVG(confidence) AS avg_confidence
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2The crazy one: entropy of the classification mix
Here's a question raw counts can't answer cheaply: is the picture normal? A sensor that usually sees a steady mix of classifications suddenly seeing all-one-type, or an unusually scattered spread, is worth a human's attention. Shannon entropy of the classification distribution captures exactly that "how concentrated vs. how spread" in a single number, and DuckDB has it built in:
SELECT
date_trunc('minute', epoch_us(time)) AS time,
sensor,
COUNT(DISTINCT track_id) AS unique_tracks,
ENTROPY(classification) AS class_entropy,
HISTOGRAM(classification) AS class_mix
FROM {database}.{source_measurement}
WHERE time >= {start_time} AND time < {end_time}
GROUP BY 1, 2Low entropy means the sensor is seeing one thing; high entropy means a varied picture. A sharp change in class_entropy from one window to the next is an anomaly signal that no threshold on raw counts would catch. Pair it with a cross-sensor CORR over the per-minute report counts: if two sensors that usually move together suddenly diverge, one may be degraded or jammed. All of it pre-computed on the node, all of it intact when the link comes back.
The pattern underneath all five
Strip away the domains and every example is the same three-step shape:
raw measurement → DuckDB computes (on a schedule) → cheap materialized table
The compute step is where the leverage is. It's not limited to averages because it's not a downsampling feature; it's a scheduled slice of a full analytical engine. OHLC, Cpk, ASOF alignment, regression slopes, entropy: if you can write it in DuckDB SQL, you can run it continuously.
And because a CQ's destination can be another CQ's source, you tier:
trades (raw ticks)
→ trades_ohlc_1m (interval 1m, retain 7d)
→ trades_ohlc_1h (interval 1h, retain 90d)
→ trades_ohlc_1d (interval 1d, retain 5y)
Each tier on its own interval and its own retention. Pair that with delete_source_after_days on the CQ and a retention policy on the raw measurement, and the lifecycle runs itself: high-resolution data lives for days, the rollups that answer 99% of queries live for years, and storage cost tracks what you actually keep.
| Tier | Resolution | Interval | Typical retention |
|---|---|---|---|
| Raw | per-event | n/a | 1–7 days |
| Tier 1 | 1m / 5m | 1m / 5m | 30–90 days |
| Tier 2 | 1h | 1h | 90–365 days |
| Tier 3 | 1d | 1d | 1–5 years |
Where to take it
A continuous query is a cron job with the full power of an analytical database behind it. The five queries above aren't edge cases; they're the everyday work of finance, manufacturing, aerospace, IoT, and defense teams, moved off the read path and onto a schedule.
Start with the one query your dashboards run most often and the most expensively. Wrap it in a CQ, point the dashboard at the destination, and watch the read path go quiet. For the mechanics (time windows, crash safety, dry runs, backfills) see the mechanics deep-dive, and the continuous queries docs for the full API.