Liftbridge Joins Basekick Labs: Building the IoT Data Platform

A few days ago, we sent Tyler Treat an email.
Tyler created Liftbridge back in 2017—a lightweight, durable message streaming system built on top of NATS. Think Kafka, but without the JVM, without ZooKeeper, and packaged in a single 16MB binary.
The project had been quiet since 2022. No releases. No commits. 2,600+ stars on GitHub, but no one at the wheel.
We asked Tyler if Basekick Labs could take over maintenance. His response:
"I would be absolutely thrilled if someone were to continue to maintain, extend, and improve it rather than have it simply die."
So here we are. Liftbridge is now part of Basekick Labs.
Thank You, Tyler
Before anything else: thank you, Tyler.
Liftbridge isn't just another streaming system. The architecture decisions—using NATS as the transport layer, separating metadata consensus (Raft) from data replication (ISR), the commit log design borrowed from Kafka—these aren't obvious choices. They come from someone who deeply understood the tradeoffs.
Tyler was the #2 contributor to NATS Streaming before building Liftbridge. He saw the limitations of storing messages redundantly in both a Raft log and a message log, and designed something better. His blog posts on distributed systems are still some of the best technical writing in the space.
Taking over a project like this is a privilege. We don't take it lightly.
Why Liftbridge
We've been building Arc—a time-series database on DuckDB and Parquet—for the past year. Arc is great at storing and querying time-series data. But storage is only part of the problem.
In IoT, data doesn't arrive neatly. Sensors send data in bursts. Networks go down. Devices reconnect and dump hours of buffered readings. You need something between your sensors and your database that can:
- Buffer data durably — If Arc is down for maintenance, messages don't disappear
- Handle backpressure — Traffic spikes don't overwhelm your database
- Enable replay — Re-process data from a specific point in time
- Decouple producers from consumers — Sensors don't need to know about Arc
This is what message streaming systems do. Kafka does it. Redpanda does it. Confluent Cloud does it.
But Kafka is heavy. It needs the JVM, ZooKeeper (or KRaft), and careful tuning. For IoT deployments—especially edge deployments—that's overkill.
Liftbridge is Kafka's semantics in a Go binary. Durable. Replicated. Lightweight.
Why Now
On December 8th, IBM announced they're acquiring Confluent for $11 billion.
We've seen this movie before.
IBM acquires a company. Prices go up. Innovation slows down. Enterprise sales cycles get longer. The product becomes part of "IBM's hybrid cloud and AI strategy" instead of being the best version of itself.
Some teams will stick with Confluent. But others—especially smaller teams, IoT deployments, edge use cases—will look for alternatives.
NATS with JetStream is one option. But JetStream is newer, and some teams want the Kafka-style log semantics that Liftbridge provides.
An abandoned project with solid architecture is now an opportunity.
How Liftbridge Fits Into Arc
Here's the architecture we're building toward:
┌─────────────────────────────────────────────────────────────────┐
│ IoT Data Platform │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌─────────────┐ ┌──────────┐ ┌───────┐ │
│ │ Sensors │───▶│ Telegraf │───▶│Liftbridge│───▶│ Arc │ │
│ │ Devices │ │ (collect) │ │ (buffer) │ │(store)│ │
│ │ Edge │ └─────────────┘ └──────────┘ └───────┘ │
│ └──────────┘ │ │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────┐ │
│ │ Grafana │ │
│ │ (visualize/alert) │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Each component has a clear job:
| Component | Role | Why this choice |
|---|---|---|
| Telegraf | Collect metrics from sensors, protocols, APIs | Industry standard, 300+ plugins, Arc already has an output plugin |
| Liftbridge | Durable message buffer with replay capability | Lightweight, NATS-based, Kafka semantics without Kafka complexity |
| Arc | Time-series storage and query | DuckDB performance, Parquet portability, no vendor lock-in |
| Grafana | Visualization and alerting | Industry standard, Arc already has a data source plugin |
The key insight: we don't need to build everything. Telegraf and Grafana are best-in-class and open source. We focus on the storage and streaming layers where we can differentiate.
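To make that concrete, here is the shape of the glue between the buffer and the store: a small consumer that tails a Liftbridge stream and forwards each message to Arc. This is a sketch, not shipped code. The Liftbridge calls use the Go client (go-liftbridge); the Arc endpoint and payload format below are hypothetical placeholders, not Arc's actual ingest API.

```go
package example

import (
	"bytes"
	"context"
	"log"
	"net/http"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// forwardToArc tails the "sensors" stream and POSTs each message to Arc.
// arcURL is a hypothetical ingest endpoint, used here only for illustration.
func forwardToArc(ctx context.Context, client lift.Client, arcURL string) error {
	return client.Subscribe(ctx, "sensors", func(msg *lift.Message, err error) {
		if err != nil {
			log.Println("subscribe error:", err)
			return
		}
		resp, postErr := http.Post(arcURL, "application/json", bytes.NewReader(msg.Value()))
		if postErr != nil {
			// Arc is down or unreachable. The message stays in the Liftbridge
			// log, so a real bridge would record the offset and retry later.
			log.Println("arc write failed:", postErr)
			return
		}
		resp.Body.Close()
	}, lift.StartAtEarliestReceived())
}
```

Because the log is durable, a bridge like this can crash, restart, and resume from a recorded offset without losing data. That is the "buffer" role in the table above.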
Liftbridge Architecture
For those who want to go deeper, here's how Liftbridge works under the hood.
The Core Idea
Liftbridge adds durability to NATS. NATS is a high-performance pub/sub messaging system—fast, simple, but ephemeral. Messages that aren't consumed immediately are gone.
Liftbridge sits alongside NATS and subscribes to subjects you want to persist. It writes them to a durable, replicated commit log. Consumers can read from any point in that log—not just "what's happening now" but "everything since yesterday" or "everything since offset 1,847,293."
┌─────────────────────────────┐
│ NATS Cluster │
│ (fast, ephemeral pub/sub) │
└──────────────┬──────────────┘
│
┌──────────────▼──────────────┐
│ Liftbridge Cluster │
│ (durable, replicated log) │
│ │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Node1│ │Node2│ │Node3│ │
│ │(L) │ │(F) │ │(F) │ │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────────┘
L = Leader, F = Follower
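In code, "reading from any point" is just a subscription option. Here is a minimal sketch with the Go client, assuming a local single-node setup and a stream named sensors:

```go
package main

import (
	"context"
	"fmt"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

func main() {
	// Connect to a Liftbridge node (9292 is the default client port).
	client, err := lift.Connect([]string{"localhost:9292"})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Replay the "sensors" stream from the start of the log. Swap in
	// lift.StartAtOffset(1847293) or lift.StartAtLatestReceived() to
	// start somewhere else.
	ctx := context.Background()
	err = client.Subscribe(ctx, "sensors", func(msg *lift.Message, err error) {
		if err != nil {
			panic(err)
		}
		fmt.Println(msg.Offset(), string(msg.Value()))
	}, lift.StartAtEarliestReceived())
	if err != nil {
		panic(err)
	}

	select {} // keep the process alive while the handler runs
}
```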
Dual Consensus Model
This is where Liftbridge gets clever. It uses two different consensus mechanisms for different purposes:
Raft for Metadata
Cluster membership, stream assignments, consumer group state—this is metadata. It changes infrequently and needs strong consistency. Liftbridge uses Raft (via HashiCorp's implementation) with a single elected controller.
ISR for Data Replication
Actual message data uses Kafka's ISR (In-Sync Replicas) protocol. This is faster than Raft for high-throughput data because:
- No redundant logging (messages are only written to the commit log, not a separate Raft log)
- Leader handles writes, followers pull and acknowledge
- Flexible durability: wait for leader only, wait for all replicas, or fire-and-forget
┌─────────────────────────────────────────────────────────────────┐
│ Liftbridge Consensus Model │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────┐ ┌───────────────────────────────┐ │
│ │ Metadata Plane │ │ Data Plane │ │
│ │ (Raft Consensus) │ │ (ISR Replication) │ │
│ ├──────────────────────┤ ├───────────────────────────────┤ │
│ │ │ │ │ │
│ │ • Cluster membership │ │ • Message replication │ │
│ │ • Stream assignments │ │ • Commit log writes │ │
│ │ • Consumer groups │ │ • High watermark tracking │ │
│ │ • Leader election │ │ • Follower catch-up │ │
│ │ │ │ │ │
│ │ Single Controller │ │ Per-Partition Leader │ │
│ │ Strong consistency │ │ Tunable durability │ │
│ │ Low write volume │ │ High throughput │ │
│ │ │ │ │ │
│ └──────────────────────┘ └───────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Streams and Partitions
Liftbridge organizes data into streams. Each stream maps to a NATS subject and can have multiple partitions for parallelism.
┌─────────────────────────────────────────────────────────────────┐
│ Stream: "sensors" │
│ Subject: "sensors.*" │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2 │ │
│ │ Leader: Node1 │ │ Leader: Node2 │ │ Leader: Node3 │ │
│ ├─────────────────┤ ├─────────────────┤ ├─────────────────┤ │
│ │ Offset 0: msg │ │ Offset 0: msg │ │ Offset 0: msg │ │
│ │ Offset 1: msg │ │ Offset 1: msg │ │ Offset 1: msg │ │
│ │ Offset 2: msg │ │ Offset 2: msg │ │ Offset 2: msg │ │
│ │ ... │ │ ... │ │ ... │ │
│ │ Offset N: msg │ │ Offset N: msg │ │ Offset N: msg │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Replication Factor: 2 │
│ Each partition replicated to 2 nodes (leader + 1 follower) │
│ │
└─────────────────────────────────────────────────────────────────┘
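Creating a stream shaped like the one above is a single client call. A sketch with the Go client, with sizing that mirrors the diagram:

```go
package example

import (
	"context"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// createSensorsStream attaches a stream named "sensors" to the NATS subject
// "sensors.*", split into 3 partitions with each partition kept on 2 nodes.
func createSensorsStream(client lift.Client) error {
	err := client.CreateStream(context.Background(), "sensors.*", "sensors",
		lift.Partitions(3),
		lift.ReplicationFactor(2),
	)
	if err == lift.ErrStreamExists {
		return nil // idempotent: fine if a previous run already created it
	}
	return err
}
```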
Commit Log Structure
Each partition is stored as a commit log—an append-only sequence of segments:
Partition Directory: /data/streams/sensors/0/
├── 00000000000000000000.log # Segment file (messages)
├── 00000000000000000000.index # Offset index (offset → position)
├── 00000000000000000000.timestamp # Time index (timestamp → offset)
├── 00000000000000001000.log # Next segment (starts at offset 1000)
├── 00000000000000001000.index
├── 00000000000000001000.timestamp
└── ...
- Segment files contain the actual messages. When a segment reaches a size limit, a new one is created.
- Index files are memory-mapped for fast lookups. Given an offset, find the file position in O(1).
- Timestamp indexes enable time-based queries: "give me everything since 2pm yesterday."
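That time-based replay is exposed directly through the client. A sketch, assuming the sensors stream from earlier and the option names as we understand the v2 Go client:

```go
package example

import (
	"context"
	"fmt"
	"time"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// replaySince re-reads the "sensors" stream from a point in time. The
// timestamp index is what lets the server find the starting offset quickly.
func replaySince(ctx context.Context, client lift.Client, since time.Time) error {
	return client.Subscribe(ctx, "sensors",
		func(msg *lift.Message, err error) {
			if err != nil {
				return
			}
			fmt.Println(msg.Timestamp(), string(msg.Value()))
		},
		lift.StartAtTime(since), // or lift.StartAtTimeDelta(24*time.Hour)
	)
}
```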
Message Format
Every message in Liftbridge has an envelope:
┌─────────────────────────────────────────────────────────────────┐
│ Message Envelope │
├─────────────────────────────────────────────────────────────────┤
│ Magic Bytes │ 4 bytes │ 0xB9 0x0E 0x43 0xB4 │
│ CRC-32C │ 4 bytes │ Checksum of payload │
│ Payload Length │ 4 bytes │ Size of protobuf message │
│ Payload │ N bytes │ Protobuf-encoded message │
└─────────────────────────────────────────────────────────────────┘
Payload (Protobuf):
├── Offset (int64) - Position in partition
├── Key (bytes) - Optional routing key
├── Value (bytes) - Actual message content
├── Timestamp (int64) - Unix nanos
├── Headers (map) - Key-value metadata
├── Subject (string) - Original NATS subject
├── ReplySubject (string) - For request-reply patterns
├── CorrelationID (string) - For tracing
└── AckInbox (string) - For publisher acks
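You rarely touch the wire format yourself; client libraries decode the envelope and hand the fields to your handler. A sketch of what that looks like in the Go client, with accessor names as we understand the v2 API:

```go
package example

import (
	"fmt"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// printEnvelope is a subscription handler that prints the decoded fields of
// each message: offset, originating subject, key, timestamp, headers, value.
func printEnvelope(msg *lift.Message, err error) {
	if err != nil {
		return
	}
	fmt.Printf("offset=%d subject=%s key=%s ts=%s\n",
		msg.Offset(), msg.Subject(), msg.Key(), msg.Timestamp())
	for name, value := range msg.Headers() {
		fmt.Printf("  header %s=%s\n", name, value)
	}
	fmt.Println("  value:", string(msg.Value()))
}
```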
Two Ways to Publish
Via NATS (zero code changes)
If you're already publishing to NATS, Liftbridge can subscribe to those subjects and persist the messages. Your publishers don't change at all.
Publisher → NATS subject "sensors.temperature" → Liftbridge subscribes → Durable log
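For example, a publisher that already uses plain NATS stays exactly as it is; a sketch with the standard nats.go client:

```go
package main

import (
	nats "github.com/nats-io/nats.go"
)

func main() {
	// This publisher knows nothing about Liftbridge. If a stream is attached
	// to "sensors.temperature", the message also lands in the durable log.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	if err := nc.Publish("sensors.temperature", []byte(`{"sensor":"t1","temp_c":21.4}`)); err != nil {
		panic(err)
	}
	nc.Flush() // make sure the message is on the wire before exiting
}
```

The tradeoff: the publisher gets no acknowledgement that the message actually made it into the log. That is what the gRPC path below adds.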
Via gRPC (full control)
For durability guarantees, use the Liftbridge API directly:
Publisher → Liftbridge gRPC → Write to log → Wait for ack → Confirm
The gRPC path gives you:
- AckPolicy: LEADER (fast), ALL (durable), NONE (fire-and-forget)
- Exactly-once delivery with idempotent producers
- Transactional writes across partitions
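A sketch of that path with the Go client; the ack policy and the timeout are the knobs that matter (option names as we understand the v2 client):

```go
package example

import (
	"context"
	"time"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

// publishDurably writes one reading and blocks until the full ISR has
// acknowledged it or the deadline expires. Use lift.AckPolicyLeader() for
// lower latency, or lift.AckPolicyNone() for fire-and-forget.
func publishDurably(client lift.Client, reading []byte) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, err := client.Publish(ctx, "sensors", reading,
		lift.Key([]byte("device-42")), // optional key carried in the envelope
		lift.AckPolicyAll(),
	)
	return err
}
```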
Metadata: What the Controller Tracks
The Raft-based controller maintains cluster state:
┌─────────────────────────────────────────────────────────────────┐
│ Controller Metadata │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Cluster State │
│ ├── Brokers: [node1:9292, node2:9292, node3:9292] │
│ ├── Controller: node1 │
│ └── Controller Epoch: 7 │
│ │
│ Streams │
│ ├── sensors │
│ │ ├── Subject: sensors.* │
│ │ ├── Partitions: 3 │
│ │ ├── ReplicationFactor: 2 │
│ │ ├── Retention: 7 days │
│ │ └── Partition Assignments: │
│ │ ├── P0: Leader=node1, ISR=[node1, node2] │
│ │ ├── P1: Leader=node2, ISR=[node2, node3] │
│ │ └── P2: Leader=node3, ISR=[node3, node1] │
│ └── events │
│ └── ... │
│ │
│ Consumer Groups │
│ ├── arc-ingest │
│ │ ├── Coordinator: node2 │
│ │ ├── Members: [consumer-1, consumer-2] │
│ │ └── Offsets: │
│ │ ├── sensors/0: 1847293 │
│ │ ├── sensors/1: 1632841 │
│ │ └── sensors/2: 1923847 │
│ └── ... │
│ │
└─────────────────────────────────────────────────────────────────┘
Failure Handling
| Scenario | What Happens | Recovery |
|---|---|---|
| Follower crashes | ISR shrinks, writes continue | Follower rejoins, catches up from leader |
| Leader crashes | Controller elects new leader from ISR | Seconds (Raft election) |
| Controller crashes | Raft elects new controller | Seconds (Raft election) |
| Network partition | Depends on which side has quorum | Automatic when partition heals |
The key invariant: messages acknowledged with AckPolicy.ALL are never lost as long as one ISR member survives.
What's Next
Immediate (Q1 2026):
- Modernize codebase (Go 1.25+, updated dependencies)
- Security audit
- Fix CI/CD
- First release under Basekick Labs: Liftbridge 26.01.1
Near-term:
- Documentation refresh
- Integration guide: Liftbridge + Telegraf + Arc
- Docker Compose for the full stack
- Performance benchmarks
Future:
- Tiered storage (offload old segments to S3/object storage)
- Enhanced observability (Prometheus metrics, distributed tracing)
- Arc-native integration (direct ingest from Liftbridge without Telegraf hop)
The Bigger Picture
Basekick Labs is building a data platform for IoT. Not a single product—a stack.
Arc handles storage. Liftbridge handles streaming. Telegraf handles collection. Grafana handles visualization.
Each piece is open source. Each piece can be used independently. But together, they're a complete solution for industrial IoT, sensor networks, and edge computing.
We're not trying to be Confluent. We're not trying to be InfluxData. We're building something lighter, more portable, and honest about its licensing from day one.
If that resonates with you, check out https://github.com/liftbridge-io/liftbridge or https://github.com/basekick-labs/arc.
And if you're using Liftbridge in production—or want to contribute—join our Discord. The project has been quiet for years. Let's change that.
Resources: