Building a Real-Time Global Vessel Tracking System with Arc

Cover photo by Athanasios Papazacharias on Unsplash

I like tracking things... airplanes, cars, devices, and even Santa. Today, we're going to learn how to track vessels using Python to collect and process the information, push it into Arc (our high-performance time-series database), and create a dashboard to visualize the collected data using Grafana.

The Challenge

Vessel tracking generates massive amounts of time-series data - position updates, speed changes, heading adjustments - all streaming in real-time from thousands of ships worldwide. This is exactly the type of high-cardinality, high-throughput workload that Arc was built for.

We'll be using AISStream.io, which provides live vessel data via WebSocket. This data includes position, speed, heading, and navigational status - perfect for demonstrating Arc's capabilities with Industrial IoT workloads.

Tracking Choice

I chose to track vessels in two distinct locations: the Port of Miami, Florida, and the Port of San Francisco, California. These ports were selected due to their high activity levels - Miami is one of the world's busiest cruise ports and a major cargo hub, while San Francisco Bay handles massive container ship traffic.

Setting Up the Infrastructure

We're going to use Arc as our time-series database. If you're not familiar with Arc, it's a high-performance time-series database built for billion-record Industrial IoT workloads. It delivers 4.21M records/sec sustained throughput and stores data in portable Parquet files you own.

For visualization, we'll use Grafana with the Arc datasource plugin, which uses Apache Arrow for high-performance data transfer.

Let's run everything on localhost using Docker. My docker-compose.yml file looks like this:

version: '3.8'
 
services:
  arc:
    image: ghcr.io/basekick-labs/arc:25.11.1
    container_name: arc
    ports:
      - "8000:8000"  # Arc HTTP API
    volumes:
      - arc-data:/app/data
    environment:
      - STORAGE_BACKEND=local
    restart: unless-stopped
 
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=basekick-arc-datasource
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - arc
    restart: unless-stopped
 
volumes:
  arc-data:
  grafana-data:

A few important things about this setup:

  • Data persistence: Both Arc and Grafana persist data in Docker volumes
  • Grafana: We'll install the Arc datasource plugin after Grafana starts

For production deployments, use Docker secrets or environment files instead of hardcoded credentials.

Once you've customized the file, start the services:

docker compose up -d

Get Your Arc Admin Token

On first startup, Arc generates an admin API token. You need to capture this from the logs:

docker logs arc

Look for the token in the output:

======================================================================
FIRST RUN - INITIAL ADMIN TOKEN GENERATED
======================================================================
Initial admin API token: ...............................QfT5rVhLCewKA
======================================================================
SAVE THIS TOKEN! It will not be shown again.
Use this token to login to the web UI or API.
You can create additional tokens after logging in.
======================================================================

Save this token - you'll need it for API calls and Grafana configuration!

Verify the containers are running:

docker ps

You should see:

CONTAINER ID   IMAGE                                  COMMAND            CREATED          STATUS          PORTS                    NAMES
b25f21e653f4   grafana/grafana:latest                 "/run.sh"          20 minutes ago   Up 20 minutes   0.0.0.0:3000->3000/tcp   grafana
f4f5ae585632   ghcr.io/basekick-labs/arc:25.11.1     "/app/arc"         20 minutes ago   Up 20 minutes   0.0.0.0:8000->8000/tcp   arc

Shaping and Pushing Data

Now for the fun part - collecting vessel data and streaming it to Arc. We'll use Python with Arc's HTTP API.

Note: Arc creates databases automatically when you first write data to them. Databases in Arc are namespaces that organize your tables - no need to create them explicitly!

First, install the required dependencies:

pip3 install websockets requests msgpack

Here's the Python code to stream AIS data into Arc using MessagePack columnar format:

import asyncio
import websockets
import json
from datetime import datetime, timezone
import requests
import msgpack
import os
 
# Arc configuration
ARC_URL = "http://localhost:8000"
ARC_TOKEN = os.environ.get("ARC_TOKEN", "your-secure-token-here")
AIS_API_KEY = os.environ.get("AISAPIKEY")
 
def send_to_arc(batch_data):
    """Send batch of data points to Arc using MessagePack columnar format"""
    # Transform row data into columnar format
    if not batch_data:
        return True
 
    # Columnar format - arrange data by columns for optimal performance
    data = {
        "m": "ais_data",  # measurement/table name
        "columns": {
            "time": [int(d["timestamp"].timestamp() * 1000) for d in batch_data],
            "ship_id": [d["ship_id"] for d in batch_data],
            "latitude": [d["latitude"] for d in batch_data],
            "longitude": [d["longitude"] for d in batch_data],
            "speed": [d["speed"] for d in batch_data],
            "heading": [d["heading"] for d in batch_data],
            "nav_status": [d["nav_status"] for d in batch_data]
        }
    }
 
    try:
        response = requests.post(
            f"{ARC_URL}/api/v1/write/msgpack",
            headers={
                "Authorization": f"Bearer {ARC_TOKEN}",
                "Content-Type": "application/msgpack",
                "x-arc-database": "vessels_tracking"  # Specify database via header
            },
            data=msgpack.packb(data)
        )
 
        if response.status_code == 204:
            return True
        else:
            print(f"Error {response.status_code}: {response.text}")
            return False
    except requests.exceptions.RequestException as e:
        print(f"Error writing to Arc: {e}")
        return False
 
async def connect_ais_stream():
    """Connect to AIS stream and push data to Arc"""
 
    async with websockets.connect("wss://stream.aisstream.io/v0/stream") as websocket:
        subscribe_message = {
            "APIKey": AIS_API_KEY,
            "BoundingBoxes": [
                # Miami, Florida
                [[25.645, -80.345], [25.905, -80.025]],
                # San Francisco Bay, California
                [[37.45, -122.55], [37.95, -122.25]],
            ],
            "FilterMessageTypes": ["PositionReport"],
        }
 
        subscribe_message_json = json.dumps(subscribe_message)
        await websocket.send(subscribe_message_json)
 
        batch = []
        batch_size = 100  # Arc handles batches efficiently
 
        async for message_json in websocket:
            message = json.loads(message_json)
            message_type = message["MessageType"]
 
            if message_type == "PositionReport":
                ais_message = message["Message"]["PositionReport"]
 
                # Prepare data point for Arc
                data_point = {
                    "timestamp": datetime.now(timezone.utc),
                    "ship_id": ais_message['UserID'],
                    "latitude": ais_message['Latitude'],
                    "longitude": ais_message['Longitude'],
                    "speed": ais_message['Sog'],
                    "heading": ais_message['Cog'],
                    "nav_status": str(ais_message['NavigationalStatus'])
                }
 
                batch.append(data_point)
 
                print(f"[{data_point['timestamp'].isoformat()}] ShipId: {data_point['ship_id']} "
                      f"Lat: {data_point['latitude']:.6f} Lon: {data_point['longitude']:.6f} "
                      f"Speed: {data_point['speed']} Heading: {data_point['heading']}")
 
                # Send batch when it reaches batch_size
                if len(batch) >= batch_size:
                    if send_to_arc(batch):
                        print(f"✓ Sent {len(batch)} records to Arc")
                    batch = []
 
if __name__ == "__main__":
    if not AIS_API_KEY:
        print("Error: AISAPIKEY environment variable not set")
        print("Sign up at https://aisstream.io to get your API key")
        exit(1)
 
    asyncio.run(connect_ais_stream())

Key Points About This Code

  1. MessagePack Columnar Format: Uses Arc's high-performance columnar protocol - data is organized by columns instead of rows for better compression and ingestion speed (see the sketch after this list)
  2. Batching: Collects 100 data points before sending, then transforms to columnar format (Arc handles batches efficiently)
  3. Database Specification: Database is specified via x-arc-database header, measurement name (ais_data) goes in the m field
  4. Timestamp Conversion: Converts Python datetime to millisecond Unix timestamp (Arc's native time format)
  5. AIS API Key: Sign up at aisstream.io using your GitHub account
  6. Bounding Boxes: Customize coordinates for your tracking area
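
To make the columnar layout concrete, here's what a two-record batch looks like once it has been arranged into the payload the script sends. The field names match the script above; the values are illustrative, taken from the sample output shown later in this article:

import msgpack

# Two position reports flattened into columns.
# Index i across all the lists is one record.
payload = {
    "m": "ais_data",  # measurement/table name
    "columns": {
        "time":       [1763557439876, 1763557441503],  # millisecond Unix timestamps
        "ship_id":    [368341690, 368231420],
        "latitude":   [37.794628, 37.512495],
        "longitude":  [-122.318050, -122.195927],
        "speed":      [9.1, 0.0],
        "heading":    [107.5, 360.0],
        "nav_status": ["0", "5"],
    },
}

# msgpack.packb() produces the binary body POSTed to /api/v1/write/msgpack
body = msgpack.packb(payload)
print(f"{len(payload['columns']['time'])} records -> {len(body)} bytes on the wire")

Sending one payload per 100 reports instead of one per report keeps the HTTP overhead negligible compared to the data itself.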

Save the full script above as vessel_tracker.py and run it:

export AISAPIKEY="your-ais-api-key"
export ARC_TOKEN="your-secure-token-here"
python3 vessel_tracker.py

If everything works, you'll see output like this:

[2025-11-19T13:03:59.876760+00:00] ShipId: 368341690 Lat: 37.794628 Lon: -122.318050 Speed: 9.1 Heading: 107.5
[2025-11-19T13:04:01.503668+00:00] ShipId: 368231420 Lat: 37.512495 Lon: -122.195927 Speed: 0 Heading: 360
[2025-11-19T13:04:02.522329+00:00] ShipId: 366999711 Lat: 37.810505 Lon: -122.360678 Speed: 0 Heading: 289
✓ Sent 100 records to Arc
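
One practical note before moving on: the script exits if the AIS WebSocket connection drops, and whatever is sitting in a partially filled batch is lost with it. For longer collection runs, a small retry wrapper keeps the stream alive. This is a sketch, assuming you saved the script as vessel_tracker.py as described above:

import asyncio
import websockets

from vessel_tracker import connect_ais_stream

async def run_forever():
    """Re-run the collector whenever the AIS stream disconnects."""
    while True:
        try:
            await connect_ais_stream()
        except (websockets.ConnectionClosed, OSError) as e:
            print(f"Stream dropped ({e}), reconnecting in 10 seconds...")
            await asyncio.sleep(10)

if __name__ == "__main__":
    asyncio.run(run_forever())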

Verifying Data in Arc

Let's confirm the data is in Arc using SQL:

curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM vessels_tracking.ais_data ORDER BY time DESC LIMIT 10"
  }'

You should see your vessel data:

{
  "columns": ["timestamp", "ship_id", "latitude", "longitude", "speed", "heading", "nav_status"],
  "data": [
    ["2025-11-19T13:03:59.876Z", 368341690, 37.79463, -122.31805, 9.1, 107.5, "0"],
    ["2025-11-19T13:04:01.503Z", 368231420, 37.512493, -122.19593, 0, 360, "5"],
    ...
  ]
}

Success! Arc is ingesting and storing your vessel tracking data.
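
If you'd rather run this check from code than curl, the same query works from Python with the requests library (same endpoint, token, and SQL as the curl call above):

import os
import requests

ARC_URL = "http://localhost:8000"
ARC_TOKEN = os.environ["ARC_TOKEN"]

# POST the SQL to Arc's query endpoint and print the result set
resp = requests.post(
    f"{ARC_URL}/api/v1/query",
    headers={
        "Authorization": f"Bearer {ARC_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"sql": "SELECT * FROM vessels_tracking.ais_data ORDER BY time DESC LIMIT 10"},
)
resp.raise_for_status()

result = resp.json()
print(result["columns"])
for row in result["data"]:
    print(row)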

Visualizing with Grafana

Now let's create a real-time dashboard. Go to http://localhost:3000 (username: admin, password: admin).

Install Arc Datasource Plugin

Arc has a native Grafana datasource that uses Apache Arrow for high-performance data transfer. You can either download the latest release or build from source.

Option 1: Download Release (Recommended)

# Download latest release
wget https://github.com/basekick-labs/grafana-arc-datasource/releases/latest/download/basekick-arc-datasource-1.0.0.zip
 
# Extract and copy to Grafana container
unzip basekick-arc-datasource-1.0.0.zip
docker cp basekick-arc-datasource grafana:/var/lib/grafana/plugins/
 
# Restart Grafana
docker restart grafana

Option 2: Build from Source

# Clone the repository
git clone https://github.com/basekick-labs/grafana-arc-datasource
cd grafana-arc-datasource
 
# Install dependencies and build
npm install
npm run build
 
# Copy to Grafana container
docker cp dist grafana:/var/lib/grafana/plugins/basekick-arc-datasource
 
# Restart Grafana
docker restart grafana

Wait a minute for Grafana to restart, then verify the plugin is loaded by checking the Grafana logs:

docker logs grafana | grep -i arc

Add Arc Data Source

Now configure the Arc datasource:

  1. Go to Configuration → Data sources
  2. Click Add data source
  3. Search for Arc and select it
  4. Configure the connection:
    • URL: http://arc:8000 (use the container name since we're in Docker)
    • API Key: Your Arc authentication token (from the logs earlier)
    • Database: vessels_tracking
  5. Click Save & Test

You should see "Data source is working" ✓

Create Your First Visualization

  1. Click Explore view
  2. Enter this SQL query:
SELECT
    time,
    ship_id,
    latitude,
    longitude,
    speed,
    heading,
    nav_status
FROM vessels_tracking.ais_data
WHERE $__timeFilter(time)
ORDER BY time DESC
LIMIT 1000

Note: The $__timeFilter(time) macro expands to a time-range condition on the given column, using the range selected in Grafana's time picker.

  3. Click Run query

You'll see a table view of your data. Now let's make it visual:

Create a Geomap

  1. Click Add to dashboard → Open Dashboard
  2. Click Edit on the panel
  3. In the Visualization dropdown, select Geomap
  4. Important: In the query editor, change Format from "Time series" to "Table"
    • This tells Grafana to treat each row as a single data point rather than creating multiple time series
    • Without this, you'll see cluttered tooltips with many series

You'll now see vessel positions plotted on a map! Zoom into San Francisco Bay or Miami to see the details:

Vessel tracking map showing ships in San Francisco Bay

Click on any red dot to see detailed vessel information including speed, heading, and navigational status.

Why Arc for Vessel Tracking?

This use case demonstrates several of Arc's strengths:

  • High throughput: Arc handles 4.21M records/sec sustained - perfect for thousands of vessels reporting positions
  • Time-series optimized: Automatic time-based partitioning for fast queries
  • DuckDB SQL: Full analytical SQL support (window functions, CTEs, geo queries) - see the example query after this list
  • Portable storage: Data stored in Parquet files you can query with any tool
  • Low resource usage: Runs efficiently on modest hardware
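
To give the "full analytical SQL" point some substance, here's one example you can run through the same /api/v1/query endpoint used in the verification step: a window function that returns each vessel's most recent reported position and speed. Treat the SQL as a sketch to adapt to your own questions:

import os
import requests

ARC_URL = "http://localhost:8000"
ARC_TOKEN = os.environ["ARC_TOKEN"]

# Latest position report per vessel, fastest movers first
sql = """
SELECT ship_id, time, latitude, longitude, speed, heading
FROM (
    SELECT *,
           row_number() OVER (PARTITION BY ship_id ORDER BY time DESC) AS rn
    FROM vessels_tracking.ais_data
) AS latest
WHERE rn = 1
ORDER BY speed DESC
"""

resp = requests.post(
    f"{ARC_URL}/api/v1/query",
    headers={"Authorization": f"Bearer {ARC_TOKEN}", "Content-Type": "application/json"},
    json={"sql": sql},
)
resp.raise_for_status()
for row in resp.json()["data"]:
    print(row)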

Arc's Performance at Scale

This vessel tracking workload demonstrates Arc's real-world Industrial IoT capabilities. With approximately 100,000 vessels globally reporting position updates every 10 seconds, that's roughly 10,000 updates per second (100,000 vessels ÷ 10 seconds) - exactly the type of high-cardinality, high-throughput scenario Arc was designed for.

Arc handles this workload effortlessly on a single node:

  • Sustained ingestion: 4.21M records/sec capacity means plenty of headroom for growth
  • Real-time queries: Sub-second response times even as your dataset grows to billions of records
  • Efficient storage: Parquet compression reduces storage costs by 3-5x while maintaining query performance
  • Scalable architecture: Add more vessels, ports, or data points without infrastructure changes

The same architecture scales from tracking a few dozen vessels in a single port to monitoring global maritime traffic across all major shipping lanes.

To Conclude

What a fun project! In this article, you learned how to:

  • Track vessels using the AISStream.io API
  • Stream real-time data with Python WebSockets
  • Deploy Arc and Grafana with Docker
  • Ingest time-series data into Arc
  • Visualize vessel movements with Grafana's Geomap

The same patterns apply to any Industrial IoT use case: fleet tracking, equipment telemetry, smart city sensors, or medical device monitoring.

Have you tried it? Let me know about your results on Twitter or LinkedIn.

Want to learn more about Arc? Check out the documentation or join our Discord.

Ready to handle billion-record sensor workloads?