Skip to content

Storage: ClickHouse Schema

The events table is the primary storage for all Aires events. It uses the MergeTree engine with date-based partitioning and bloom filter indexes.

CREATE TABLE IF NOT EXISTS aires.events
(
    -- Identity
    id               String,
    timestamp        DateTime64(9, 'UTC'),
    service          LowCardinality(String),
    environment      LowCardinality(String),
    host             String,
    instance         String,

    -- Severity
    severity         LowCardinality(String),

    -- Content
    message          String,
    display_text     String,
    body             String,

    -- Tracing
    trace_id         String,
    span_id          String,
    parent_span_id   String,
    subtrace_id      String,

    -- Session / User / Agent
    session_id       String,
    user_id          String,
    agent_id         String,

    -- Source Location
    source_file      String,
    source_line      UInt32,
    source_function  String,

    -- Categorization
    category         LowCardinality(String),
    kind             LowCardinality(String),
    tags             Array(String),

    -- HTTP
    http_method      LowCardinality(String),
    http_path        String,
    http_status_code UInt16,
    http_duration_ms Int64,

    -- Metrics
    metric_name      LowCardinality(String),
    metric_value     Float64,

    -- Errors
    error_type       LowCardinality(String),
    error_message    String,
    error_stack      String,
    error_handled    Bool,

    -- SDK Metadata
    sdk_name         LowCardinality(String),
    sdk_version      LowCardinality(String),
    sdk_language     LowCardinality(String),

    -- Indexes
    INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_id span_id TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (service, severity, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY
SETTINGS index_granularity = 8192;
ColumnTypeDescription
idStringUnique event ID (UUID v7, time-ordered). Generated by the SDK.
timestampDateTime64(9, 'UTC')Event timestamp with nanosecond precision. Set by the SDK at event creation time.
ColumnTypeDescription
serviceLowCardinality(String)Service name (e.g. "workforce-api"). Part of the primary sort key.
environmentLowCardinality(String)Environment (e.g. "production", "staging").
hostStringHostname or pod name.
instanceStringInstance/replica identifier.
ColumnTypeDescription
severityLowCardinality(String)Log level: "trace", "debug", "info", "warn", "error", "fatal". Part of the primary sort key.
messageStringHuman-readable log message. Indexed with token bloom filter for full-text search.
display_textStringFormatted display text (may include ANSI codes).
bodyStringRaw body content (for binary payloads, stored as base64).
ColumnTypeDescription
trace_idStringDistributed trace ID. Bloom filter indexed for fast lookups.
span_idStringSpan ID within a trace. Bloom filter indexed.
parent_span_idStringParent span ID (for building span trees).
subtrace_idStringSub-trace ID for grouping nested operations.
ColumnTypeDescription
session_idStringUser/agent session ID. Bloom filter indexed.
user_idStringAuthenticated user identifier.
agent_idStringAI agent identifier.
ColumnTypeDescription
source_fileStringSource file path where the event originated.
source_lineUInt32Line number in the source file.
source_functionStringFunction or method name.
ColumnTypeDescription
categoryLowCardinality(String)Event category: "http", "db", "auth", "ai", "k8s", etc.
kindLowCardinality(String)Event kind: "log", "span", "metric", "request", "response".
tagsArray(String)Free-form tags. Queryable with has(tags, 'value').
ColumnTypeDescription
http_methodLowCardinality(String)HTTP method ("GET", "POST", etc.).
http_pathStringURL path (e.g. "/api/tasks").
http_status_codeUInt16HTTP response status code.
http_duration_msInt64Request duration in milliseconds.
ColumnTypeDescription
metric_nameLowCardinality(String)Metric name (e.g. "http.request.duration_ms").
metric_valueFloat64Metric value.
ColumnTypeDescription
error_typeLowCardinality(String)Error class name (e.g. "DatabaseError", "TimeoutError").
error_messageStringError message text.
error_stackStringStack trace.
error_handledBoolWhether the error was caught (true) or unhandled (false).
ColumnTypeDescription
sdk_nameLowCardinality(String)SDK identifier ("aires-sdk-rust", "aires-sdk-ts", "aires-sdk-python").
sdk_versionLowCardinality(String)SDK version string.
sdk_languageLowCardinality(String)SDK language ("rust", "typescript", "python").
ORDER BY (service, severity, timestamp)

The primary sort key optimizes the most common query pattern: filter by service, then severity, then time range. ClickHouse can skip entire granules (8192 rows) when these columns appear in WHERE clauses.

Queries that benefit from the primary key:

-- Exact prefix match on service + severity + time (fastest)
WHERE service = 'my-api' AND severity = 'error' AND timestamp > now() - INTERVAL 1 HOUR

-- Prefix match on service + time
WHERE service = 'my-api' AND timestamp > now() - INTERVAL 1 HOUR

-- Prefix match on service only
WHERE service = 'my-api'
INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1
INDEX idx_span_id span_id TYPE bloom_filter(0.01) GRANULARITY 1
INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1

Bloom filters enable fast point lookups on high-cardinality string columns. The 0.01 false positive rate means 1% of granules may be read unnecessarily — a good tradeoff for columns with millions of unique values.

Queries that benefit from bloom filters:

WHERE trace_id = 'abc-123'
WHERE session_id = 'sess-789'
WHERE span_id = 'span-001'
INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1

The token bloom filter indexes individual tokens (words) in the message column. This enables substring-like search:

-- Token search (uses the index)
WHERE message LIKE '%timeout%'
WHERE hasToken(message, 'timeout')
PARTITION BY toDate(timestamp)

One partition per calendar day. Benefits:

  • Time-range queries skip entire partitions for dates outside the range
  • TTL cleanup drops entire partitions efficiently (no row-by-row deletion)
  • Backup/restore can operate on individual day partitions

Columns marked LowCardinality(String) use dictionary encoding. ClickHouse maintains a dictionary of unique values and stores integer indexes instead of the full string. This dramatically reduces storage for columns with fewer than ~10,000 unique values:

  • service — typically 10-100 unique services
  • environment — 3-5 values
  • severity — 6 values
  • category — 10-20 values
  • kind — 5-10 values
  • http_method — 9 values (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, TRACE, CONNECT)
  • error_type — typically 20-100 unique error classes
  • sdk_name, sdk_version, sdk_language — handful of values

Typical compression ratios for observability data in ClickHouse:

ColumnRaw SizeCompressedRatio
timestamp8 bytes/row~0.5 bytes/row16x
service (LowCardinality)~20 bytes/row~0.1 bytes/row200x
severity (LowCardinality)~5 bytes/row~0.05 bytes/row100x
message~100 bytes/row~30 bytes/row3x
trace_id36 bytes/row~5 bytes/row7x
Overall~500 bytes/row~50 bytes/row10x

At 10x compression, 1 billion events (~500 GB raw) fits in ~50 GB of disk.