The events table is the primary storage for all Aires events. It uses the MergeTree engine with date-based partitioning and bloom filter indexes.
CREATE TABLE IF NOT EXISTS aires.events
(
-- Identity
id String,
timestamp DateTime64(9, 'UTC'),
service LowCardinality(String),
environment LowCardinality(String),
host String,
instance String,
-- Severity
severity LowCardinality(String),
-- Content
message String,
display_text String,
body String,
-- Tracing
trace_id String,
span_id String,
parent_span_id String,
subtrace_id String,
-- Session / User / Agent
session_id String,
user_id String,
agent_id String,
-- Source Location
source_file String,
source_line UInt32,
source_function String,
-- Categorization
category LowCardinality(String),
kind LowCardinality(String),
tags Array(String),
-- HTTP
http_method LowCardinality(String),
http_path String,
http_status_code UInt16,
http_duration_ms Int64,
-- Metrics
metric_name LowCardinality(String),
metric_value Float64,
-- Errors
error_type LowCardinality(String),
error_message String,
error_stack String,
error_handled Bool,
-- SDK Metadata
sdk_name LowCardinality(String),
sdk_version LowCardinality(String),
sdk_language LowCardinality(String),
-- Indexes
INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_id span_id TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (service, severity, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY
SETTINGS index_granularity = 8192;
| Column | Type | Description |
|---|
id | String | Unique event ID (UUID v7, time-ordered). Generated by the SDK. |
timestamp | DateTime64(9, 'UTC') | Event timestamp with nanosecond precision. Set by the SDK at event creation time. |
| Column | Type | Description |
|---|
service | LowCardinality(String) | Service name (e.g. "workforce-api"). Part of the primary sort key. |
environment | LowCardinality(String) | Environment (e.g. "production", "staging"). |
host | String | Hostname or pod name. |
instance | String | Instance/replica identifier. |
| Column | Type | Description |
|---|
severity | LowCardinality(String) | Log level: "trace", "debug", "info", "warn", "error", "fatal". Part of the primary sort key. |
message | String | Human-readable log message. Indexed with token bloom filter for full-text search. |
display_text | String | Formatted display text (may include ANSI codes). |
body | String | Raw body content (for binary payloads, stored as base64). |
| Column | Type | Description |
|---|
trace_id | String | Distributed trace ID. Bloom filter indexed for fast lookups. |
span_id | String | Span ID within a trace. Bloom filter indexed. |
parent_span_id | String | Parent span ID (for building span trees). |
subtrace_id | String | Sub-trace ID for grouping nested operations. |
| Column | Type | Description |
|---|
session_id | String | User/agent session ID. Bloom filter indexed. |
user_id | String | Authenticated user identifier. |
agent_id | String | AI agent identifier. |
| Column | Type | Description |
|---|
source_file | String | Source file path where the event originated. |
source_line | UInt32 | Line number in the source file. |
source_function | String | Function or method name. |
| Column | Type | Description |
|---|
category | LowCardinality(String) | Event category: "http", "db", "auth", "ai", "k8s", etc. |
kind | LowCardinality(String) | Event kind: "log", "span", "metric", "request", "response". |
tags | Array(String) | Free-form tags. Queryable with has(tags, 'value'). |
| Column | Type | Description |
|---|
http_method | LowCardinality(String) | HTTP method ("GET", "POST", etc.). |
http_path | String | URL path (e.g. "/api/tasks"). |
http_status_code | UInt16 | HTTP response status code. |
http_duration_ms | Int64 | Request duration in milliseconds. |
| Column | Type | Description |
|---|
metric_name | LowCardinality(String) | Metric name (e.g. "http.request.duration_ms"). |
metric_value | Float64 | Metric value. |
| Column | Type | Description |
|---|
error_type | LowCardinality(String) | Error class name (e.g. "DatabaseError", "TimeoutError"). |
error_message | String | Error message text. |
error_stack | String | Stack trace. |
error_handled | Bool | Whether the error was caught (true) or unhandled (false). |
| Column | Type | Description |
|---|
sdk_name | LowCardinality(String) | SDK identifier ("aires-sdk-rust", "aires-sdk-ts", "aires-sdk-python"). |
sdk_version | LowCardinality(String) | SDK version string. |
sdk_language | LowCardinality(String) | SDK language ("rust", "typescript", "python"). |
ORDER BY (service, severity, timestamp)
The primary sort key optimizes the most common query pattern: filter by service, then severity, then time range. ClickHouse can skip entire granules (8192 rows) when these columns appear in WHERE clauses.
Queries that benefit from the primary key:
-- Exact prefix match on service + severity + time (fastest)
WHERE service = 'my-api' AND severity = 'error' AND timestamp > now() - INTERVAL 1 HOUR
-- Prefix match on service + time
WHERE service = 'my-api' AND timestamp > now() - INTERVAL 1 HOUR
-- Prefix match on service only
WHERE service = 'my-api'
INDEX idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 1
INDEX idx_span_id span_id TYPE bloom_filter(0.01) GRANULARITY 1
INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1
Bloom filters enable fast point lookups on high-cardinality string columns. The 0.01 false positive rate means 1% of granules may be read unnecessarily — a good tradeoff for columns with millions of unique values.
Queries that benefit from bloom filters:
WHERE trace_id = 'abc-123'
WHERE session_id = 'sess-789'
WHERE span_id = 'span-001'
INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1
The token bloom filter indexes individual tokens (words) in the message column. This enables substring-like search:
-- Token search (uses the index)
WHERE message LIKE '%timeout%'
WHERE hasToken(message, 'timeout')
PARTITION BY toDate(timestamp)
One partition per calendar day. Benefits:
- Time-range queries skip entire partitions for dates outside the range
- TTL cleanup drops entire partitions efficiently (no row-by-row deletion)
- Backup/restore can operate on individual day partitions
Columns marked LowCardinality(String) use dictionary encoding. ClickHouse maintains a dictionary of unique values and stores integer indexes instead of the full string. This dramatically reduces storage for columns with fewer than ~10,000 unique values:
service — typically 10-100 unique services
environment — 3-5 values
severity — 6 values
category — 10-20 values
kind — 5-10 values
http_method — 9 values (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, TRACE, CONNECT)
error_type — typically 20-100 unique error classes
sdk_name, sdk_version, sdk_language — handful of values
Typical compression ratios for observability data in ClickHouse:
| Column | Raw Size | Compressed | Ratio |
|---|
timestamp | 8 bytes/row | ~0.5 bytes/row | 16x |
service (LowCardinality) | ~20 bytes/row | ~0.1 bytes/row | 200x |
severity (LowCardinality) | ~5 bytes/row | ~0.05 bytes/row | 100x |
message | ~100 bytes/row | ~30 bytes/row | 3x |
trace_id | 36 bytes/row | ~5 bytes/row | 7x |
| Overall | ~500 bytes/row | ~50 bytes/row | 10x |
At 10x compression, 1 billion events (~500 GB raw) fits in ~50 GB of disk.