Reference: Severity Levels
Severity Levels
Aires defines six severity levels, aligned with common logging conventions and the OpenTelemetry severity number mapping.
| Level | Proto Enum | Proto Value | ClickHouse String | OTel Range |
|---|---|---|---|---|
| Trace | TRACE | 1 | "trace" | 1-4 |
| Debug | DEBUG | 2 | "debug" | 5-8 |
| Info | INFO | 3 | "info" | 9-12 |
| Warn | WARN | 4 | "warn" | 13-16 |
| Error | ERROR | 5 | "error" | 17-20 |
| Fatal | FATAL | 6 | "fatal" | 21-24 |
There is also SEVERITY_UNSPECIFIED = 0, stored as "unspecified" in ClickHouse.
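The OTel Range column can be read as a lookup from an OpenTelemetry severity number to an Aires level. The following is a minimal Python sketch that relies only on the ranges in the table above; the function name `level_from_otel` is hypothetical and not part of any Aires SDK, and treating out-of-range numbers as "unspecified" is an assumption.

```python
# Hypothetical helper: map an OpenTelemetry severity number (1-24)
# to the ClickHouse string listed in the table above.
_LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"]


def level_from_otel(severity_number: int) -> str:
    """Return the level name for an OTel severity number.

    Each level covers a block of four consecutive numbers
    (1-4 trace, 5-8 debug, ..., 21-24 fatal). Anything outside
    1-24 is treated as "unspecified" (an assumption of this sketch).
    """
    if not 1 <= severity_number <= 24:
        return "unspecified"
    # Integer division groups the 24 numbers into six blocks of four.
    return _LEVELS[(severity_number - 1) // 4]
```

For example, `level_from_otel(13)` falls in the 13-16 block and yields `"warn"`.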
When to Use Each Level
TRACE (1)
The finest-grained diagnostic information. Use for detailed internal state that is only useful when actively debugging a specific problem.
When to use:
- Function entry/exit in hot paths
- Cache hit/miss details
- Internal state machine transitions
- Loop iteration details
Retention recommendation: 1-7 days. Trace events are high-volume and rarely useful after the debugging session ends.
DEBUG (2)
Development-time diagnostics. More useful than trace, but still too verbose for production monitoring.
When to use:
- Configuration loading results
- Connection establishment details
- Query plans or optimization decisions
- Feature flag evaluation results
Retention recommendation: 7-14 days.
INFO (3)
Normal operational events. The default level for production. Use for events that an operator would want to see in a dashboard under normal conditions.
When to use:
- Server startup and shutdown
- Request/response logging (HTTP, gRPC)
- Successful completions of important operations
- Deployment and scaling events
- User authentication events
Retention recommendation: 30 days.
WARN (4)
Potential problems that may or may not require action. The system is still operating correctly, but something unexpected happened.
When to use:
- Resource utilization approaching limits
- Retry attempts (but not final failures)
- Deprecated feature usage
- Configuration values that may cause issues
- Clock skew or timing anomalies
Retention recommendation: 30-90 days.
ERROR (5)
Errors that need investigation and potentially immediate action. The operation failed, but the system is still running.
When to use:
- Caught exceptions that affect user-facing functionality
- Failed external API calls (after all retries exhausted)
- Data validation failures
- Authentication/authorization failures
- Business logic errors
Retention recommendation: 90 days.
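The split between WARN for retry attempts and ERROR for the final failure can be sketched with Python's standard `logging` module. This is an illustration only: `call_with_retries` is a hypothetical helper, and the standard library logger stands in for whatever Aires SDK call an application would actually use.

```python
import logging

logger = logging.getLogger("example.retries")


def call_with_retries(operation, attempts: int = 3):
    """Run `operation`, logging WARN per failed retry and ERROR at the end.

    Intermediate failures are potential problems (WARN); only the final
    failure, after all retries are exhausted, is logged as ERROR and
    re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt < attempts:
                # More attempts remain, so this is not yet a real failure.
                logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            else:
                # Retries exhausted: the operation has failed.
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
```

With `attempts=3` and an operation that always raises, this sketch emits two warnings followed by one error, which keeps transient noise out of error-level alerting.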
FATAL (6)
Unrecoverable errors. The system cannot continue operating and is shutting down (or a critical subsystem has failed).
When to use:
- Unrecoverable database connection failures
- Out of memory conditions
- Corrupted state that prevents operation
- Critical dependency unavailable
- Uncaught exceptions that crash the process
Retention recommendation: 90+ days. Fatal events are rare and always worth investigating.
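The retention recommendations above can be collected into a single lookup, for example to check a proposed per-level retention setting. This is a sketch under stated assumptions: `RETENTION_DAYS` and `within_recommendation` are hypothetical names, and the upper bound for fatal is left open because the text gives only "90+ days".

```python
from typing import Dict, Optional, Tuple

# (minimum_days, maximum_days) per level, taken from the recommendations above.
# None means no upper bound is recommended.
RETENTION_DAYS: Dict[str, Tuple[int, Optional[int]]] = {
    "trace": (1, 7),
    "debug": (7, 14),
    "info": (30, 30),
    "warn": (30, 90),
    "error": (90, 90),
    "fatal": (90, None),
}


def within_recommendation(level: str, days: int) -> bool:
    """Return True if `days` falls inside the recommended range for `level`."""
    low, high = RETENTION_DAYS[level]
    return days >= low and (high is None or days <= high)
```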
Protobuf Definition
SDK Mapping
TypeScript
Python
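A minimal sketch of how the levels could be mirrored in Python, assuming nothing about the actual Aires Python SDK: the class name `Severity` and the helper `should_emit` are hypothetical, and only the member names and integer values come from the table at the top of this page.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical mirror of the proto enum; values match the table."""

    SEVERITY_UNSPECIFIED = 0
    TRACE = 1
    DEBUG = 2
    INFO = 3
    WARN = 4
    ERROR = 5
    FATAL = 6


def should_emit(level: Severity, minimum: Severity = Severity.INFO) -> bool:
    """Return True when `level` is at or above the configured minimum.

    INFO is the default here because this page names it as the
    default level for production.
    """
    return level >= minimum
```

Because the integer values increase with severity, a threshold check reduces to a plain comparison: under the default minimum, `should_emit(Severity.DEBUG)` is false and `should_emit(Severity.ERROR)` is true.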
Collector Transformation
The collector converts the proto enum integer to a string for ClickHouse storage.
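A minimal sketch of that conversion, written in Python for illustration only. The function name `severity_to_string` is hypothetical, and the fallback for unknown integers is an assumption rather than documented collector behavior; the integer-to-string pairs come from the table at the top of this page.

```python
# Proto enum integer -> ClickHouse string, taken from the severity table.
SEVERITY_STRINGS = {
    0: "unspecified",
    1: "trace",
    2: "debug",
    3: "info",
    4: "warn",
    5: "error",
    6: "fatal",
}


def severity_to_string(value: int) -> str:
    """Convert a proto enum integer to its ClickHouse string.

    Unknown integers fall back to "unspecified" (an assumption of this
    sketch, not confirmed collector behavior).
    """
    return SEVERITY_STRINGS.get(value, "unspecified")
```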
Strings are used in ClickHouse (rather than integers) because:
- They’re human-readable in query results
- LowCardinality(String) with only 7 values is as efficient as an enum
- They’re compatible with OpenTelemetry severity text