Unlocking Observability by Design With Inferred Schemas

Most modern systems emit large volumes of telemetry in the form of logs, metrics, traces, and events. Instrumentation is relatively easy to add, and observability platforms are designed to ingest flexible, semi-structured data. In many cases, teams rely on dynamic mapping or schema-on-read approaches so that new fields can be indexed automatically without coordination.

This flexibility lowers the barrier to entry. A developer can add a new attribute to a log or span and see it appear in their observability backend without updating a central schema definition. For fast-moving teams, this is convenient. However, as systems grow, the lack of structure begins to create friction.

Schema Drift in Practice

In distributed systems, multiple teams often instrument services independently. Over time, small inconsistencies accumulate. A field representing a user identifier might be named user_id in one service, uid in another, and customerId in a third. Status codes might be logged as integers in some contexts and strings in others. Optional attributes may be present only in certain deployments.

Dynamic mapping systems do not prevent this. They accept new fields and attempt to reconcile type differences where possible. When reconciliation fails, the result may be index conflicts or dropped fields. When reconciliation succeeds, it can mask the underlying inconsistency. The system keeps running, but its data becomes harder to query predictably.
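To make both outcomes concrete, here is a toy model of dynamic mapping in Python. The `ToyDynamicMapper` class and its "first type seen wins" rule are hypothetical simplifications chosen for illustration. They do not reproduce any specific backend, and real systems have richer type hierarchies and coercion rules.

```python
from typing import Any


def type_name(value: Any) -> str:
    """Collapse Python values into the coarse types a toy index might track."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


class ToyDynamicMapper:
    """Hypothetical dynamic mapper: the first type seen for a field becomes its mapping."""

    def __init__(self, coerce: bool) -> None:
        self.mapping: dict[str, str] = {}
        self.coerce = coerce
        self.conflicts: list[tuple[str, str, str]] = []

    def index(self, doc: dict[str, Any]) -> dict[str, Any]:
        stored: dict[str, Any] = {}
        for name, value in doc.items():
            observed = type_name(value)
            expected = self.mapping.setdefault(name, observed)
            if observed == expected:
                stored[name] = value
            elif self.coerce and expected == "string":
                # Reconciliation "succeeds": 200 is stored as "200", and the
                # disagreement between producers disappears from view.
                stored[name] = str(value)
            else:
                # Reconciliation fails: the value is dropped and a conflict recorded.
                self.conflicts.append((name, expected, observed))
        return stored
```

With `coerce=True`, indexing `{"status": "200"}` and then `{"status": 200}` stores both as strings and records no conflict, which is the masking case. With `coerce=False`, indexing `{"status": 200}` and then `{"status": "OK"}` drops the second value and records a conflict.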

Field Growth and Operational Impact

Another common outcome is field growth. As developers add attributes for debugging or experimentation, those attributes are often left in place. Over time, the number of distinct fields across logs and spans increases significantly.

This can affect storage efficiency, index performance, and query planning. More importantly, it makes the data model harder to reason about. When hundreds or thousands of distinct attributes exist, it becomes difficult to distinguish between core, stable fields and incidental ones.
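One crude way to start separating core fields from incidental ones is to measure how often each field appears. The Python sketch below assumes records have already been flattened into attribute dictionaries. The function name `classify_fields` and the 95% threshold are arbitrary, hypothetical choices rather than an established rule.

```python
from collections import Counter


def classify_fields(records: list[dict], core_threshold: float = 0.95) -> tuple[list[str], list[str]]:
    """Split field names into core and incidental lists by presence ratio.

    Presence is only a heuristic: a field that appears rarely but matters
    (for example, one set only on errors) will land in the incidental list.
    """
    if not records:
        return [], []
    counts = Counter(name for record in records for name in record)
    total = len(records)
    core = sorted(name for name, n in counts.items() if n / total >= core_threshold)
    incidental = sorted(name for name, n in counts.items() if n / total < core_threshold)
    return core, incidental
```

A field such as a temporary debugging flag that shows up in one record out of four falls into the incidental list, while identifiers present on every record are classified as core.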

From an operational perspective, this affects how teams build tooling on top of telemetry. Automated dashboards, alert templates, and derived metrics all assume some level of structural consistency. Without that consistency, each integration must account for variation.

The core issue is not that telemetry is dynamic. The issue is that the system has no explicit contract describing what telemetry is supposed to look like.

Observability as an Afterthought

In API development, schema definitions are standard practice. OpenAPI specifications define request and response structures. Changes are versioned and reviewed. Tooling can be generated from the contract, and validation occurs at runtime or in CI pipelines.

Telemetry is often handled differently. Instrumentation decisions are made locally, and structural changes are rarely treated as contract changes. There is typically no automated validation to detect when an attribute disappears or changes type.
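A check of this kind is not hard to sketch. The Python below compares two snapshots of telemetry structure, each assumed to be a plain mapping from attribute name to type name. This is a hypothetical format chosen for illustration. The check reports which attributes disappeared, which appeared, and which changed type.

```python
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two attribute-name -> type-name snapshots of telemetry structure."""
    return {
        # Attributes that disappeared: queries and alerts using them go quiet.
        "removed": sorted(old.keys() - new.keys()),
        # Attributes that appeared: usually harmless, but worth reviewing.
        "added": sorted(new.keys() - old.keys()),
        # Attributes whose type changed, as (name, old_type, new_type).
        "retyped": sorted(
            (name, old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        ),
    }


def is_breaking(diff: dict[str, list]) -> bool:
    """Treat removals and type changes as breaking; additions are not."""
    return bool(diff["removed"] or diff["retyped"])
```

In a CI pipeline, a step like this could fail the build when `is_breaking` returns true. Treating additions as non-breaking is a policy assumption that teams may want to tighten.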

As a result, telemetry evolves organically. This is manageable in small systems, but becomes increasingly difficult as the number of services and teams grows.

The OpenTelemetry community has started addressing this gap by introducing tools that support schema design and governance.

Observability by Design with OpenTelemetry Weaver

OpenTelemetry Weaver, released in 2025, provides a way to define telemetry schemas explicitly. A schema can describe the expected attributes of spans, metrics, logs, and events. From that definition, developers can generate code and documentation that standardize instrumentation, compare schema versions for compliance, and validate telemetry payloads against the defined structure.
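The code-generation idea can be illustrated without Weaver itself. The Python sketch below renders one constant per attribute from a schema. The function `render_constants` and its dictionary input format are hypothetical stand-ins. They do not follow Weaver's registry format, templates, or CLI. The attribute names come from the OpenTelemetry HTTP semantic conventions.

```python
import re


def render_constants(schema: dict[str, str]) -> str:
    """Render Python source with one constant per attribute in the schema.

    `schema` maps attribute names to type names; this dict format is a
    hypothetical stand-in, not Weaver's registry format.
    """
    lines = ['"""Generated from the telemetry schema. Do not edit by hand."""', ""]
    for name in sorted(schema):
        # "http.request.method" -> "HTTP_REQUEST_METHOD"
        const = re.sub(r"[^0-9A-Za-z]+", "_", name).upper()
        lines.append(f"# schema type: {schema[name]}")
        lines.append(f'{const} = "{name}"')
    return "\n".join(lines) + "\n"


source = render_constants({
    "http.request.method": "string",
    "http.response.status_code": "int",
})
```

Instrumentation that imports `HTTP_REQUEST_METHOD` instead of retyping the string turns a misspelled attribute name into an immediate error rather than a silently new field.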

This approach aligns telemetry with contract-driven development. Instead of allowing structure to emerge implicitly, teams can design it deliberately. For new systems, starting with a defined schema is straightforward. Instrumentation libraries can be aligned early, and compliance can be enforced from the beginning. For existing systems, the situation is more complex.

The Cost of Retrofitting Structure

Many organizations already operate large OpenTelemetry pipelines. Collectors aggregate traffic from many services. Dashboards and alerts depend on existing field names. Replacing or renaming attributes across dozens of services is not trivial.

A full migration to a schema-first approach may require:

Updating instrumentation libraries in multiple codebases
Refactoring field names and attribute types
Coordinating releases across teams
Ensuring compatibility with existing queries and alerts

Even if the long-term benefits are clear, the short-term effort can delay adoption. Teams may hesitate to introduce structural enforcement if it risks breaking operational workflows.

This creates a gap between the ideal state, which is explicit and validated telemetry contracts, and the current reality of heterogeneous data.

Inferring Structure from Live Telemetry

One way to reduce this gap is to start from existing telemetry rather than replacing it immediately.

OpenTelemetry collectors already sit in the path of the logs, metrics, and traces flowing through a system. By analyzing these payloads, it is possible to infer a schema from the observed attributes and their types. Such a process can identify:

Which fields are consistently present
Where type conflicts occur
How attributes differ across services
When new fields are introduced
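As a rough illustration, the Python sketch below answers those four questions. It assumes the records have already been flattened into (service name, attribute dictionary) pairs. How that data is extracted from a collector is not modeled here. The names `FieldStats`, `infer_schema`, and `summarize` are hypothetical. Python type names also stand in for OTLP's own attribute value types (string, bool, int, double, and so on), which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FieldStats:
    count: int = 0                                    # records containing the field
    types: set[str] = field(default_factory=set)      # observed Python type names
    services: set[str] = field(default_factory=set)   # services that emitted it


def infer_schema(records):
    """Build per-field statistics from (service_name, attributes) pairs."""
    stats: dict[str, FieldStats] = defaultdict(FieldStats)
    total = 0
    for service, attributes in records:
        total += 1
        for name, value in attributes.items():
            entry = stats[name]
            entry.count += 1
            entry.types.add(type(value).__name__)
            entry.services.add(service)
    return dict(stats), total


def summarize(stats, total, known_fields):
    """Report presence, type conflicts, single-service fields, and new fields."""
    return {
        "always_present": sorted(n for n, s in stats.items() if s.count == total),
        "type_conflicts": sorted(n for n, s in stats.items() if len(s.types) > 1),
        "single_service": sorted(n for n, s in stats.items() if len(s.services) == 1),
        "new_fields": sorted(stats.keys() - set(known_fields)),
    }
```

Here `known_fields` plays the role of a previous snapshot, so `new_fields` lists whatever has appeared since then. Run against records where one service logs `status` as both an integer and a string, `status` shows up under `type_conflicts`.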

An inferred schema registry can then serve as a representation of the current state of the system. Instead of designing a schema in isolation, teams can derive it from live traffic.

This inferred schema can be reviewed, refined, and gradually formalized. Teams can begin tracking deviations from the observed structure, flag unexpected changes, and introduce validation incrementally.
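One possible shape for that incremental path is sketched below in Python. It assumes an inference pass has produced a count and a set of type names for each field. Fields with a single observed type become draft schema entries, and fields seen in every record are marked required. Conflicting fields are set aside for human review. Validation then reports deviations as errors only for fields a team has explicitly chosen to enforce, and as warnings otherwise. The function names and the draft format are hypothetical.

```python
def draft_schema(observed: dict[str, tuple[int, set[str]]], total: int):
    """Turn inferred observations (count, type names) into a draft schema."""
    schema, needs_review = {}, []
    for name, (count, types) in sorted(observed.items()):
        if len(types) != 1:
            needs_review.append(name)  # conflicting types need a human decision
            continue
        (only_type,) = types
        schema[name] = {"type": only_type, "required": count == total}
    return schema, needs_review


def check_record(record: dict, schema: dict, enforced: set[str]):
    """Return (errors, warnings); only fields listed in `enforced` produce errors."""
    errors, warnings = [], []
    for name, spec in schema.items():
        bucket = errors if name in enforced else warnings
        if name not in record:
            if spec["required"]:
                bucket.append(f"missing required field {name}")
        elif type(record[name]).__name__ != spec["type"]:
            actual = type(record[name]).__name__
            bucket.append(f"{name}: expected {spec['type']}, got {actual}")
    # Fields outside the draft schema are flagged but never rejected here.
    for name in sorted(record.keys() - schema.keys()):
        warnings.append(f"unexpected field {name}")
    return errors, warnings
```

Starting with an empty `enforced` set keeps validation in report-only mode. Teams can then add fields one at a time as producers are brought into line.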

Rather than requiring an immediate re-instrumentation effort, this approach allows structure to emerge in a controlled way.

Toward Predictable Telemetry

The underlying problem is not the volume of telemetry or the use of dynamic systems. It is the absence of an explicit, shared understanding of structure.

When telemetry is treated as a contract, whether designed up front or inferred from live systems, it becomes possible to validate, version, and govern it. Without that contract, systems tend to accumulate inconsistency over time.

Tools like OpenTelemetry Weaver introduce schema design into the ecosystem. Inferring schemas from collectors offers a practical way to apply those ideas to existing environments.

The combination addresses a common challenge. Teams need to move from heterogeneous, organically grown telemetry to a predictable and governed model without requiring disruptive change all at once.