OpenTelemetry in 2024: A Practical Getting-Started Guide
The observability landscape has settled. Here is how to implement OpenTelemetry correctly, avoid the common traps, and decide when you need expert help.
What is OpenTelemetry and why it won the observability standards war
For years, the observability space was fragmented. You had Jaeger for tracing, Prometheus for metrics, and Fluentd for logs. Each tool and vendor had its own SDK, its own data format, and its own ecosystem lock-in. It was a nightmare for developers trying to move between tools.
Enter OpenTelemetry (OTel). Formed in 2019 by merging the OpenTracing and OpenCensus projects, and now a CNCF incubating project, OTel unified these three pillars under a single specification. It provides a vendor-neutral way to generate, collect, and export telemetry data; analysis is left to the backend you choose.
By 2024, the "standards war" is effectively over. The major observability platforms (Datadog, Honeycomb, New Relic, Splunk) all support OTel natively. If you aren't using OTel today, you are building technical debt that will cost you dearly when you eventually need to switch providers.
The Core Philosophy
OTel isn't just a library; it's a set of specifications. The goal is to standardize the data, not the tooling. This means you can instrument your application once using the OTel SDK, and then route that data to any backend you choose without rewriting your code.
How the pieces fit together
Understanding the OTel architecture is crucial before you start coding. It consists of three main components working in a pipeline:
- The SDK: The client-side library installed in your application code. It handles instrumentation (adding telemetry to your code) and the initial processing of data.
- The Collector: A standalone, vendor-neutral service that receives telemetry data from the SDK. It acts as a "brain," performing transformations, filtering, and enrichment before sending data to the backend.
- Exporters: The components that send processed data to its destination (e.g., a backend API, a file, or another system). Exporters exist in both the SDK (to reach the Collector) and the Collector (to reach the backend).
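To make the division of labor concrete, here is a toy model of that pipeline in plain Node.js. It does not use the OTel libraries at all: createSdk, createCollector, and the processor names are hypothetical stand-ins that only mirror the roles described above, so treat it as a sketch of the data flow rather than of the real API.

```javascript
// Toy model of the telemetry pipeline: SDK -> Collector (processors) -> exporter.
// Every name in this block is hypothetical; none of it is the OTel API.

// "SDK": application code records spans into a buffer.
function createSdk() {
  const buffer = [];
  return {
    recordSpan(name, attributes = {}) {
      buffer.push({ name, attributes: { ...attributes } });
    },
    // Hand over everything recorded so far and empty the buffer.
    flush() {
      return buffer.splice(0, buffer.length);
    },
  };
}

// "Collector": applies processors in order, then hands the batch to an exporter.
function createCollector(processors, exporter) {
  return {
    receive(spans) {
      const processed = processors.reduce((batch, processor) => processor(batch), spans);
      exporter(processed);
    },
  };
}

// Two example processors: filter out noisy spans, then enrich the rest.
const dropHealthChecks = (spans) => spans.filter((s) => s.name !== 'GET /healthz');
const addEnvironment = (spans) =>
  spans.map((s) => ({
    ...s,
    attributes: { ...s.attributes, 'deployment.environment': 'staging' },
  }));

// "Exporter": swapping this function changes the destination
// without touching the application code that records spans.
const exported = [];
const memoryExporter = (spans) => exported.push(...spans);

const sdk = createSdk();
const collector = createCollector([dropHealthChecks, addEnvironment], memoryExporter);

sdk.recordSpan('GET /orders', { 'http.status_code': 200 });
sdk.recordSpan('GET /healthz');
collector.receive(sdk.flush());

console.log(exported.map((s) => s.name)); // only the 'GET /orders' span is exported
```

The point of the sketch is the last comment: the application only ever talks to the SDK, so changing backends means changing the exporter, not the instrumentation.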
Instrumenting a Node.js service in under 30 minutes
Let's get our hands dirty. We'll instrument a simple Express API using the OTel Node.js SDK and send its traces to an OTel Collector running locally. The Collector does not visualize anything itself; it forwards the traces to whichever backend you configure.
1. Initialize the project
npm init -y
npm install express @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
2. Create instrumentation.js
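// instrumentation.js: load this before any application code.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { trace } = require('@opentelemetry/api');

// Send spans over OTLP/HTTP to the local Collector.
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces'
});

// NodeSDK wires the exporter and auto-instrumentations together.
// serviceName sets the service.name resource attribute.
const sdk = new NodeSDK({
  serviceName: 'my-express-service',
  traceExporter: exporter,
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start();

// Emit one manual span as a smoke test.
trace.getTracer('default').startSpan('hello-world').end();

// Flush pending spans before the process exits.
process.on('SIGTERM', () => sdk.shutdown().finally(() => process.exit(0)));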
3. Run it
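Start your OTel Collector locally, then preload the instrumentation when you launch your app, for example with node --require ./instrumentation.js app.js (app.js stands in for your Express entry point). Within seconds, spans should arrive at the Collector and be forwarded to whatever backend its exporters point at.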
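If you do not have a Collector configuration yet, the following is a minimal sketch to start from. It assumes a recent Collector release (the debug exporter replaced the older logging exporter) and a hypothetical file name of otel-collector-config.yaml. It listens for OTLP over HTTP on port 4318, matching the URL used in instrumentation.js, and prints received spans to the Collector's own console instead of sending them to a real backend.

```yaml
# otel-collector-config.yaml (hypothetical file name)
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # matches http://localhost:4318/v1/traces in the SDK

processors:
  batch:                         # group spans into batches before exporting

exporters:
  debug:
    verbosity: detailed          # print full span details to the Collector's stdout

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Pass the file to the Collector with its --config flag. Once spans show up in the console output, replace the debug exporter with the exporter for your actual backend.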
Common pitfalls and how to avoid them
Sampling is not optional
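Sending 100% of traces from a high-traffic service can inflate your ingestion costs and adds export overhead to your application. In production, configure a sampler that keeps a representative subset of traffic (e.g., 10-20%), and reserve 100% sampling for development or short debugging sessions. Sampling at 0% would leave you with no traces at all, so it is not a monitoring strategy.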
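The idea behind ratio-based sampling is that the decision is derived from the trace ID, so every service that sees the same trace reaches the same verdict. The sketch below illustrates this in plain Node.js. It is a simplified stand-in, not the SDK's exact algorithm, and shouldSample is a hypothetical name.

```javascript
// Simplified sketch of trace-ID ratio sampling (not the SDK's exact algorithm).
// traceId is a 32-character lowercase hex string; ratio is between 0 and 1.
function shouldSample(traceId, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Read the last 8 hex characters as an unsigned 32-bit integer...
  const bucket = parseInt(traceId.slice(-8), 16);
  // ...and keep the trace if it falls into the lowest `ratio` share of that range.
  return bucket < ratio * 0x100000000;
}

const traceId = '0af7651916cd43dd8448eb211c80319c';

console.log(shouldSample(traceId, 1));   // true: everything is kept
console.log(shouldSample(traceId, 0.5)); // true: this ID falls in the lower half
console.log(shouldSample(traceId, 0.1)); // false: this ID is outside the lowest 10%
console.log(shouldSample(traceId, 0));   // false: nothing is kept
```

With the real SDK you would not write this yourself. The same behavior is available as a ParentBasedSampler wrapping a TraceIdRatioBasedSampler from @opentelemetry/sdk-trace-base, or by setting the environment variables OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1.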
Baggage limits
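Baggage is a set of key-value pairs propagated alongside the trace context, typically in a request header. It is not stored on spans, and it travels with every downstream call. The W3C Baggage specification only requires platforms to propagate up to 8192 bytes in total, so large JSON blobs may be truncated or dropped along the way, and they add overhead to every request. Keep baggage small and structured.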
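One way to catch oversized baggage early is to measure the serialized header before you set it. The sketch below uses only Node.js built-ins; serializeBaggage and checkBaggage are hypothetical helpers, and the 8192-byte figure is taken from the W3C Baggage specification's propagation guarantee rather than from any OTel SDK setting.

```javascript
// Total size the W3C Baggage specification requires platforms to propagate.
const MAX_BAGGAGE_BYTES = 8192;

// Serialize entries as a baggage header value: key=value pairs joined by commas,
// with values percent-encoded. (Hypothetical helper, not the OTel API.)
function serializeBaggage(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join(',');
}

// Report the header, its size in bytes, and whether it stays within the limit.
function checkBaggage(entries) {
  const header = serializeBaggage(entries);
  const bytes = Buffer.byteLength(header, 'utf8');
  return { header, bytes, ok: bytes <= MAX_BAGGAGE_BYTES };
}

// Small, structured identifiers fit comfortably.
const small = checkBaggage({ 'tenant.id': 'acme', 'user.tier': 'gold' });

// A large JSON blob does not.
const large = checkBaggage({
  'cart.json': JSON.stringify({ items: new Array(2000).fill('sku-123') }),
});

console.log(small.header, small.ok); // tenant.id=acme,user.tier=gold true
console.log(large.ok);               // false
```

A check like this fits well in a unit test or a thin wrapper around wherever your code sets baggage entries.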
Context Propagation
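Failing to carry the active context across asynchronous and process boundaries (like callbacks, database queries, or message queue publishes) is one of the most common reasons traces appear "broken" or disconnected. Keep the SDK's context manager registered, create spans inside the active context, and inject the context into outgoing messages so the consumer can extract it.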
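The message queue case is worth seeing in miniature. The sketch below uses Node's built-in AsyncLocalStorage as a stand-in for the active context and the W3C traceparent header format to carry it across the boundary. The inject and extract functions here are hypothetical simplifications, the queue is an in-memory array, and the sampled flag is hard-coded for brevity.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical stand-in for the active trace context.
const activeContext = new AsyncLocalStorage();

// Producer side: copy the active context into the message headers
// using the W3C traceparent layout (version-traceId-spanId-flags).
function inject(headers) {
  const ctx = activeContext.getStore();
  if (ctx) {
    headers.traceparent = `00-${ctx.traceId}-${ctx.spanId}-01`; // 01 = sampled (assumed)
  }
  return headers;
}

// Consumer side: rebuild the parent context from the headers,
// or return null when the header is missing or malformed.
function extract(headers) {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    headers.traceparent || ''
  );
  if (!match) return null;
  return {
    traceId: match[1],
    spanId: match[2],
    sampled: (parseInt(match[3], 16) & 1) === 1,
  };
}

const queue = []; // in-memory stand-in for a message broker

activeContext.run(
  { traceId: '0af7651916cd43dd8448eb211c80319c', spanId: 'b7ad6b7169203331' },
  () => {
    // Context is injected, so the consumer can continue the same trace.
    queue.push({ body: 'order-created', headers: inject({}) });
    // Context is forgotten, so the consumer will start a disconnected trace.
    queue.push({ body: 'order-created', headers: {} });
  }
);

const linked = extract(queue[0].headers);
const orphan = extract(queue[1].headers);

console.log(linked.traceId); // 0af7651916cd43dd8448eb211c80319c
console.log(orphan);         // null
```

With the real API, the equivalent calls are propagation.inject(context.active(), carrier) on the producer and propagation.extract(context.active(), carrier) on the consumer, both from @opentelemetry/api.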
OTel vs Proprietary Agents
OpenTelemetry
Pros: Vendor-neutral, future-proof, open source, highly extensible, standard for the industry.
Cons: Requires more setup (Collector), steeper learning curve, less "out of the box" magic.
Proprietary Agents
Pros: One-click install, pre-configured for the vendor's backend, often includes auto-instrumentation for specific frameworks.
Cons: Vendor lock-in, expensive per-host pricing, less control over data flow.
When to call in a consultant vs go it alone
Implementing OTel is a great project for a small team to tackle. However, observability is a marathon, not a sprint. You need to define what "good" looks like before you start collecting data.
Call in a consultant like LogFlow when:
- You have a complex, multi-cloud architecture with 50+ services.
- You need to correlate logs, traces, and metrics across different vendors.
- You want to build a "Golden Signals" monitoring strategy but don't know where to start.
- You need to train your team on on-call best practices and incident response.
Go it alone if:
- You are a small startup (under 10 engineers) with a single cloud provider.
- You have a developer who loves tinkering with infrastructure.
- Your primary goal is simply to stop your app from crashing and you don't care about deep insights.
Sarah Jenkins
Sarah is a Senior Observability Engineer at LogFlow with over a decade of experience in distributed systems. She has helped Fortune 500 companies migrate from legacy monitoring stacks to modern OpenTelemetry pipelines. When she's not debugging traces, she's contributing to the OpenTelemetry community.