How to Instrument AI Agents with OpenTelemetry GenAI Locally

Deploy a local OpenTelemetry Collector and Jaeger instance using Docker Compose with pinned image versions.
Configure the Collector with an OTLP gRPC receiver, batch processor, and OTLP HTTP exporter pointed at Jaeger.
Install the OpenTelemetry Node.js SDK, OTLP gRPC exporter, and your LLM provider's client library.
Initialize a tracer provider in a separate tracing.js file exporting to localhost:4317.
Annotate each AI call span with GenAI semantic convention attributes: gen_ai.system, gen_ai.request.model, gen_ai.operation.name, and token usage fields.
Structure multi-step agent flows using startActiveSpan to create parent-child span hierarchies with proper error recording.
Verify traces in the Jaeger UI at localhost:16686, confirming GenAI attributes appear in span tags.
Build a React dashboard that fetches trace data from Jaeger's API to visualize model usage, token costs, and latency.

Why Observability for AI Agents Is Different
What Are the OpenTelemetry GenAI Semantic Conventions?
Architecture Overview: Fully Local Observability Stack
Setting Up the Local OpenTelemetry Collector and Jaeger
Instrumenting a Node.js AI Agent with OpenTelemetry GenAI Attributes
Adding a React Dashboard for Real-Time Agent Metrics
Implementation Checklist: Your Complete Reference
Common Pitfalls and Troubleshooting
What Comes Next

Why Observability for AI Agents Is Different

Observability for AI agents presents challenges that traditional APM tools were never designed to handle. OpenTelemetry GenAI semantic conventions offer a vendor-neutral framework to address them, and this tutorial shows how to run the entire telemetry pipeline locally, without sending trace data to a cloud observability backend. Unlike deterministic API calls or database queries, AI agent outputs vary across identical inputs. Multi-step reasoning chains create branching execution paths that are difficult to trace with conventional request-response models. Token consumption directly impacts cost, making per-call tracking a financial concern rather than a nice-to-have. And prompts themselves often contain PII, proprietary business logic, or sensitive customer data that compliance frameworks bar from leaving controlled infrastructure.

Traditional APM tools can measure latency and error rates, but they lack the semantic understanding to capture model identifiers, token usage breakdowns, or operation types like chat versus embeddings. This gap leaves teams blind to the metrics that actually matter for GenAI workloads.

This tutorial delivers a fully local, cloud-free instrumentation pipeline. It covers setting up an OpenTelemetry Collector and Jaeger in Docker, instrumenting a Node.js AI agent with GenAI semantic convention attributes, and building a React dashboard to visualize traces. Everything runs on the developer's machine.

What Are the OpenTelemetry GenAI Semantic Conventions?

The Standard Attributes You Need to Know

The OpenTelemetry GenAI semantic conventions define a standardized set of span attributes for AI and large language model operations. The core attributes include:

gen_ai.system: Identifies the AI provider, such as openai, anthropic, or a custom local model identifier.
gen_ai.request.model: Specifies the model being called, for example gpt-4o or claude-sonnet-4.
Token usage attributes gen_ai.usage.input_tokens and gen_ai.usage.output_tokens capture token counts from the provider's response, enabling cost tracking and usage analysis.
The operation type goes in gen_ai.operation.name, distinguishing chat from embeddings or other call types.

These conventions are currently at experimental maturity status, with a target of reaching stability in 2026 (see the OTel semantic conventions roadmap for current status). Attribute names may still evolve, though the working group has stabilized the overall structure across multiple spec revisions.
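
Because attribute names may still change, it can help to keep them in one place. The sketch below assembles the four core attributes into a plain object. It is illustrative only: genAiAttributes and its parameter names are hypothetical and not part of any OpenTelemetry package, and the usage fields assume the OpenAI-style prompt_tokens / completion_tokens shape used later in this tutorial.

```javascript
// genai-attributes.js (hypothetical helper, not part of any OpenTelemetry package)

// Build a flat attribute map for one GenAI call.
// `usage` is optional and assumed to follow the OpenAI response shape
// ({ prompt_tokens, completion_tokens }).
function genAiAttributes({ system, model, operation, usage }) {
  const attrs = {
    "gen_ai.system": system,
    "gen_ai.request.model": model,
    "gen_ai.operation.name": operation,
  };
  if (usage) {
    if (typeof usage.prompt_tokens === "number") {
      attrs["gen_ai.usage.input_tokens"] = usage.prompt_tokens;
    }
    if (typeof usage.completion_tokens === "number") {
      attrs["gen_ai.usage.output_tokens"] = usage.completion_tokens;
    }
  }
  return attrs;
}

// Example: the same helper covers two providers; only the values differ.
const openaiAttrs = genAiAttributes({
  system: "openai",
  model: "gpt-4o",
  operation: "chat",
  usage: { prompt_tokens: 12, completion_tokens: 30 },
});
const anthropicAttrs = genAiAttributes({
  system: "anthropic",
  model: "claude-sonnet-4",
  operation: "chat",
});
console.log(openaiAttrs, anthropicAttrs);
```

The returned object can be handed to span.setAttributes() from @opentelemetry/api, so a renamed attribute only has to be updated in this one function.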

Why Semantic Conventions Matter for AI Agents

Vendor-neutral tracing across OpenAI, Anthropic, and local models means teams can switch providers without rebuilding their observability infrastructure. This tutorial demonstrates instrumentation against OpenAI specifically, but the attribute schema applies identically to other providers; only the value of gen_ai.system changes. Consistent attribute naming enables portable dashboards and alerts that work regardless of which LLM backs a given agent. The largest operational gain: correlating AI spans with traditional HTTP and database spans within the same distributed trace gives a complete picture of a request that touches both conventional services and AI models.

Architecture Overview: Fully Local Observability Stack

The architecture follows a three-component linear pipeline. A Node.js AI Agent sends telemetry through the OTel SDK to an OpenTelemetry Collector running locally, which then exports traces to a local Jaeger instance. The OTel SDK instruments application code and creates spans; the Collector receives, processes, and routes that telemetry to Jaeger, which stores and visualizes traces through its web UI.

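The critical point: no telemetry leaves the developer's machine or private network. The configuration includes no cloud observability endpoints, no API keys for external observability platforms, and no third-party data processors in the telemetry chain. The only outbound traffic is the LLM API call itself, which also stays local if the agent targets a locally hosted model.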

Prerequisites:

Node.js 18+ (18.x or 20.x LTS recommended)
npm 9+
Docker Engine ≥ 24.x and Docker Compose v2 (the docker compose CLI plugin)
An OpenAI API key (or any LLM provider key) with billing enabled and available quota
Ports 4317, 4318, 13133, and 16686 free on localhost (check with lsof -i :4317)

Setting Up the Local OpenTelemetry Collector and Jaeger

Docker Compose Configuration

Create a docker-compose.yml and an accompanying otel-collector-config.yml in the project root:

# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.103.0
    command: ["--config", "/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "13133:13133" # Health check
    depends_on:
      jaeger:
        condition: service_healthy

  jaeger:
    image: jaegertracing/all-in-one:1.57.0
    ports:
      - "16686:16686" # Jaeger UI
      - "4318:4318"   # OTLP HTTP (used by the Collector exporter)
    # OTLP is enabled by default in Jaeger all-in-one v1.35+
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:16686"]
      interval: 5s
      timeout: 3s
      retries: 5

Note: Pin image tags to specific versions (as shown above) rather than using latest. Upstream breaking changes, such as the removal of the legacy jaeger exporter type from collector-contrib, can silently break your pipeline. Check Docker Hub for the latest stable release tags when starting a new project.

# otel-collector-config.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s
    # send_batch_max_size must be >= send_batch_size, or the Collector rejects the config
    send_batch_size: 512
    send_batch_max_size: 1024

exporters:
  otlphttp/jaeger:
    endpoint: "http://jaeger:4318"
    tls:
      insecure: true

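extensions:
  health_check:
    endpoint: 0.0.0.0:13133 # serves the status endpoint published on port 13133

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/jaeger]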

Verifying the Local Stack

Run docker compose up -d to start both services. Confirm Jaeger is accessible by opening http://localhost:16686 in a browser. Verify the Collector's health by requesting http://localhost:13133; this endpoint is served by the Collector's health_check extension, which must be enabled in the Collector config. A response containing "status":"Server available" confirms the Collector process is running. To verify the gRPC receiver specifically, check the Collector logs for a line reporting that the OTLP receiver's gRPC server started on 0.0.0.0:4317:

docker compose logs otel-collector | grep -i "grpc"
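
The two HTTP checks above can also be scripted. The following is a minimal sketch, assuming Node 18+ (for the global fetch) and the ports published in docker-compose.yml; checkStack is a hypothetical helper, and its fetchImpl parameter exists only so the function can be exercised without the containers running.

```javascript
// check-stack.js (hypothetical helper script; requires Node 18+ for global fetch)

// Probe each endpoint and report whether it answered with a 2xx status.
// `fetchImpl` is injectable so the function can be tested without Docker.
async function checkStack(fetchImpl = fetch) {
  const targets = {
    collectorHealth: "http://localhost:13133",
    jaegerUi: "http://localhost:16686",
  };
  const results = {};
  for (const [name, url] of Object.entries(targets)) {
    try {
      const res = await fetchImpl(url);
      results[name] = res.ok;
    } catch {
      // Connection refused, DNS failure, etc. all count as "not up".
      results[name] = false;
    }
  }
  return results;
}
```

Calling checkStack().then(console.table) after docker compose up -d prints one row per endpoint; a false value points at the service to inspect with docker compose logs.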

Instrumenting a Node.js AI Agent with OpenTelemetry GenAI Attributes

Installing Dependencies

First, initialize a project and create a package.json with pinned dependency versions:

npm init -y

Then install the required packages:

npm install --save-exact @opentelemetry/sdk-node@0.52.1 @opentelemetry/api@1.9.0 \
  @opentelemetry/exporter-trace-otlp-grpc@0.52.1 @opentelemetry/sdk-trace-node@1.25.1 \
  @opentelemetry/resources@1.25.1 openai@4.56.0

Pin your @opentelemetry package versions in package.json. By default npm records a caret range (for example ^0.52.1) even when a version is given on the command line; the --save-exact flag records the exact version instead. The GenAI semantic conventions are experimental, and attribute names may change between releases. Pinning ensures reproducible builds. Check the OpenTelemetry JS releases for compatible version sets.

This is the minimal dependency set for GenAI instrumentation with OTLP gRPC export. No cloud-specific exporters or vendor SDKs are required.

Configuring the OTel SDK for Local Export

// tracing.js
const { NodeTracerProvider, BatchSpanProcessor } = require("@opentelemetry/sdk-trace-node");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc");
const { Resource } = require("@opentelemetry/resources");

const provider = new NodeTracerProvider({
  resource: new Resource({
    "service.name": process.env.OTEL_SERVICE_NAME || "ai-agent",
  }),
});

const exporter = new OTLPTraceExporter({
  url: "http://localhost:4317",
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

// Export shutdown so agent.js can await it before process exit
module.exports = { shutdownTracing: () => provider.shutdown() };

console.log("Tracing initialized — exporting to localhost:4317");

Note the absence of any cloud endpoint or external API key. The exporter points exclusively to the local Collector. The tracing.js file is loaded before the agent code via Node's --require flag (shown in the "Running and Viewing Traces" section below), which ensures the tracer provider is registered before any application spans are created. The module exports a shutdownTracing function that flushes the BatchSpanProcessor buffer before the process exits — without this call, short-lived scripts will silently drop all spans.
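
A long-running agent, unlike the one-shot script in this tutorial, usually exits on a signal, so the same flush has to happen there too. The sketch below shows one way to wire that up, assuming the shutdownTracing export above. installShutdownHooks is a hypothetical name, and the proc parameter is there only so the wiring can be tested with a stand-in for process.

```javascript
// shutdown-hooks.js (hypothetical helper)

// Call `shutdown` once when SIGINT or SIGTERM arrives, then exit.
// `proc` defaults to the real process object; pass a fake in tests.
function installShutdownHooks(shutdown, proc = process) {
  let shuttingDown = false;

  const handler = async (signal) => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    try {
      await shutdown(); // flushes buffered spans
    } catch (err) {
      console.error(`Tracer shutdown failed on ${signal}:`, err.message);
    } finally {
      proc.exit(0);
    }
  };

  proc.on("SIGINT", handler);
  proc.on("SIGTERM", handler);
  return handler;
}
```

In an application this would be called once at startup, for example installShutdownHooks(shutdownTracing).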

Creating Traced AI Agent Calls

// agent.js
const { trace, SpanStatusCode } = require("@opentelemetry/api");
const { OpenAI } = require("openai");
const { shutdownTracing } = require("./tracing");

if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY environment variable is required");
}

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const tracer = trace.getTracer("ai-agent-tracer");

async function runAgent(userQuery) {
  return tracer.startActiveSpan("agent.run", async (parentSpan) => {
    try {
      // WARNING: Capturing user input in spans may include PII. Sanitize or omit in production.
      parentSpan.setAttribute("user.query", userQuery);

      // Step 1: Initial chat completion
      const chatResponse = await tracer.startActiveSpan(
        "gen_ai.chat.initial",
        async (chatSpan) => {
          chatSpan.setAttribute("gen_ai.system", "openai");
          chatSpan.setAttribute("gen_ai.request.model", "gpt-4o");
          chatSpan.setAttribute("gen_ai.operation.name", "chat");

          try {
            const response = await openai.chat.completions.create({
              model: "gpt-4o",
              messages: [
                { role: "system", content: "You are a helpful assistant." },
                { role: "user", content: userQuery },
              ],
            });

            // Null-guard: usage may be absent in certain response shapes
            if (response.usage) {
              chatSpan.setAttribute(
                "gen_ai.usage.input_tokens",
                response.usage.prompt_tokens
              );
              chatSpan.setAttribute(
                "gen_ai.usage.output_tokens",
                response.usage.completion_tokens
              );
            }
            chatSpan.setStatus({ code: SpanStatusCode.OK });
            chatSpan.end();
            return response.choices[0].message.content;
          } catch (err) {
            chatSpan.setStatus({
              code: SpanStatusCode.ERROR,
              message: err.message,
            });
            chatSpan.recordException(err);
            chatSpan.end();
            throw err;
          }
        }
      );

      // Step 2: Follow-up summarization based on initial response
      const followUp = await tracer.startActiveSpan(
        "gen_ai.chat.followup",
        async (followUpSpan) => {
          followUpSpan.setAttribute("gen_ai.system", "openai");
          followUpSpan.setAttribute("gen_ai.request.model", "gpt-4o");
          followUpSpan.setAttribute("gen_ai.operation.name", "chat");

          try {
            const response = await openai.chat.completions.create({
              model: "gpt-4o",
              messages: [
                {
                  role: "system",
                  content: "Summarize the following in one sentence.",
                },
                { role: "user", content: chatResponse },
              ],
            });

            if (response.usage) {
              followUpSpan.setAttribute(
                "gen_ai.usage.input_tokens",
                response.usage.prompt_tokens
              );
              followUpSpan.setAttribute(
                "gen_ai.usage.output_tokens",
                response.usage.completion_tokens
              );
            }
            followUpSpan.setStatus({ code: SpanStatusCode.OK });
            followUpSpan.end();
            return response.choices[0].message.content;
          } catch (err) {
            followUpSpan.setStatus({
              code: SpanStatusCode.ERROR,
              message: err.message,
            });
            followUpSpan.recordException(err);
            followUpSpan.end();
            throw err;
          }
        }
      );

      parentSpan.setStatus({ code: SpanStatusCode.OK });
      console.log("Agent result:", followUp);
      return followUp;
    } catch (err) {
      parentSpan.setStatus({
        code: SpanStatusCode.ERROR,
        message: err.message,
      });
      parentSpan.recordException(err);
      throw err;
    } finally {
      parentSpan.end();
    }
  });
}

// Top-level call with .catch() and guaranteed shutdown for flush
runAgent("What are the key benefits of edge computing?")
  .catch((err) => {
    console.error("Agent failed:", err.message);
    process.exitCode = 1;
  })
  .finally(() =>
    // Return the promise so the chain settles only after the flush completes
    shutdownTracing().catch((err) =>
      console.error("Tracer shutdown error:", err.message)
    )
  );

The startActiveSpan calls create parent-child relationships automatically. The top-level agent.run span contains two child spans — gen_ai.chat.initial and gen_ai.chat.followup — each annotated with GenAI semantic convention attributes. Token counts are extracted directly from the OpenAI response's usage object with a null-guard, since some response shapes (such as streaming or certain error conditions) may omit the usage field. Error handling uses both span.setStatus() and span.recordException() to ensure failures are fully captured in traces. The top-level call includes .catch() to prevent unhandled promise rejections from crashing Node.js, and .finally() calls shutdownTracing() to flush the span buffer before exit.
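
The two child spans in agent.js repeat the same attribute, status, and end() handling. One possible way to factor that out is sketched below. withGenAiSpan is a hypothetical helper rather than an OpenTelemetry API, and the STATUS constant only mirrors the numeric values of SpanStatusCode so that the sketch has no dependencies; application code should import the real enum.

```javascript
// traced-call.js (hypothetical helper; `tracer` is any object with startActiveSpan)

// Numeric values mirror SpanStatusCode in @opentelemetry/api (OK = 1, ERROR = 2).
// In application code, import the enum instead of redefining it.
const STATUS = { OK: 1, ERROR: 2 };

// Run `fn(span)` inside an active span, apply the attributes up front,
// record success or failure, and always end the span exactly once.
function withGenAiSpan(tracer, name, attributes, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    for (const [key, value] of Object.entries(attributes)) {
      span.setAttribute(key, value);
    }
    try {
      const result = await fn(span);
      span.setStatus({ code: STATUS.OK });
      return result;
    } catch (err) {
      span.setStatus({ code: STATUS.ERROR, message: err.message });
      span.recordException(err);
      throw err; // let the parent span see the failure too
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, each step in runAgent reduces to a single call that passes the span name, the three request attributes, and a callback that performs the OpenAI request and sets the token usage attributes on the span it receives.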

Running and Viewing Traces in Jaeger

Store your OpenAI API key in a .env file rather than passing it inline on the command line. Inline environment variables are recorded in shell history and visible in process listings.

# .env (add this file to .gitignore)
OPENAI_API_KEY=sk-your-key

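Execute the agent with the tracing module preloaded. A plain source .env would set OPENAI_API_KEY only as a shell variable that child processes cannot see; set -a exports every variable the file defines so that the node process can read it:

set -a && source .env && set +a && node --require ./tracing.js agent.js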

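The --require ./tracing.js flag ensures the tracer provider is registered before any code in agent.js runs. In this tutorial agent.js also requires ./tracing directly (to obtain shutdownTracing), so the provider is registered either way. The preload matters most when application code does not import the tracing module itself: without a registered provider, spans are silently discarded.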

Open Jaeger at http://localhost:16686, select the ai-agent service from the dropdown, and click "Find Traces." The resulting trace should show nested spans: agent.run as the parent, with gen_ai.chat.initial and gen_ai.chat.followup as children. Clicking into any span reveals the GenAI attributes, including model name, token counts, and operation type, displayed in Jaeger's tag panel.
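
The attribute check can also be scripted against a span fetched from Jaeger's HTTP API. The sketch below inspects an already-parsed span object. It assumes the tags array of { key, value } pairs that the /api/traces endpoint returns (the same shape the React component later in this tutorial reads), and missingGenAiTags is a hypothetical name.

```javascript
// verify-tags.js (hypothetical helper)

const REQUIRED_GENAI_TAGS = [
  "gen_ai.system",
  "gen_ai.request.model",
  "gen_ai.operation.name",
  "gen_ai.usage.input_tokens",
  "gen_ai.usage.output_tokens",
];

// Return the required GenAI tag keys that are absent from a Jaeger span.
function missingGenAiTags(span) {
  const present = new Set((span.tags || []).map((tag) => tag.key));
  return REQUIRED_GENAI_TAGS.filter((key) => !present.has(key));
}

// Example span with the request attributes but no token usage.
const exampleSpan = {
  operationName: "gen_ai.chat.initial",
  tags: [
    { key: "gen_ai.system", value: "openai" },
    { key: "gen_ai.request.model", value: "gpt-4o" },
    { key: "gen_ai.operation.name", value: "chat" },
  ],
};
console.log(missingGenAiTags(exampleSpan));
// → [ 'gen_ai.usage.input_tokens', 'gen_ai.usage.output_tokens' ]
```

An empty array means the span carries every attribute listed in the checklist; missing usage keys usually point to a response without a usage object.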

Adding a React Dashboard for Real-Time Agent Metrics

Fetching Trace Data from Jaeger's API

CORS Note: Jaeger's API does not enable CORS by default. Browser fetch() calls from a different origin will be blocked. To work around this, either run the React app using a dev server proxy (e.g., Vite's server.proxy option to proxy /api requests to http://localhost:16686), or start Jaeger with the flag --query.additional-headers 'Access-Control-Allow-Origin: *' added to the Jaeger service command in docker-compose.yml.

To scaffold a React project for the dashboard:

npm create vite@latest dashboard -- --template react
cd dashboard
npm install

Place the following component in src/AgentTraceViewer.jsx:

// AgentTraceViewer.jsx
import { useState, useEffect } from "react";

export default function AgentTraceViewer() {
  const [traces, setTraces] = useState([]);
  const [error, setError] = useState(null);

  useEffect(() => {
    const controller = new AbortController();

    fetch("/api/traces?service=ai-agent&limit=20", { signal: controller.signal })
      .then((res) => {
        if (!res.ok) throw new Error(`Jaeger API error: ${res.status} ${res.statusText}`);
        return res.json();
      })
      .then((data) => {
        const rows = (data.data || [])
          .flatMap((traceData) => {
            const spans = traceData.spans || [];
            const aiSpans = spans.filter((s) =>
              s.tags.some((tag) => tag.key === "gen_ai.system")
            );
            return aiSpans.map((s) => ({
              traceID: traceData.traceID,
              timestamp: new Date(s.startTime / 1000).toISOString(),
              model: s.tags.find((tag) => tag.key === "gen_ai.request.model")?.value,
              inputTokens: s.tags.find(
                (tag) => tag.key === "gen_ai.usage.input_tokens"
              )?.value,
              outputTokens: s.tags.find(
                (tag) => tag.key === "gen_ai.usage.output_tokens"
              )?.value,
              duration: (s.duration / 1000).toFixed(1) + " ms",
              status: s.tags.find((tag) => tag.key === "otel.status_code")?.value || "OK",
            }));
          });
        setTraces(rows);
      })
      .catch((err) => {
        if (err.name !== "AbortError") {
          setError(err.message);
        }
      });

    return () => controller.abort();
  }, []);

  if (error) {
    return <p style={{ color: "red" }}>Failed to load traces: {error}</p>;
  }

  return (
    <table style={{ borderCollapse: "collapse" }}>
      <thead>
        <tr>
          <th>Timestamp</th><th>Model</th><th>Input Tokens</th>
          <th>Output Tokens</th><th>Latency</th><th>Status</th>
        </tr>
      </thead>
      <tbody>
        {traces.map((row) => (
          <tr key={`${row.traceID}-${row.timestamp}`}>
            <td>{row.timestamp}</td><td>{row.model}</td><td>{row.inputTokens}</td>
            <td>{row.outputTokens}</td><td>{row.duration}</td><td>{row.status}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
}

If using Vite, configure the dev proxy in vite.config.js to forward /api requests to Jaeger:

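// vite.config.js
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  // Keep the React plugin from the scaffolded template; only the proxy is new
  plugins: [react()],
  server: {
    proxy: {
      "/api": {
        target: "http://localhost:16686",
        changeOrigin: true,
        timeout: 5000,
        configure: (proxy) => {
          proxy.on("error", (err) => {
            console.error("[vite proxy] Jaeger unreachable:", err.message);
          });
        },
      },
    },
  },
});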

Extending the Dashboard

This component is a starting point, not a production dashboard. To estimate cost per call, multiply input_tokens and output_tokens by the per-token rates published at platform.openai.com/pricing (e.g., input_tokens * rate_per_input_token). You can also chart latency over time using the duration values, or add filters by model or operation name. Libraries like Recharts can turn the flat data into time-series visualizations with minimal additional code.
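
The cost calculation described above might look like the following sketch. It works on the row objects the component builds (model, inputTokens, outputTokens). The function names are hypothetical, and the rates in EXAMPLE_RATES are placeholders rather than real prices; substitute the provider's published per-token rates.

```javascript
// cost.js (hypothetical helper; the rates below are placeholders, not real prices)

// Rates are expressed in USD per one million tokens, keyed by model name.
const EXAMPLE_RATES = {
  "example-model": { inputPerMillion: 2.0, outputPerMillion: 8.0 },
};

// Estimate the cost of one call, or return null when the model has no rate.
function estimateCostUsd(row, rates) {
  const rate = rates[row.model];
  if (!rate) return null; // unknown model: no estimate rather than a wrong one
  const input = Number(row.inputTokens) || 0;
  const output = Number(row.outputTokens) || 0;
  return (input * rate.inputPerMillion + output * rate.outputPerMillion) / 1_000_000;
}

// Sum estimated cost per model across many rows, skipping unknown models.
function totalCostByModel(rows, rates) {
  const totals = {};
  for (const row of rows) {
    const cost = estimateCostUsd(row, rates);
    if (cost === null) continue;
    totals[row.model] = (totals[row.model] || 0) + cost;
  }
  return totals;
}
```

Returning null for an unknown model keeps a missing rate visible in the dashboard instead of showing it as a zero-cost call.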

Implementation Checklist: Your Complete Reference

Infrastructure

☐ Docker Compose with OTel Collector + Jaeger running locally (pinned image versions)
☐ OTel Collector config: OTLP receiver → batch processor → OTLP HTTP exporter to Jaeger
☐ No external endpoints configured: all data stays local

Instrumentation

☐ Node.js OTel SDK initialized with OTLP gRPC exporter to localhost:4317
☐ gen_ai.system attribute set on all AI spans
☐ gen_ai.request.model attribute set on all AI spans
☐ gen_ai.operation.name attribute set (chat, embeddings, etc. — this tutorial demonstrates chat)
☐ gen_ai.usage.input_tokens captured from API response
☐ gen_ai.usage.output_tokens captured from API response
☐ Parent-child span hierarchy for multi-step agent flows
☐ Error recording with span.setStatus() and span.recordException()

Verification

☐ Jaeger UI accessible and showing traces with GenAI attributes
☐ (Optional) React dashboard consuming Jaeger API for custom views

Common Pitfalls and Troubleshooting

Tracing file load order matters. If tracing.js is not required before the agent code runs, the tracer provider will not be registered and spans will silently fail to capture. Always use --require ./tracing.js.

OTLP export fails, or no spans arrive. Check for a protocol mismatch between SDK and Collector. The exporter-trace-otlp-grpc package speaks gRPC. If the Collector's receiver only enables http/protobuf (or an HTTP exporter is pointed at a gRPC-only receiver), the export fails, often without an obvious error. Match both sides. The gRPC exporter expects an http:// URL or a bare host:port; an unsupported scheme such as grpc:// can cause the SDK to drop spans without reporting an error.

OpenAI is not a constructor. The openai npm package v4+ uses named exports under CommonJS. Use const { OpenAI } = require("openai"); (not const OpenAI = require("openai")).

When token counts are missing, check whether you are using streaming mode. Some LLM providers omit the usage object from streamed responses. If response.usage is undefined, the instrumentation cannot set token attributes. Always null-guard response.usage before accessing token fields, and check provider documentation for stream-specific options that include usage data.
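
For OpenAI specifically, the Chat Completions API accepts stream_options: { include_usage: true } on streamed requests, in which case the final chunk carries a usage object and an empty choices array. The sketch below shows one way to consume such a stream. collectStream and setUsageAttributes are hypothetical helpers, the input can be any async iterable of chunks, and whether the pinned client version accepts the option should be confirmed against its documentation.

```javascript
// stream-usage.js (hypothetical helpers)

// Accumulate streamed text and pick up the usage object if any chunk carries one.
async function collectStream(chunks) {
  let text = "";
  let usage = null;
  for await (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) text += delta;
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}

// Copy token counts onto a span only when the provider reported them.
function setUsageAttributes(span, usage) {
  if (!usage) return;
  span.setAttribute("gen_ai.usage.input_tokens", usage.prompt_tokens);
  span.setAttribute("gen_ai.usage.output_tokens", usage.completion_tokens);
}
```

In application code, the chunks argument would be the stream returned by openai.chat.completions.create() when called with stream: true and the stream_options setting above.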

The BatchSpanProcessor buffers spans and flushes them on a timer (default 5 seconds). In short-lived scripts, the runtime drops all buffered spans if the process exits before the flush fires. Always call provider.shutdown() (or the exported shutdownTracing() function) before process exit to force a flush.

Attribute name instability. The GenAI semantic conventions are experimental. Pin @opentelemetry package versions in package.json to avoid breakage when attribute names change between releases.

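If the Collector exits immediately after starting, check docker compose logs otel-collector for errors. A common cause is a config that references a removed exporter type, such as the legacy jaeger exporter, which recent collector-contrib releases no longer ship; use the otlphttp exporter (named otlphttp/jaeger in this tutorial's config) instead. Another is an invalid processor setting: the batch processor rejects a send_batch_max_size that is smaller than send_batch_size.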

No service appears in the Jaeger dropdown. Verify that spans are reaching Jaeger by checking the Collector logs and confirming the URL in tracing.js uses http://localhost:4317 (not grpc://).

Never pass API keys inline on the command line. Inline OPENAI_API_KEY=sk-... node ... commands are recorded in shell history (~/.bash_history) and visible in process listings. Use a .env file (added to .gitignore) and export its contents into the environment instead, for example with set -a && source .env && set +a.

What Comes Next

This tutorial delivers full local observability for AI agents using OpenTelemetry GenAI semantic conventions, with no data leaving controlled infrastructure. As the conventions move toward their stability target, attribute names will stabilize and library authors will ship auto-instrumentation for popular LLM SDKs. Teams can prepare now by adopting the manual instrumentation patterns shown here, then migrate to auto-instrumentation as the ecosystem catches up. Beyond traces, the natural next step is metrics. Start with aggregating token costs per model — the most immediate operational win — then layer on latency distributions and usage anomaly alerts as your instrumentation matures.