How to Debug AI Agent Traces Locally with HALO
- Install the HALO CLI globally via npm after verifying the package publisher.
- Initialize a HALO project with halo init and configure the halo.config.cjs file.
- Start the local HALO server with halo serve to activate the trace collector and dashboard.
- Instrument your Node.js AI agent using @halo/trace-sdk with startTrace, span, and endTrace calls.
- Generate a corpus of 10–20 traces by running varied queries through the instrumented agent.
- Analyze failure clusters and RLM insights in the React-based dashboard at localhost:4280.
- Export critical trace snapshots as portable JSON files for team review without cloud infrastructure.
- Iterate on agent prompts or tool logic based on systemic patterns, then re-run traces to validate fixes.
Table of Contents
- Why AI Agent Debugging Is Broken
- What Is HALO and How Does It Work?
- Prerequisites and Environment Setup
- Installing and Configuring HALO Locally
- Instrumenting a Node.js AI Agent for Trace Collection
- Analyzing Traces in the HALO Dashboard
- HALO vs. Cloud-Based Alternatives
- Implementation Checklist
- What Comes Next
Why AI Agent Debugging Is Broken
Important: HALO is a conceptual reference implementation described for educational purposes. Before installing, verify that the packages @halo/cli and @halo/trace-sdk are published on the npm registry by running npm view @halo/cli and npm view @halo/trace-sdk. Do not install unverified packages: a package claiming these names from an unknown publisher may pose a supply-chain risk. Verify the publisher and checksum at npmjs.com before proceeding.
As multi-step AI agents built on tool-calling, ReAct, and plan-and-execute patterns proliferate, they produce nested execution traces spanning LLM calls, tool invocations, context assembly, and response synthesis. Trying to debug these traces with console.log or standard APM tools is like diagnosing a distributed system failure with unstructured print statements and no aggregation layer. The traces are too deep, too interconnected, and too dependent on probabilistic LLM outputs for traditional approaches to surface useful insights.
Cloud-based observability platforms such as LangSmith, Langfuse Cloud, and Arize Phoenix (hosted) address some of these challenges. They provide structured trace visualization and analysis capabilities. But they carry per-seat SaaS fees (free tiers exist with limited trace quotas; check each vendor's pricing page for current limits), may require sending sensitive prompt data off-premises, and can introduce vendor lock-in that complicates architectural decisions.
HALO offers a different path: a locally run trace analysis engine (source and license at github.com/your-org/halo, once the repository is published) that lets developers debug AI agent traces without a cloud subscription. HALO's trace analysis approach, which the project calls Recursive Language Model (RLM) analysis (a HALO-specific term rather than a standardized industry concept), is designed to detect systemic agent failure patterns that single-pass LLM analysis and simple log viewers miss. This tutorial walks through installing HALO, instrumenting a Node.js AI agent with its trace SDK, generating a corpus of 10–20 varied traces, and analyzing results through its React-based dashboard.
What Is HALO and How Does It Work?
RLM-Based Trace Analysis Explained
HALO's trace analysis approach, which the project calls Recursive Language Model (RLM) analysis (a HALO-specific term describing its internal architecture), differs from single-pass LLM analysis. Where a standard LLM processes an entire trace as a flat input and produces a single analytical output, the RLM engine decomposes the problem recursively. HALO takes an agent trace, breaks it into sub-spans representing discrete execution steps, evaluates each layer independently for failure signals, and then correlates patterns across those layers and across multiple trace runs.
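The recursive decomposition described above can be sketched in a few lines. This is an illustration only, not HALO's engine: the nested span shape ({ name, status, children }), the function name collectFailureSignals, and the maxDepth parameter (standing in for rlm.analysisDepth) are all assumptions made for the example.

```javascript
// Hypothetical span shape: { name, status, children: [...] }.
// This is NOT HALO's real schema; it only illustrates layered evaluation.
function collectFailureSignals(span, maxDepth, depth = 1, path = []) {
  const here = [...path, span.name];
  const signals = [];
  // Evaluate this layer on its own before looking at its sub-spans.
  if (span.status && span.status !== 'success') {
    signals.push({ depth, path: here.join(' > '), status: span.status });
  }
  // Recurse into sub-spans until the configured depth is reached.
  if (depth < maxDepth) {
    for (const child of span.children ?? []) {
      signals.push(...collectFailureSignals(child, maxDepth, depth + 1, here));
    }
  }
  return signals;
}

const exampleTrace = {
  name: 'research-agent',
  status: 'success',
  children: [
    { name: 'llm-call-initial', status: 'success', children: [] },
    {
      name: 'tool-loop',
      status: 'success',
      children: [
        { name: 'tool-call-web-search', status: 'empty', children: [] },
        { name: 'tool-call-web-search', status: 'empty', children: [] },
      ],
    },
  ],
};

// Depth 3 reaches the tool calls; depth 2 stops one layer above them.
console.log(collectFailureSignals(exampleTrace, 3).length); // 2
console.log(collectFailureSignals(exampleTrace, 2).length); // 0
```

The depth cutoff shows why a shallow setting can miss failures that only appear in deeply nested spans.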
This recursive decomposition lets HALO detect systemic agent failures. Consider recurring hallucination loops where an agent repeatedly generates fabricated tool parameters, silent tool-call failures where a tool returns empty results without triggering error handling, or degrading context windows where accumulated conversation history causes progressive quality loss. A standard LLM analyzing a single trace in isolation might flag an individual error, but it cannot identify that the same failure pattern recurs across, say, half of all runs and shares a common root cause. HALO's RLM engine performs exactly this kind of cross-trace correlation.
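As a rough sketch of what cross-trace correlation means in practice, the snippet below counts how many traces share each failure signature and keeps those above a threshold. The input shape, the signature strings, and the minShare parameter are hypothetical; HALO's actual correlation logic is not documented here.

```javascript
// Hypothetical input: one entry per trace, listing failure signatures found in it.
function findSystemicPatterns(traces, minShare = 0.3) {
  const counts = new Map();
  for (const trace of traces) {
    // Count each signature once per trace so a single noisy run cannot dominate.
    for (const signature of new Set(trace.signatures)) {
      counts.set(signature, (counts.get(signature) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([signature, traceCount]) => ({
      signature,
      traceCount,
      share: traceCount / traces.length,
    }))
    .filter((pattern) => pattern.share >= minShare)
    .sort((a, b) => b.traceCount - a.traceCount);
}

const corpus = [
  { id: 't1', signatures: ['empty-tool-result', 'empty-tool-result'] },
  { id: 't2', signatures: [] },
  { id: 't3', signatures: ['empty-tool-result'] },
  { id: 't4', signatures: ['malformed-tool-args'] },
];

// Only the signature present in at least half of the traces survives.
console.log(findSystemicPatterns(corpus, 0.5));
// [ { signature: 'empty-tool-result', traceCount: 2, share: 0.5 } ]
```

A per-trace viewer would report three separate errors here; grouping by signature is what turns them into one recurring pattern and one outlier.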
HALO's Core Components
HALO consists of four tightly integrated components. The Trace Collector runs on localhost as an OpenTelemetry-compatible ingestion layer, accepting spans and traces from instrumented applications using standard protocols. Running entirely on the developer's machine, the RLM Analysis Engine processes collected traces locally; verify with lsof -i -n -P | grep halo that no external connections are established before processing sensitive data. A React-based Dashboard lets developers explore traces, failure clusters, and root-cause suggestions the RLM engine generates. Finally, the Export and Replay system lets developers snapshot traces into portable formats so teammates can share traces without cloud infrastructure. (The replay feature is not covered in this tutorial; see the HALO documentation for details.)
Prerequisites and Environment Setup
Before installing HALO, ensure the development environment meets these requirements:
- Node.js v18.12.0 LTS or higher with npm v9+ or pnpm
- A working AI agent project, or willingness to use the sample agent provided below
- Basic familiarity with OpenTelemetry concepts (spans, traces)
- Docker (optional, for containerized HALO deployment)
- OPENAI_API_KEY environment variable set and valid (see below)
- 8GB or more of RAM recommended for responsive RLM analysis, since the engine runs locally. The halo-rlm-base model runs CPU-only on x86 and ARM architectures. Check the HALO repository for current model file size and disk space requirements.
- Internet access for npm install steps and OpenAI API calls
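Set your OpenAI API key before running any agent code. Prefer a .env file (excluded from version control) loaded with a library like dotenv. Never hardcode keys in source files or pass them as inline command arguments, which can expose them in shell history. If you export the variable in your shell session instead, keep in mind that an export typed at an interactive prompt may also be saved to shell history: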
export OPENAI_API_KEY=sk-your-key-here
# Verify Node.js version (v18.12.0+ required)
node --version
# Verify publisher FIRST before installing
npm view @halo/cli
npm view @halo/trace-sdk
# Install HALO CLI globally (verify publisher at npmjs.com/package/@halo/cli before installing)
# --ignore-scripts reduces supply-chain attack surface from lifecycle scripts
npm install -g --ignore-scripts @halo/cli@1.0.0
# Verify HALO installation
halo --version
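Because a missing key otherwise surfaces only as an authentication error on the first API call, it can help to fail fast at startup. The helper below is a minimal sketch; the name requireEnv is hypothetical and not part of the OpenAI SDK or HALO.

```javascript
// Hypothetical helper: throws early if a required environment variable is absent.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage: call once at startup, before constructing the OpenAI client.
// const apiKey = requireEnv('OPENAI_API_KEY');
```

The error message names the variable but never prints its value, so the check is safe to leave in logs.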
Installing and Configuring HALO Locally
Install via npm
With the CLI installed globally, initialize a new HALO project in the working directory. The halo init command creates the project directory and scaffolds the required configuration and storage structure.
# Initialize HALO project (creates the my-agent-debugger directory)
halo init --project my-agent-debugger
cd my-agent-debugger
Next, create a package.json for your project and install local dependencies:
# Create package.json with ESM support
npm init -y
# Set module type for ESM import/export syntax
npm pkg set type="module"
# Install project dependencies (pin exact versions for reproducibility)
npm install --ignore-scripts openai@4.28.0 @halo/trace-sdk@1.0.0
Your package.json should include:
{
  "type": "module",
  "dependencies": {
    "openai": "4.28.0",
    "@halo/trace-sdk": "1.0.0"
  }
}
The halo init command also generates a halo.config.cjs file with key configuration options. Note the .cjs extension: because the project uses "type": "module" for ESM agent code, the HALO config file uses the CommonJS .cjs extension to avoid module system conflicts:
// halo.config.cjs
const path = require('path');

module.exports = {
  server: {
    port: 4280,
    traceCollectorPort: 4281,
  },
  storage: {
    path: path.resolve(__dirname, './halo-traces'),
    maxRetentionDays: 30,
  },
  rlm: {
    model: 'halo-rlm-base',
    analysisDepth: 3,
    crossTraceCorrelation: true,
    minTracesForSystemicAnalysis: 10,
  },
  dashboard: {
    enabled: true,
    openOnStart: false,
  },
};
The rlm.analysisDepth controls how many recursive decomposition layers the engine applies. Each additional level above 3 roughly doubles analysis time on consumer hardware with 8GB RAM. The minTracesForSystemicAnalysis threshold defines the minimum trace corpus (10 traces) needed before cross-trace correlation activates.
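To get a feel for these two settings, the sketch below turns the rule of thumb into arithmetic. It assumes the doubling holds exactly and treats any depth of 3 or less as the 1x baseline, which the text above does not specify; both function names are hypothetical.

```javascript
// Relative analysis time versus the default depth of 3, assuming the
// "roughly doubles per extra level" rule of thumb holds exactly.
function relativeAnalysisCost(analysisDepth, baselineDepth = 3) {
  const extraLevels = Math.max(0, analysisDepth - baselineDepth);
  return 2 ** extraLevels;
}

// Whether a corpus is large enough for cross-trace correlation to activate.
function systemicAnalysisReady(traceCount, rlmConfig) {
  return (
    rlmConfig.crossTraceCorrelation &&
    traceCount >= rlmConfig.minTracesForSystemicAnalysis
  );
}

const rlm = {
  analysisDepth: 5,
  crossTraceCorrelation: true,
  minTracesForSystemicAnalysis: 10,
};

console.log(relativeAnalysisCost(rlm.analysisDepth)); // 4
console.log(systemicAnalysisReady(9, rlm)); // false
console.log(systemicAnalysisReady(12, rlm)); // true
```

Under these assumptions, moving from depth 3 to depth 5 would cost about four times the analysis time, which is worth weighing before raising the setting on an 8GB machine.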
Start the HALO Local Server
# Start the HALO local server
halo serve
# Expected output:
# ✔ Trace Collector running on http://localhost:4281
# ✔ RLM Analysis Engine loaded (model: halo-rlm-base)
# ✔ Dashboard available at http://localhost:4280
# ✔ Storage initialized at ./halo-traces
#
# HALO is ready. Waiting for traces...
Both the trace collector and dashboard run on localhost. As noted above, verify with lsof -i -n -P | grep halo that no external connections are established before processing sensitive data.
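Reading lsof output by eye is error-prone, so the check can be partly automated. The sketch below scans text in the usual lsof -i -n -P layout and lists established connections whose remote end is not a loopback address. The sample lines are made up for illustration, and the loopback list is deliberately small (it does not cover the rest of 127.0.0.0/8).

```javascript
// Flags connected sockets whose remote end is not a loopback address.
const LOOPBACK_HOSTS = new Set(['127.0.0.1', '[::1]', 'localhost']);

function findExternalConnections(lsofOutput) {
  const external = [];
  for (const line of lsofOutput.split('\n')) {
    // Connected sockets appear as local->remote in the NAME column.
    const match = line.match(/(\S+)->(\S+)/);
    if (!match) continue; // listening sockets have no "->"
    const remote = match[2];
    const host = remote.slice(0, remote.lastIndexOf(':'));
    if (!LOOPBACK_HOSTS.has(host)) external.push(remote);
  }
  return external;
}

// Illustrative sample only; real output depends on your system.
const sample = [
  'COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME',
  'halo    4242 dev    21u  IPv4 0x1      0t0    TCP 127.0.0.1:4280 (LISTEN)',
  'halo    4242 dev    22u  IPv4 0x2      0t0    TCP 127.0.0.1:4281->127.0.0.1:52100 (ESTABLISHED)',
  'halo    4242 dev    23u  IPv4 0x3      0t0    TCP 192.168.1.20:52101->203.0.113.7:443 (ESTABLISHED)',
].join('\n');

console.log(findExternalConnections(sample)); // [ '203.0.113.7:443' ]
```

An empty result suggests that no outbound connections were open at the moment of the snapshot; it does not prove that none are ever made, so repeat the check while analysis is running.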
Instrumenting a Node.js AI Agent for Trace Collection
Building a Sample AI Agent
The following sample agent demonstrates a typical multi-step flow: accepting a user query, calling an LLM via the OpenAI SDK, optionally invoking a tool (web search), and returning a synthesized answer. This creates the kind of nested traces HALO is designed to analyze. Note the deliberate intermittent failure in the search tool, which returns empty results roughly half the time to generate interesting trace data.
// agent.js
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const tools = [
  {
    type: 'function',
    function: {
      name: 'web_search',
      description: 'Search the web for current information',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
  },
];

// Simulated tool with intermittent empty results
export function webSearch(query) {
  if (Math.random() > 0.5) {
    return { results: [] }; // Deliberate failure point
  }
  return { results: [{ title: `Result for: ${query}`, snippet: 'Sample data.' }] };
}

export async function runAgent(userQuery) {
  const messages = [
    { role: 'system', content: 'You are a research assistant. Use tools when needed.' },
    { role: 'user', content: userQuery },
  ];

  let response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    tools,
    tool_choice: 'auto',
  });
  let message = response.choices[0].message;
  let retries = 0;

  // Keep answering tool calls until the model stops requesting them
  // or the retry limit is reached.
  while (message.tool_calls && retries < 3) {
    messages.push(message);
    for (const toolCall of message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = webSearch(args.query);
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
    response = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
      tools,
      tool_choice: 'auto',
    });
    message = response.choices[0].message;
    retries++;
  }

  return message.content;
}
Adding HALO's Trace SDK
The @halo/trace-sdk package provides three key functions for instrumentation. halo.startTrace() initiates a new trace context tied to a single agent execution. halo.span() wraps individual execution steps (LLM calls, tool invocations) with metadata collection. halo.endTrace() finalizes the trace and sends it to the local collector.
The halo.annotate() function attaches contextual data within the currently active span. It is context-bound: calls to halo.annotate() are only valid inside a halo.span() callback. Calling it outside a span context will have no effect.
// agent-traced.js
import { halo } from '@halo/trace-sdk';
import OpenAI from 'openai';
import { tools, webSearch } from './agent.js';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function runAgentTraced(userQuery) {
  const trace = halo.startTrace({ name: 'research-agent', input: userQuery });
  const messages = [
    { role: 'system', content: 'You are a research assistant. Use tools when needed.' },
    { role: 'user', content: userQuery },
  ];
  const MAX_MESSAGES = 40; // prevent unbounded context growth
  let finalOutput = null;

  try {
    let response = await halo.span(trace, {
      name: 'llm-call-initial',
      type: 'llm',
      metadata: { model: 'gpt-4o-mini' },
    }, async () => {
      const res = await openai.chat.completions.create({
        model: 'gpt-4o-mini', messages, tools, tool_choice: 'auto',
      });
      halo.annotate({ tokenCount: res.usage?.total_tokens });
      return res;
    });
    let message = response.choices[0].message;
    let retries = 0;
    const MAX_LOOP_ITERATIONS = 3; // total LLM calls = 1 initial + up to MAX_LOOP_ITERATIONS retries

    while (message.tool_calls && retries < MAX_LOOP_ITERATIONS) {
      messages.push(message);
      for (const toolCall of message.tool_calls) {
        let args;
        try {
          args = JSON.parse(toolCall.function.arguments);
        } catch (parseErr) {
          // halo.annotate() has no effect outside a span callback, so record
          // the parse failure in a dedicated span before rethrowing.
          await halo.span(trace, {
            name: 'tool-args-parse-error',
            type: 'tool',
            metadata: { rawArguments: toolCall.function.arguments },
          }, async () => {
            halo.annotate({ parseError: parseErr.message });
          });
          throw new Error(`Failed to parse tool arguments: ${parseErr.message}`);
        }

        const result = await halo.span(trace, {
          name: 'tool-call-web-search',
          type: 'tool',
          metadata: { query: args.query, attempt: retries + 1 },
        }, async () => {
          const res = webSearch(args.query);
          halo.annotate({
            resultCount: res.results.length,
            status: res.results.length > 0 ? 'success' : 'empty',
          });
          return res;
        });

        messages.push({
          role: 'tool',
          tool_call_id: toolCall.id,
          content: JSON.stringify(result),
        });
      }

      // Prevent unbounded message growth before each LLM call
      const trimmedMessages = messages.slice(-MAX_MESSAGES);
      response = await halo.span(trace, {
        name: 'llm-call-retry', // fixed name to avoid high-cardinality span keys
        type: 'llm',
        metadata: { model: 'gpt-4o-mini', retryAttempt: retries + 1 },
      }, async () => openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: trimmedMessages,
        tools,
        tool_choice: 'auto',
      }));
      message = response.choices[0].message;
      retries++;
    }

    if (message.content == null) {
      console.warn('[runAgentTraced] Final message content is null: model may have stopped on tool_calls');
    }
    finalOutput = message.content ?? '';
  } catch (err) {
    // Close the trace on failure so no open trace is left behind
    halo.endTrace(trace, { output: null, error: err.message });
    throw err;
  }

  halo.endTrace(trace, { output: finalOutput });
  return finalOutput;
}
Each halo.span() block captures timing, metadata, and annotations. The halo.annotate() calls within spans attach contextual data such as token counts and tool response status, which the RLM engine uses during failure pattern analysis. The entire function body is wrapped in try/catch to ensure halo.endTrace() is always called, even when errors occur, preventing dangling open traces from corrupting the trace corpus.
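One detail worth a closer look is the messages.slice(-MAX_MESSAGES) guard. With at most three loop iterations the cap of 40 is unlikely to be reached, but if it were, a plain slice could drop the system prompt or begin the window on a tool message whose originating assistant tool_calls turn was cut off, which the Chat Completions API rejects. The sketch below shows one possible alternative; trimMessages is a hypothetical helper, not part of the tutorial's agent or any SDK.

```javascript
// Keeps the system prompt plus the newest messages, and never starts the
// kept window on a 'tool' message that has lost its assistant tool_calls turn.
function trimMessages(messages, maxMessages) {
  if (messages.length <= maxMessages) return messages;
  const [first, ...rest] = messages;
  const hasSystem = first.role === 'system';
  const pool = hasSystem ? rest : messages;
  const budget = hasSystem ? maxMessages - 1 : maxMessages;
  let start = Math.max(0, pool.length - budget);
  // Skip orphaned tool results at the start of the window.
  while (start < pool.length && pool[start].role === 'tool') start++;
  const tail = pool.slice(start);
  return hasSystem ? [first, ...tail] : tail;
}

const history = [
  { role: 'system', content: 'You are a research assistant.' },
  { role: 'user', content: 'weather in Tokyo' },
  { role: 'assistant', content: null, tool_calls: [{ id: 'a' }] },
  { role: 'tool', tool_call_id: 'a', content: '{"results":[]}' },
  { role: 'assistant', content: null, tool_calls: [{ id: 'b' }] },
  { role: 'tool', tool_call_id: 'b', content: '{"results":[]}' },
  { role: 'assistant', content: 'No results found.' },
];

console.log(trimMessages(history, 5).map((m) => m.role));
// [ 'system', 'assistant', 'tool', 'assistant' ]
```

In this example the window would have started on the first tool result, so the helper drops it and returns four messages rather than five. Note that this trade-off can also drop the original user question; an agent that needs it could pin that message the same way the system prompt is pinned.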
Generating Sample Traces
HALO's systemic analysis requires multiple trace runs to identify cross-trace patterns. Single-trace inspection reveals individual errors; the RLM engine's strength lies in correlating failure signals across a corpus. Generate at least 10 traces, and ideally up to 20, with varied inputs.
Cost note: The batch script below makes 12 queries, each potentially involving multiple LLM calls (up to ~4 per query due to retries). This can result in up to ~48 OpenAI API calls per batch run. Verify your OpenAI account has sufficient credits before proceeding.
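The upper bound in the cost note follows directly from the loop limit in the instrumented agent. As a quick sanity check (the function name is made up for this example):

```javascript
// Worst-case number of LLM calls for a batch run.
function maxLlmCalls(queryCount, maxLoopIterations) {
  const callsPerQuery = 1 + maxLoopIterations; // one initial call plus retries
  return queryCount * callsPerQuery;
}

console.log(maxLlmCalls(12, 3)); // 48
```

Most runs will land below this bound, since any query answered without a tool call uses a single LLM call.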
Create a runner script to handle ESM imports cleanly:
// run-query.mjs
import { runAgentTraced } from './agent-traced.js';

const query = process.argv[2];
if (!query) {
  console.error('Usage: node run-query.mjs "<query>"');
  process.exit(1);
}

runAgentTraced(query)
  .then(() => console.log('Done:', query))
  .catch((err) => {
    console.error('Agent run failed for query:', query);
    console.error(err);
    process.exit(1);
  });
Then use the batch script to generate traces:
#!/usr/bin/env bash
# batch-run.sh — Generate trace corpus with varied queries
set -euo pipefail

QUERIES=(
  "latest AI research"
  "weather in Tokyo"
  "Node.js best frameworks"
  "quantum computing news"
  "React vs Vue comparison"
  "stock market trends"
  "machine learning tutorials"
  "open source LLMs"
  "WebAssembly use cases"
  "serverless architecture patterns"
  "Rust vs Go performance"
  "API design principles"
)

FAILED=0
for query in "${QUERIES[@]}"; do
  # Keep going after a failed query so one error does not abort the batch
  if ! node run-query.mjs "$query"; then
    echo "[ERROR] Query failed: $query" >&2
    FAILED=$((FAILED + 1))
  fi
done

if [ "$FAILED" -gt 0 ]; then
  echo "[WARN] $FAILED query/queries failed. Trace corpus may be incomplete." >&2
  exit 1
fi
Make the script executable, then run it:
chmod +x batch-run.sh
./batch-run.sh
Analyzing Traces in the HALO Dashboard
Navigating the React-Based UI
Open localhost:4280 in a browser to access the dashboard. The interface presents four primary views. The Trace List displays all collected traces with timestamps, duration, and status indicators. Select any trace to open the Trace Waterfall, which renders a hierarchical visualization of spans within that trace, revealing latency distribution and nesting depth. Failure Clusters groups traces sharing common failure signatures as identified by the RLM engine. The RLM Insights view presents the engine's systemic analysis, including root-cause hypotheses and suggested fixes.
Interpreting RLM Analysis Results
With a corpus of 10 or more traces from the sample agent, the RLM engine surfaces a pattern: given the 50% random failure rate in the sample code, roughly half the traces will exhibit a "tool retry loop" where the agent calls the search tool up to three times with near-identical queries before exhausting the retry limit. The exact fraction varies by run. The RLM engine surfaces this as a systemic pattern rather than isolated incidents, generating a root-cause hypothesis such as "Agent prompt does not instruct query reformulation on empty tool responses."
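The "tool retry loop" signature can be approximated with a simple per-trace check. The sketch below assumes a flattened span shape ({ name, query, status }) that mirrors the metadata and annotations recorded by the instrumented agent, and it treats "near-identical" as equality after basic normalization. Both choices are simplifications made for illustration, not HALO's detection logic.

```javascript
// Lowercase and strip punctuation so trivially different queries compare equal.
function normalizeQuery(query) {
  return query.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// True when the same (normalized) query returns empty results
// at least minRepeats times in a row.
function hasToolRetryLoop(spans, minRepeats = 2) {
  let previous = null;
  let run = 0;
  for (const s of spans) {
    if (s.name !== 'tool-call-web-search') continue; // ignore LLM spans
    const key = normalizeQuery(s.query);
    if (s.status === 'empty' && key === previous) {
      run += 1;
    } else {
      run = s.status === 'empty' ? 1 : 0;
    }
    previous = s.status === 'empty' ? key : null;
    if (run >= minRepeats) return true;
  }
  return false;
}

const loopingTrace = [
  { name: 'llm-call-initial' },
  { name: 'tool-call-web-search', query: 'Weather in Tokyo', status: 'empty' },
  { name: 'llm-call-retry' },
  { name: 'tool-call-web-search', query: 'weather in Tokyo?', status: 'empty' },
];
const reformulatedTrace = [
  { name: 'tool-call-web-search', query: 'weather in Tokyo', status: 'empty' },
  { name: 'tool-call-web-search', query: 'Tokyo forecast today', status: 'empty' },
];

console.log(hasToolRetryLoop(loopingTrace)); // true
console.log(hasToolRetryLoop(reformulatedTrace)); // false
```

The second trace also fails twice, but because the agent reformulated its query it does not count as a retry loop. That difference is what points toward the prompt as the root cause.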
A standard LLM-based analyzer processing individual traces would flag each empty tool response as a standalone error. It would not identify the cross-trace pattern showing that the agent never reformulates its query between retries. This distinction between per-trace error reporting and systemic failure detection is where HALO's RLM architecture provides its primary value.
Exporting and Sharing Trace Snapshots
Trace snapshots let teammates share and review traces without cloud infrastructure.
# Export a specific trace as a portable JSON snapshot
# Obtain the trace ID from `halo list-traces` or the Trace List view in the dashboard
halo export --trace-id <trace-id> --output ./snapshots/retry-loop-example.json
# On a teammate's machine, import the snapshot
halo import --file ./snapshots/retry-loop-example.json
# View imported trace in their local dashboard
halo serve
HALO vs. Cloud-Based Alternatives
| Feature | HALO (Local) | LangSmith | Langfuse Cloud | Arize Phoenix (Hosted) |
|---|---|---|---|---|
| Runs locally | ✅ | ❌ | ❌ | ❌ |
| No subscription required | ✅ | ❌* | ❌* | ❌* |
| Data stays on-premises | ✅ | ❌ | ❌ | ❌ |
| HALO RLM engine | ✅ | ❌ | ❌ | ❌ |
| OpenTelemetry compatible | ✅ | ✅ | ✅ | ✅ |
| Team collaboration | Export/Import | Cloud-native | Cloud-native | Cloud-native |
| Production-scale monitoring | ❌ (dev-focused) | ✅ | ✅ | ✅ |
* Free tiers available for LangSmith, Langfuse Cloud, and Arize Phoenix; feature limits apply. Verify current pricing and limits at each vendor's website.
HALO targets local development and debugging, not production monitoring at scale. If you need multi-user dashboards or need to process thousands of traces per day, use a cloud platform. For teams iterating on agent architectures, HALO works well as complementary tooling: use it for local iteration and systemic pattern detection during development, then pair it with a cloud platform for production observability when scale demands it.
Implementation Checklist
- ☐ Node.js v18.12.0+ installed
- ☐ OPENAI_API_KEY environment variable set
- ☐ HALO CLI installed globally (npm i -g --ignore-scripts @halo/cli@1.0.0), publisher verified on npmjs.com
- ☐ package.json created with "type": "module" and dependencies installed
- ☐ halo.config.cjs configured with storage path and port
- ☐ HALO local server running (halo serve)
- ☐ AI agent instrumented with @halo/trace-sdk
- ☐ Minimum 10 trace runs generated for systemic analysis
- ☐ Dashboard reviewed for failure clusters and RLM insights
- ☐ Critical trace snapshots exported for team review
- ☐ Failure patterns addressed in agent prompt or tool logic
- ☐ Re-run traces to validate fixes
What Comes Next
Three extensions are worth exploring from here. First, integrate HALO into CI pipelines for automated trace regression testing: run new agent code against a fixed set of queries and let HALO flag novel failure patterns before they reach production. Second, explore HALO's plugin API for custom analysis rules tailored to your specific agent architecture. Third, pair HALO with a cloud platform for production monitoring to get full lifecycle observability. The HALO GitHub repository (github.com/your-org/halo) is not yet publicly available; check the URL for current status. Once published, the repository and documentation are expected to cover each of these paths in detail. Independently verify HALO's network behavior before processing sensitive data.


