How do I stream responses via WebSocket?
ws://localhost:3001 Daemion’s gateway streams agent responses in real time over a persistent WebSocket connection. Every POST /chat returns a 202 immediately — the actual response arrives as a sequence of events on the socket. Connect once at startup and keep the connection alive for the life of your session.
How do I connect?
The WebSocket endpoint is /stream. Pass your bearer token as a token query parameter on the upgrade request.
Minimal connection check
wscat -c “ws://localhost:3001/stream?token=$DAEMION_TOKEN”
To scope events to a single thread, add threadId:
wscat -c “ws://localhost:3001/stream?token=$DAEMION_TOKEN&threadId=thr_01abc123”
Node.js connection example (npm install ws):
// npm install ws const WebSocket = require(‘ws’);
const token = process.env.DAEMION_TOKEN;
const ws = new WebSocket(ws://localhost:3001/stream?token={token});
ws.on(‘open’, () => { console.log(‘connected’); });
ws.on(‘message’, (data) => { const event = JSON.parse(data.toString()); console.log(event.type, event); });
ws.on(‘close’, (code, reason) => { console.log(‘disconnected’, code, reason.toString()); });
ws.on(‘error’, (err) => { console.error(‘ws error’, err); });
Connect before calling POST /chat. If you connect after, you will miss the streaming events for that turn. The replay buffer partially mitigates this — see the FAQ below.
What events will I receive?
There are 12 event types. All events are JSON objects with a type field.
| Event | Scope | Description |
|---|---|---|
connected | Global | First event after handshake. Confirms the connection and echoes your threadId. |
message | Thread | A complete turn object was saved to the thread (user or assistant). |
start | Thread | Agent began generating a response. |
text-delta | Thread | One chunk of streamed text from the agent. |
tool-start | Thread | The agent invoked a tool. |
tool-end | Thread | A tool call completed. |
finish | Thread | Response complete. Includes cost and token usage. |
error | Thread | The agent hit an error while generating. |
stopped | Thread | Response was aborted via POST /chat/stop. |
warning | Thread | Non-fatal warning from the engine. |
extension-changed | Global | An extension was created, updated, deleted, or toggled. |
thread-updated | Global | A thread’s title changed. |
Thread-scoped events (message, start, text-delta, tool-start, tool-end, finish, error, stopped, warning) are only delivered to clients subscribed to that thread or to global subscribers. Global events (extension-changed, thread-updated) are delivered to all connected clients.
What does each event look like?
connected
Sent immediately after a successful upgrade. Confirms your thread subscription.
{ “type”: “connected”, “threadId”: “thr_01abc123” }
threadId is null if you connected without a threadId parameter (global subscription).
message
A complete turn has been persisted to the database.
{ “type”: “message”, “message”: { “id”: “trn_07xyz456”, “thread_id”: “thr_01abc123”, “role”: “assistant”, “content”: “Here is the summary you asked for…”, “created_at”: “2026-03-31T12:00:00.000Z” } }
start
The agent started generating. Use this to show a typing indicator.
{ “type”: “start”, “messageId”: “trn_07xyz456”, “model”: “claude-opus-4-5” }
text-delta
One streaming text chunk. Concatenate these in order to build the full response.
{ “type”: “text-delta”, “messageId”: “trn_07xyz456”, “delta”: “Here is the” }
tool-start
The agent invoked a tool. input is the raw JSON string the agent passed to the tool.
{ “type”: “tool-start”, “messageId”: “trn_07xyz456”, “tool”: “bash”, “input”: ”{“command”: “ls -la”}” }
tool-end
The tool completed. output is the raw result string.
{ “type”: “tool-end”, “messageId”: “trn_07xyz456”, “tool”: “bash”, “output”: “total 48\ndrwxr-xr-x 12 user staff 384 Mar 31 12:00 .” }
finish
Response complete. Always follows the last text-delta.
{ “type”: “finish”, “messageId”: “trn_07xyz456”, “costUsd”: 0.0023, “durationMs”: 4210, “inputTokens”: 1842, “outputTokens”: 312, “cacheReadTokens”: 1200, “cacheWriteTokens”: 600 }
inputTokens, outputTokens, cacheReadTokens, and cacheWriteTokens are optional — present when the model returns them.
error
The agent hit an error while generating.
{ “type”: “error”, “messageId”: “trn_07xyz456”, “error”: “model overloaded — please retry” }
stopped
The response was aborted by a POST /chat/stop call.
{ “type”: “stopped”, “messageId”: “trn_07xyz456” }
warning
A non-fatal warning from the engine.
{ “type”: “warning”, “text”: “context window approaching limit — oldest turns may be dropped” }
extension-changed
An extension was created, updated, deleted, or toggled. Delivered to all connected clients.
{ “type”: “extension-changed”, “action”: “updated”, “extension”: { “id”: “ext_abc”, “type”: “agent”, “name”: “opus”, “enabled”: true } }
action is one of "created", "updated", "deleted", or "toggled".
thread-updated
A thread’s title was changed. Delivered to all connected clients.
{ “type”: “thread-updated”, “threadId”: “thr_01abc123”, “title”: “Q2 Planning Notes” }
How do I know when a response is complete?
Wait for a finish, error, or stopped event — all three signal that the agent is done for this turn. finish is the normal path. error means the agent failed. stopped means you (or something else) called POST /chat/stop.
ws.on(‘message’, (data) => { const event = JSON.parse(data.toString());
switch (event.type) {
case ‘start’:
showTypingIndicator();
break;
case ‘text-delta’:
appendText(event.delta);
break;
case ‘finish’:
hideTypingIndicator();
console.log(done in {event.durationMs}ms, cost ${event.costUsd.toFixed(4)});
break;
case ‘error’:
showError(event.error);
break;
case ‘stopped’:
showStopped();
break;
}
});
Complete working example
This script connects, sends a message via POST /chat, and prints the full streamed response.
// npm install ws const WebSocket = require(‘ws’);
const BASE = ‘http://localhost:3001’; const WS_BASE = ‘ws://localhost:3001’; const TOKEN = process.env.DAEMION_TOKEN ?? ”;
async function main() {
// 1. Connect WebSocket FIRST
const ws = new WebSocket({WS_BASE}/stream?token={TOKEN});
await new Promise((resolve, reject) => { ws.once(‘open’, resolve); ws.once(‘error’, reject); });
ws.on(‘message’, (raw) => { const event = JSON.parse(raw.toString());
switch (event.type) {
case ‘connected’:
console.log(‘ws connected, threadId:’, event.threadId);
break;
case ‘start’:
console.log(‘agent started, model:’, event.model);
break;
case ‘text-delta’:
process.stdout.write(event.delta);
break;
case ‘tool-start’:
console.log(‘\n[tool]’, event.tool, event.input);
break;
case ‘tool-end’:
console.log(‘[tool done]’, event.tool);
break;
case ‘finish’:
console.log(\n\ndone — {event.durationMs}ms, ${event.costUsd.toFixed(4)});
ws.close();
break;
case ‘error’:
console.error(‘agent error:’, event.error);
ws.close();
break;
}
});
// 2. Send the chat request
const res = await fetch({BASE}/chat, {
method: ‘POST’,
headers: {
‘Authorization’: Bearer {TOKEN},
‘Content-Type’: ‘application/json’,
},
body: JSON.stringify({
content: ‘What day is it?’,
agent_id: ‘haiku’,
}),
});
if (!res.ok) { console.error(‘chat failed’, res.status, await res.text()); ws.close(); } }
main().catch(console.error);
How do I handle reconnection?
The gateway sends a ping every 30 seconds. If your client misses pongs or the connection drops, reconnect with exponential backoff:
function connect(token, attempt) {
attempt = attempt ?? 0;
const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
const ws = new WebSocket(ws://localhost:3001/stream?token={token});
ws.on(‘close’, () => {
console.log(reconnecting in {delay}ms (attempt {attempt + 1}));
setTimeout(() => connect(token, attempt + 1), delay);
});
ws.on(‘open’, () => { attempt = 0; });
return ws; }
The replay buffer holds the last 50 streaming events per thread for up to 5 minutes. If you reconnect mid-stream, events buffered during the gap are replayed immediately after connected.
text-delta chunks if you connect a moment after POST /chat returns. The buffer is cleared 5 seconds after a finish, error, or stopped event.threadId (global subscription). Thread-scoped events won't be delivered to global subscribers, so connect globally if you want events from multiple threads.threadId gives you a global subscription: you receive extension-changed and thread-updated events. Connecting with threadId gives you that thread's streaming events. Global clients do NOT receive thread-scoped events (text-delta, finish, etc.).What can go wrong
401 on upgrade — The token query parameter is missing, wrong, or expired. Re-pair the device to get a fresh token. The connection is rejected at the HTTP upgrade step before the WebSocket handshake completes.
Connected but no streaming events — You connected with a threadId that doesn’t match the thread used in POST /chat. Double-check the thread_id returned by the 202 response.
Missed events after reconnect — The replay buffer covers up to 50 events over 5 minutes. If you were disconnected longer than that, you’ll need to fetch history via GET /threads/:id/turns. Don’t rely on the buffer for durable message storage.
Connection drops every 30 seconds — Your WebSocket client may not be responding to pings. The gateway pings every 30 seconds and will close idle connections. Enable pong handling or use a client library that handles it automatically (the ws npm package does this by default).
extension-changed events not arriving — These are global events and only go to clients connected without a threadId. If you subscribed to a specific thread, you won’t receive them.
What’s next?
- Conversations API —
POST /chat, threads, turns, and search - Quickstart — get the full setup running end to end