Daemion docs

How do I stream responses via WebSocket?

Auth: Bearer token — passed as ?token= query parameter on WebSocket upgrade
Base URL: ws://localhost:3001
All examples tested against live gateway

Daemion’s gateway streams agent responses in real time over a persistent WebSocket connection. Every POST /chat returns a 202 immediately — the actual response arrives as a sequence of events on the socket. Connect once at startup and keep the connection alive for the life of your session.


How do I connect?

The WebSocket endpoint is /stream. Pass your bearer token as a token query parameter on the upgrade request.

bash
Verified

Minimal connection check

wscat -c “ws://localhost:3001/stream?token=$DAEMION_TOKEN”

To scope events to a single thread, add threadId:

bash
Verified

wscat -c “ws://localhost:3001/stream?token=$DAEMION_TOKEN&threadId=thr_01abc123”

Node.js connection example (npm install ws):

typescript
Verified

// npm install ws const WebSocket = require(‘ws’);

const token = process.env.DAEMION_TOKEN; const ws = new WebSocket(ws://localhost:3001/stream?token={token});

ws.on(‘open’, () => { console.log(‘connected’); });

ws.on(‘message’, (data) => { const event = JSON.parse(data.toString()); console.log(event.type, event); });

ws.on(‘close’, (code, reason) => { console.log(‘disconnected’, code, reason.toString()); });

ws.on(‘error’, (err) => { console.error(‘ws error’, err); });

Connect before calling POST /chat. If you connect after, you will miss the streaming events for that turn. The replay buffer partially mitigates this — see the FAQ below.


What events will I receive?

There are 12 event types. All events are JSON objects with a type field.

EventScopeDescription
connectedGlobalFirst event after handshake. Confirms the connection and echoes your threadId.
messageThreadA complete turn object was saved to the thread (user or assistant).
startThreadAgent began generating a response.
text-deltaThreadOne chunk of streamed text from the agent.
tool-startThreadThe agent invoked a tool.
tool-endThreadA tool call completed.
finishThreadResponse complete. Includes cost and token usage.
errorThreadThe agent hit an error while generating.
stoppedThreadResponse was aborted via POST /chat/stop.
warningThreadNon-fatal warning from the engine.
extension-changedGlobalAn extension was created, updated, deleted, or toggled.
thread-updatedGlobalA thread’s title changed.

Thread-scoped events (message, start, text-delta, tool-start, tool-end, finish, error, stopped, warning) are only delivered to clients subscribed to that thread or to global subscribers. Global events (extension-changed, thread-updated) are delivered to all connected clients.


What does each event look like?

connected

Sent immediately after a successful upgrade. Confirms your thread subscription.

json

{ “type”: “connected”, “threadId”: “thr_01abc123” }

threadId is null if you connected without a threadId parameter (global subscription).


message

A complete turn has been persisted to the database.

json

{ “type”: “message”, “message”: { “id”: “trn_07xyz456”, “thread_id”: “thr_01abc123”, “role”: “assistant”, “content”: “Here is the summary you asked for…”, “created_at”: “2026-03-31T12:00:00.000Z” } }


start

The agent started generating. Use this to show a typing indicator.

json

{ “type”: “start”, “messageId”: “trn_07xyz456”, “model”: “claude-opus-4-5” }


text-delta

One streaming text chunk. Concatenate these in order to build the full response.

json

{ “type”: “text-delta”, “messageId”: “trn_07xyz456”, “delta”: “Here is the” }


tool-start

The agent invoked a tool. input is the raw JSON string the agent passed to the tool.

json

{ “type”: “tool-start”, “messageId”: “trn_07xyz456”, “tool”: “bash”, “input”: ”{“command”: “ls -la”}” }


tool-end

The tool completed. output is the raw result string.

json

{ “type”: “tool-end”, “messageId”: “trn_07xyz456”, “tool”: “bash”, “output”: “total 48\ndrwxr-xr-x 12 user staff 384 Mar 31 12:00 .” }


finish

Response complete. Always follows the last text-delta.

json

{ “type”: “finish”, “messageId”: “trn_07xyz456”, “costUsd”: 0.0023, “durationMs”: 4210, “inputTokens”: 1842, “outputTokens”: 312, “cacheReadTokens”: 1200, “cacheWriteTokens”: 600 }

inputTokens, outputTokens, cacheReadTokens, and cacheWriteTokens are optional — present when the model returns them.


error

The agent hit an error while generating.

json

{ “type”: “error”, “messageId”: “trn_07xyz456”, “error”: “model overloaded — please retry” }


stopped

The response was aborted by a POST /chat/stop call.

json

{ “type”: “stopped”, “messageId”: “trn_07xyz456” }


warning

A non-fatal warning from the engine.

json

{ “type”: “warning”, “text”: “context window approaching limit — oldest turns may be dropped” }


extension-changed

An extension was created, updated, deleted, or toggled. Delivered to all connected clients.

json

{ “type”: “extension-changed”, “action”: “updated”, “extension”: { “id”: “ext_abc”, “type”: “agent”, “name”: “opus”, “enabled”: true } }

action is one of "created", "updated", "deleted", or "toggled".


thread-updated

A thread’s title was changed. Delivered to all connected clients.

json

{ “type”: “thread-updated”, “threadId”: “thr_01abc123”, “title”: “Q2 Planning Notes” }


How do I know when a response is complete?

Wait for a finish, error, or stopped event — all three signal that the agent is done for this turn. finish is the normal path. error means the agent failed. stopped means you (or something else) called POST /chat/stop.

typescript
Verified

ws.on(‘message’, (data) => { const event = JSON.parse(data.toString());

switch (event.type) { case ‘start’: showTypingIndicator(); break; case ‘text-delta’: appendText(event.delta); break; case ‘finish’: hideTypingIndicator(); console.log(done in {event.durationMs}ms, cost ${event.costUsd.toFixed(4)}); break; case ‘error’: showError(event.error); break; case ‘stopped’: showStopped(); break; } });


Complete working example

This script connects, sends a message via POST /chat, and prints the full streamed response.

typescript
Verified

// npm install ws const WebSocket = require(‘ws’);

const BASE = ‘http://localhost:3001’; const WS_BASE = ‘ws://localhost:3001’; const TOKEN = process.env.DAEMION_TOKEN ?? ”;

async function main() { // 1. Connect WebSocket FIRST const ws = new WebSocket({WS_BASE}/stream?token={TOKEN});

await new Promise((resolve, reject) => { ws.once(‘open’, resolve); ws.once(‘error’, reject); });

ws.on(‘message’, (raw) => { const event = JSON.parse(raw.toString());

switch (event.type) { case ‘connected’: console.log(‘ws connected, threadId:’, event.threadId); break; case ‘start’: console.log(‘agent started, model:’, event.model); break; case ‘text-delta’: process.stdout.write(event.delta); break; case ‘tool-start’: console.log(‘\n[tool]’, event.tool, event.input); break; case ‘tool-end’: console.log(‘[tool done]’, event.tool); break; case ‘finish’: console.log(\n\ndone — {event.durationMs}ms, ${event.costUsd.toFixed(4)}); ws.close(); break; case ‘error’: console.error(‘agent error:’, event.error); ws.close(); break; } });

// 2. Send the chat request const res = await fetch({BASE}/chat, { method: ‘POST’, headers: { ‘Authorization’: Bearer {TOKEN}, ‘Content-Type’: ‘application/json’, }, body: JSON.stringify({ content: ‘What day is it?’, agent_id: ‘haiku’, }), });

if (!res.ok) { console.error(‘chat failed’, res.status, await res.text()); ws.close(); } }

main().catch(console.error);


How do I handle reconnection?

The gateway sends a ping every 30 seconds. If your client misses pongs or the connection drops, reconnect with exponential backoff:

typescript
Verified

function connect(token, attempt) { attempt = attempt ?? 0; const delay = Math.min(1000 * Math.pow(2, attempt), 30000); const ws = new WebSocket(ws://localhost:3001/stream?token={token});

ws.on(‘close’, () => { console.log(reconnecting in {delay}ms (attempt {attempt + 1})); setTimeout(() => connect(token, attempt + 1), delay); });

ws.on(‘open’, () => { attempt = 0; });

return ws; }

The replay buffer holds the last 50 streaming events per thread for up to 5 minutes. If you reconnect mid-stream, events buffered during the gap are replayed immediately after connected.


Q What is the replay buffer?
When a client connects (or reconnects) to a thread mid-stream, the gateway replays up to 50 buffered events from the last 5 minutes. This means you won't miss text-delta chunks if you connect a moment after POST /chat returns. The buffer is cleared 5 seconds after a finish, error, or stopped event.
Q Can I have multiple WebSocket connections open at once?
Yes. Multiple clients can connect to the same thread simultaneously — all receive the same events. This is how the mobile app and a CLI client can both show the stream at the same time.
Q Do I need to reconnect after each message?
No. One persistent connection handles all threads if you connect without a threadId (global subscription). Thread-scoped events won't be delivered to global subscribers, so connect globally if you want events from multiple threads.
Q What is a global vs thread subscription?
Connecting without threadId gives you a global subscription: you receive extension-changed and thread-updated events. Connecting with threadId gives you that thread's streaming events. Global clients do NOT receive thread-scoped events (text-delta, finish, etc.).

What can go wrong

Common issues

401 on upgrade — The token query parameter is missing, wrong, or expired. Re-pair the device to get a fresh token. The connection is rejected at the HTTP upgrade step before the WebSocket handshake completes.

Connected but no streaming events — You connected with a threadId that doesn’t match the thread used in POST /chat. Double-check the thread_id returned by the 202 response.

Missed events after reconnect — The replay buffer covers up to 50 events over 5 minutes. If you were disconnected longer than that, you’ll need to fetch history via GET /threads/:id/turns. Don’t rely on the buffer for durable message storage.

Connection drops every 30 seconds — Your WebSocket client may not be responding to pings. The gateway pings every 30 seconds and will close idle connections. Enable pong handling or use a client library that handles it automatically (the ws npm package does this by default).

extension-changed events not arriving — These are global events and only go to clients connected without a threadId. If you subscribed to a specific thread, you won’t receive them.


What’s next?