Daemion docs

How does Daemion work?

Daemion is a local-first agent OS: a persistent gateway running on your machine, a 6-substrate kernel that handles everything from storage to streaming, and a universal extension model where every capability is data — not code.


System overview

Phone / Browser (PWA)

        │  HTTPS via Tailscale (or localhost)

┌──────────────────────────────────────────┐
│           Gateway  :3001 (default)       │
│                                          │
│  HTTP/WebSocket API                      │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │         6 Kernel Substrates        │  │
│  │ Extension │ Context │ Execution    │  │
│  │ Trigger   │ Storage │ Presentation │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │          Local Storage             │  │
│  │  SQLite  ·  Engram (Neo4j)         │  │
│  │  Filesystem                        │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘

        │  Agent SDK (not child_process)

   Claude API

The frontend is a static PWA served from Vercel — just a window into your local system. All messages, threads, extensions, and config live in SQLite on your machine. The gateway serves everything; the frontend never talks to an external database.


What is the gateway?

The gateway is a local HTTP/WebSocket server (port 3001 by default). It is the single entry point for all client communication — every message, thread list, extension CRUD, job run, and streaming response goes through it.

Core API surface:

Method  Path                What it does
GET     /health             Health check, no auth required
POST    /chat               Send a turn, get a streaming response
GET     /threads            List conversation threads
GET     /threads/:id/turns  Turns for a thread
POST    /threads            Create a thread
POST    /run/:job           Execute a job by name
GET     /extensions         List all extensions
POST    /extensions         Create or update an extension
DELETE  /extensions/:id     Remove an extension
POST    /reseed             Re-sync built-in extensions from disk (no restart needed)
WS      /stream             Streaming turns and tool-call events

The API data model uses “turns” throughout — not “messages.” A turn is one exchange unit in a thread.

The gateway binds to 127.0.0.1 only. Remote access goes through Tailscale (a private WireGuard mesh), which means no open ports and no public internet exposure. Bearer token auth is required for all endpoints except /health.
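
As a rough illustration of the auth rule above, a client might assemble a POST /chat request along these lines. This is a sketch, not the gateway's client code: the helper name buildChatRequest, the "Authorization: Bearer" header format, and the Content-Type header are assumptions. The body fields follow the POST /chat example in the message-flow section of this page.

```typescript
// Hypothetical helper: builds (but does not send) a POST /chat request.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,  // e.g. "http://127.0.0.1:3001" (default port from this page)
  token: string,    // bearer token issued when the device was paired
  threadId: string,
  content: string,
): ChatRequest {
  return {
    url: `${baseUrl}/chat`,
    method: "POST",
    headers: {
      // Assumed header format for bearer auth.
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ thread_id: threadId, content }),
  };
}
```

Keeping the builder separate from the network call makes it easy to check that every request except /health carries the token before anything is sent.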


What are the 6 kernel substrates?

The substrates are the OS primitives. Everything else — jobs, agents, commands, themes — plugs into them as extensions.

1. Extension Substrate

The meta-substrate. Registers, validates, loads, and manages all 12 extension types. Extensions are stored as JSON/YAML in SQLite — not compiled code. The agent can create extensions at runtime through chat.

See Extending Daemion for the full extension model.

2. Context Substrate

How the agent knows things. Inspired by Recursive Language Models — externalizes context rather than bulk-loading everything into the prompt.

Per request, the Context Substrate:

  1. Loads the last 10–15 turns from SQLite (always present, full text)
  2. Runs Engram recall in parallel (semantic + BM25 search over the knowledge graph)
  3. Provides the agent with 5 history tools: search_history, get_thread, list_threads, find_relevant, search_all

Older turns are queryable on demand — the agent retrieves them when it detects it needs them. This keeps prompts lean while preserving access to full history.
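
The split between always-loaded and on-demand turns can be pictured with a small sketch. The names Turn, ContextWindow, and splitContext are hypothetical, and the default window of 15 is simply the upper end of the 10–15 range quoted above. The real substrate also runs Engram recall in parallel, which is not modeled here.

```typescript
interface Turn {
  id: string;
  text: string;
}

interface ContextWindow {
  inPrompt: Turn[];  // loaded in full on every request
  onDemand: Turn[];  // reachable only through the history tools
}

// Hypothetical helper: splits a thread (oldest turn first) into the recent
// window that goes into the prompt and the older remainder.
function splitContext(turns: Turn[], windowSize = 15): ContextWindow {
  if (windowSize <= 0) {
    return { inPrompt: [], onDemand: turns.slice() };
  }
  // Index of the first turn that belongs to the recent window.
  const cut = Math.max(0, turns.length - windowSize);
  return { inPrompt: turns.slice(cut), onDemand: turns.slice(0, cut) };
}
```

For a 20-turn thread this puts the newest 15 turns in the prompt and leaves the oldest 5 for tools such as search_history.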

3. Execution Substrate

How the agent does work. Manages model selection, tool access, budgets, turn limits, streaming, cancellation, and concurrency.

Request type    Model         Max turns  Budget
Chat (quick)    sonnet        10         $0.50
Chat (complex)  sonnet/opus   25         $5.00
Job execution   configurable  30         $5.00
Build task      sonnet        50         $10.00

All Claude invocations use the Agent SDK — never child_process (known hang bug #771). The SDK provides streaming, tool access, and session management.
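
One way to think about the limits listed for each request type is as a lookup table plus a guard that is checked before each agent turn. The sketch below is illustrative only: the RequestType keys, the ExecutionLimits shape, and withinLimits are hypothetical names, and the model strings are copied as written on this page rather than being real model identifiers.

```typescript
type RequestType = "chat-quick" | "chat-complex" | "job" | "build";

interface ExecutionLimits {
  model: string;     // as listed on this page; "configurable" means set per job
  maxTurns: number;
  budgetUsd: number;
}

const LIMITS: Record<RequestType, ExecutionLimits> = {
  "chat-quick":   { model: "sonnet",       maxTurns: 10, budgetUsd: 0.5 },
  "chat-complex": { model: "sonnet/opus",  maxTurns: 25, budgetUsd: 5 },
  "job":          { model: "configurable", maxTurns: 30, budgetUsd: 5 },
  "build":        { model: "sonnet",       maxTurns: 50, budgetUsd: 10 },
};

// Returns true if the run may take another turn: it has used fewer turns
// than its limit and spent less than its budget.
function withinLimits(type: RequestType, turnsUsed: number, costUsd: number): boolean {
  const limits = LIMITS[type];
  return turnsUsed < limits.maxTurns && costUsd < limits.budgetUsd;
}
```

A run that fails this check would presumably surface as an error event such as "budget exceeded", though how the gateway enforces limits internally is not documented here.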

4. Presentation Substrate

How output appears in the UI. Renders all content types, streams tokens, shows tool calls as collapsible step indicators, and handles interactive elements (approve/deny buttons, forms).

Content pipeline: Agent output → type detection → renderer selection → display

Custom renderers are extensions of type renderer — a proposal-card renderer, a diff renderer, etc.

5. Trigger Substrate

What causes things to happen. Evaluates conditions and fires the appropriate response.

Type     Fires when
message  User sends a turn
command  User types /command
cron     Time-based schedule
watch    File changes on disk
webhook  HTTP request received
event    Internal event fires
chain    Another extension completes
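
The seven trigger types map naturally onto a string union. As a minimal sketch, the hypothetical classifyInput below separates the two user-driven triggers, treating a leading "/" followed by a name as a command and everything else as a message. How the Trigger Substrate actually parses input, including the other command prefixes, is not specified on this page.

```typescript
type TriggerType =
  | "message" | "command" | "cron" | "watch" | "webhook" | "event" | "chain";

// Hypothetical classifier for user input. Only the "/" prefix is handled;
// a bare "/" with no name after it is treated as an ordinary message.
function classifyInput(text: string): Extract<TriggerType, "message" | "command"> {
  return /^\/\S+/.test(text.trim()) ? "command" : "message";
}
```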

6. Storage Substrate

All data persists locally. No cloud database.

Backend         Stores
SQLite          Turns, threads, extensions, config, metrics, costs
Engram (Neo4j)  Knowledge graph: facts, patterns, insights, relationships
Filesystem      Project files, images, attachments, job outputs

What is the extension model?

Everything that isn’t the kernel is an extension. There are 12 types:

Type         What it is
command      Input handler (/, @, !, #)
theme        Visual identity (colors, fonts)
job          Autonomous work unit
renderer     Custom content display component
integration  External service connection (GitHub, Slack, Vercel)
action       Per-turn contextual action (copy, edit, regenerate)
widget       Dashboard UI component
app          Embedded Vite application
artifact     Agent-created output (code file, document)
capability   Agent skill or behavior
control      System configuration (budget limits, model defaults)
agent        Persistent agent identity

Extensions are data stored in SQLite — they don’t require compilation or deployment. The primary way to create one is by asking Daemion in chat. Agent-created extensions start disabled and require your approval to activate.
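
The approval rule can be sketched as plain data plus two small functions. Everything below is an assumption made for illustration: the Extension fields (createdBy, enabled), the function names, and the choice to enable user-created extensions immediately. The page only states that agent-created extensions start disabled. See Extending Daemion for the real schema.

```typescript
type ExtensionType =
  | "command" | "theme" | "job" | "renderer" | "integration" | "action"
  | "widget" | "app" | "artifact" | "capability" | "control" | "agent";

interface Extension {
  id: string;
  type: ExtensionType;
  name: string;
  createdBy: "user" | "agent";  // hypothetical field
  enabled: boolean;             // hypothetical field
}

// Agent-created extensions start disabled. User-created ones are assumed
// to be enabled right away (an assumption of this sketch).
function createExtension(
  id: string,
  type: ExtensionType,
  name: string,
  createdBy: "user" | "agent",
): Extension {
  return { id, type, name, createdBy, enabled: createdBy === "user" };
}

// Approval returns a new enabled record rather than mutating the original.
function approve(ext: Extension): Extension {
  return { ...ext, enabled: true };
}
```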

POST /reseed re-syncs built-in extensions from disk without restarting the gateway process.

See Extending Daemion for full schema, lifecycle, and examples.


How does a message flow end to end?

1. You type a message in the PWA

2. Frontend sends POST /chat {"thread_id": "thr_01abc123", "content": "..."}
   via Tailscale (or localhost) to the gateway

3. Gateway receives the request and authenticates the bearer token

4. Context Substrate assembles knowledge:
   a. Load last 10–15 turns from SQLite (always present)
   b. Query Engram for knowledge relevant to this turn (parallel)
   c. Check for extension-provided context (integrations, workspace state)

5. Execution Substrate invokes the agent (Agent SDK):
   a. Select model from request metadata or thread default
   b. Apply budget and turn limits based on detected complexity
   c. Stream tool calls + text tokens back via WebSocket /stream

6. Presentation Substrate formats output:
   a. Tool calls appear as step indicators ("Reading file...")
   b. Text streams token-by-token
   c. Code blocks get syntax highlighting on completion
   d. Final turn stored to SQLite

7. Frontend displays the streamed response

For jobs (autonomous, no user turn):

1. Trigger fires (cron schedule, file watch, event chain)
2. Engine loads the job definition (extension of type "job")
3. Context Substrate assembles job-specific context
4. Execution Substrate invokes the agent with the job prompt
5. Output routed: file write, Engram store, notification push
6. If job has chains → trigger the next job
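
Step 6 of the job flow implies a graph of jobs that trigger one another. The sketch below shows one way to compute the resulting run order. The JobDef shape, the chains field name, the breadth-first order, and the run-once rule are all assumptions; the actual engine may order or deduplicate chained jobs differently.

```typescript
interface JobDef {
  name: string;
  chains?: string[];  // hypothetical field: jobs to trigger on completion
}

// Returns the order in which jobs would run, starting from `start` and
// following chains breadth-first. Each job runs at most once, which also
// guards against cycles.
function chainOrder(jobs: Map<string, JobDef>, start: string): string[] {
  const order: string[] = [];
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const name = queue.shift() as string;
    if (seen.has(name)) continue;
    const job = jobs.get(name);
    if (job === undefined) continue;  // unknown job names are skipped
    seen.add(name);
    order.push(name);
    queue.push(...(job.chains ?? []));
  }
  return order;
}
```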

What events does the WebSocket send?

The WebSocket at WS /stream sends 12 event types, 11 of which are shown below. All events are JSON objects with a type field:

{ "type": "connected", "threadId": "thr_01abc123" }
{ "type": "start", "messageId": "trn_07xyz456", "model": "claude-sonnet-4-5" }
{ "type": "text-delta", "messageId": "trn_07xyz456", "delta": "Hello" }
{ "type": "tool-start", "messageId": "trn_07xyz456", "tool": "Read", "input": "…" }
{ "type": "tool-end", "messageId": "trn_07xyz456", "tool": "Read", "output": "…" }
{ "type": "finish", "messageId": "trn_07xyz456", "costUsd": 0.003, "durationMs": 1240 }
{ "type": "error", "messageId": "trn_07xyz456", "error": "budget exceeded" }
{ "type": "stopped", "messageId": "trn_07xyz456" }
{ "type": "warning", "text": "…" }
{ "type": "extension-changed", "extensionId": "ext_09def789" }
{ "type": "thread-updated", "threadId": "thr_01abc123" }

The frontend reconstructs the full turn from streamed events. Tool calls render as collapsible step indicators in the Presentation Substrate.
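
Reconstructing a turn from the stream amounts to folding events into a running state. The reducer below is a sketch under stated assumptions, not the frontend's code: TurnState and applyEvent are hypothetical names, only the per-turn events shown above are modeled, and the input and output fields are typed as strings because the examples elide their contents.

```typescript
// Subset of the event shapes shown above; field names follow the examples.
type StreamEvent =
  | { type: "start"; messageId: string; model: string }
  | { type: "text-delta"; messageId: string; delta: string }
  | { type: "tool-start"; messageId: string; tool: string; input: string }
  | { type: "tool-end"; messageId: string; tool: string; output: string }
  | { type: "finish"; messageId: string; costUsd: number; durationMs: number }
  | { type: "error"; messageId: string; error: string }
  | { type: "stopped"; messageId: string };

interface TurnState {
  messageId: string;
  text: string;
  tools: string[];  // tool names in call order
  status: "streaming" | "finished" | "error" | "stopped";
  costUsd?: number;
}

// Hypothetical reducer: folds one event into the turn being reconstructed.
function applyEvent(state: TurnState | null, ev: StreamEvent): TurnState | null {
  if (ev.type === "start") {
    return { messageId: ev.messageId, text: "", tools: [], status: "streaming" };
  }
  // Ignore events for turns that are not being tracked.
  if (state === null || state.messageId !== ev.messageId) return state;
  switch (ev.type) {
    case "text-delta":
      return { ...state, text: state.text + ev.delta };
    case "tool-start":
      return { ...state, tools: [...state.tools, ev.tool] };
    case "finish":
      return { ...state, status: "finished", costUsd: ev.costUsd };
    case "error":
      return { ...state, status: "error" };
    case "stopped":
      return { ...state, status: "stopped" };
    default:
      return state;  // tool-end adds nothing to the text in this sketch
  }
}
```

A full event list can be folded with events.reduce<TurnState | null>(applyEvent, null). Because each step returns a new object, intermediate states can be rendered as they arrive.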


What are the storage backends?

SQLite is the primary store — turns, threads, extensions, config, and cost metrics all live there. Path defaults to ~/.daemion/daemion.db, overridable via DAEMION_DB_PATH.
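
The override rule is simple enough to state as a pure function. In this sketch the helper name resolveDbPath is hypothetical, the environment and home directory are passed in as parameters (in Node these would typically be process.env and os.homedir()), and treating an empty DAEMION_DB_PATH as unset is an assumption.

```typescript
// Hypothetical helper: resolves the SQLite path from an environment map.
function resolveDbPath(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const override = env["DAEMION_DB_PATH"];
  // An empty override is treated as unset (an assumption of this sketch).
  if (override !== undefined && override !== "") return override;
  return `${homeDir}/.daemion/daemion.db`;
}
```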

Engram is optional. When Neo4j is running and credentials are set, Daemion stores and retrieves knowledge graph data across sessions. If Engram is unreachable, the gateway logs a warning and continues — responses still work, just without cross-session memory.

Filesystem access is provided via GET /filesystem/ls and GET /filesystem/search. Both take a path param that defaults to os.homedir() — the agent can browse your home directory. Scope this appropriately for your threat model.
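
When calling the listing endpoint from code, one defensive option is a small builder that refuses to produce a URL without a path, so a request can never fall back to the home directory by accident. The helper name buildLsUrl is hypothetical, and passing path as a URL query parameter is an assumption based on the description above.

```typescript
// Hypothetical helper: builds a /filesystem/ls URL that always carries an
// explicit path.
function buildLsUrl(baseUrl: string, path: string): string {
  if (path.trim() === "") {
    throw new Error("path is required; refusing to rely on the homedir default");
  }
  // encodeURIComponent escapes slashes and spaces so the path survives
  // as a single query parameter value.
  return `${baseUrl}/filesystem/ls?path=${encodeURIComponent(path)}`;
}
```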


Common questions

Q: Are Daemion agents separate AIs?
No. Agents are Claude operating in a specific role, with a tailored system prompt and permissions. The literal system prompt in src/gateway/agent.ts reads: "You are Claude, operating as a Daemion agent." This is per Anthropic's licensing.
Q: Why local-first instead of cloud?
Your turns, threads, extensions, and config live on your machine in SQLite. The Vercel PWA is just static HTML/JS — a window into your local system. There's no external database. Tailscale provides encrypted remote access without any cloud intermediary.
Q: What's the difference between the gateway and the daemon?
The gateway is the HTTP/WebSocket API server. The daemon is the gateway plus autonomous capabilities: heartbeat (ambient awareness on wake), cron scheduler, file watcher, and job executor. daemion start runs the gateway only. The background service (launchd/systemd) runs the full daemon.
Q: What is the heartbeat?
The heartbeat is a core engine concept — not a job. On each system wake, the daemon reads HEARTBEAT.md, checks ambient state, and replies HEARTBEAT_OK if nothing needs attention, or sends a notification if something does. It's how Daemion maintains ambient awareness without polling.
Q: How does context work for long conversations?
The last 10–15 turns are always loaded in full. Older turns are queryable via 5 history tools the agent has access to (search_history, get_thread, etc.). Engram surfaces relevant knowledge via semantic search. There's no compression or summarization step — the agent retrieves what it needs.

What can go wrong

Common architecture questions

Gateway unreachable from phone — Tailscale must be connected on both devices. The gateway binds to 127.0.0.1 only; Tailscale routes correctly when both devices are on the same Tailscale network. Check tailscale status on both.

Engram recall not working — Verify Neo4j is running (brew services list | grep neo4j) and that NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD are set in the environment where the gateway starts. The service file and plist both need these if running as a background service.

401 {"error": "unauthorized"} — The bearer token is missing or expired. Re-pair the device: run daemion start, scan the QR code, and let the new token replace the old one in localStorage.

Filesystem endpoint returns homedir contents unexpectedly: GET /filesystem/ls defaults to os.homedir() when no path param is provided. Always pass an explicit path if you’re using this endpoint programmatically.


What’s next?