Stage 2 — Capability Gap Analysis¶
Paired with goal.md Stage 2. This is the deliverable before
building — per goal: "Show me the gap analysis before you start
closing it."
Three sections per the spec:
- What the current Claude Code workflow provides (the thing being replaced as Rob's primary interface)
- What HomeAIOps provides today (the thing replacing it)
- The gap, prioritized — what needs to be built so HomeAIOps becomes Rob's default
Plus a sub-audit per goal: which existing inputs feed the pipeline today, what schema they use, and where the new CLI plugs in.
1. What Claude Code provides today¶
The current daily-driver Rob has been using to operate the cluster (this session is itself an example). Components surfaced through this workflow:
| Capability | Provided by | Notes |
|---|---|---|
| Interactive prompt loop | Claude Code CLI | Synchronous turn-taking; tool calls visible inline |
| Tool dispatch | Bash, Edit, Read, Write, Agent, MCP tools |
Tools are claimed by name per turn; results stream back |
| Cluster operations | MCP tools (kubectl_*, ha_*, omada_*, etc.) via lovenet-gateway aggregator |
16 MCP servers reachable from the chat; same surface every session |
| Specialist delegation | Agent tool + named sub-agents (storage-operator, network-operator, etc.) |
Sub-agent gets prompt + isolated context; returns one report |
| Conversation history (per session) | Implicit — claude CLI's transcript file |
Session = ~/.claude/projects/...; replayable; survives /clear only by explicit save |
| Cross-session memory | Auto-memory at ~/.claude-personal/projects/<cwd-encoded>/memory/ |
Per-CWD partitioned; index in MEMORY.md; manual write via Rob's instruction |
| Todo / task tracking | TaskCreate / TaskUpdate / TaskList tools |
Lives only in current session's transcript; not durable across sessions, not visible to other surfaces |
| Background work | Bash run_in_background=true, Monitor, ScheduleWakeup |
Notifications come back as system reminders |
| Scheduled / recurring | /loop, /schedule, CronCreate (deferred) |
Time-based dispatch through the harness, but Rob still has to be present-ish |
| Output formats | Markdown directly to chat | No persistent artifact unless Rob asks for a file or commit |
| Model selection | /model slash-command; built-in routing |
Opus 4.7 default in this session; alternatives are CLI-flagged |
| Escalation | Manual — Rob can re-prompt or ask for a different sub-agent | No automatic "this is hard, get a bigger model" path |
| Cost visibility | None at the CLI layer; Anthropic billing dashboard is out-of-band | Per-session token use only visible if Rob queries the SDK |
| Authentication / authorization | Local Claude Code config + Anthropic API key in env | Single user (Rob) |
| Surfaces (input) | Terminal only | One human, one keyboard |
Net characterization: Claude Code is a single-user synchronous agent loop with strong tool integration and weak durability. Brilliant for one-at-a-time work; thin for queued work, scheduled work, or work that needs to outlive a terminal session.
2. What HomeAIOps provides today¶
Sourced from ai_architecture.md and verified through the Stage 1
audit. Pipeline shape is:
surface → bridge → langgraph-agents queue → triager → specialist → output
↓ ↑
(Claude API escalation) (vault / Zulip / ntfy / Langfuse)
| Capability | Provided by | Notes |
|---|---|---|
| Multi-surface input | HA voice, Zulip DM (Triager), Open WebUI, Khoj, AlertManager, Cron, ntfy tap | Voice + chat + push + scheduled all already wired |
| Durable task queue | task_queue + task_dlq tables in postgres-langgraph-checkpoints |
Backed by Postgres LISTEN/NOTIFY; 3-replica CNPG |
| Task dispatch | langgraph-agents queue worker |
Claims tasks, calls graph, writes results back |
| Specialist routing | Triager agent (LLM-driven) | Decides which specialist gets the task based on content |
| Specialist execution | 13 named agents (supervisor, researcher, coder, reviewer, triager, reporter, note-maker, homelab-engineer, smart-home-operator, ml-operator, errand-runner, property-coordinator, health-tracker) | All run inside langgraph-agents pod; share the same graph |
| Approval gate | interrupt() + Windmill langgraph-approval-post.ts + ntfy/Zulip |
Class A/B/C/D taxonomy; ntfy tap → /approval endpoint |
| Model selection | Per-agent default (qwen3-next:80b-a3b-instruct-q4_K_M on Spark) + requires_cloud tag → Claude API escalation |
Cost caps enforced in-cluster: $5/task, $10/agent/day, $30/global/day |
| Inference backends | ollama-spark (GB10), ollama (P40), tei-spark (reranker), Claude API | Routing in langgraph-agents/.agents/instructions/hardware-routing.md |
| Tool surface | MCP Gateway + 16 MCP servers via Istio | Same set as Claude Code, accessed differently (HTTP MCP not CLI MCP) |
| Persistent outputs | Vault (langgraph-vault PVC + RO sync from laptop), Zulip threads, ntfy push, Langfuse traces |
Outputs land in multiple sinks per task |
| Cross-agent memory | memory-mcp (Postgres + pgvector KG) |
bge-m3 embeddings via ollama-spark |
| Cost / spend visibility | /admin/costs/today endpoint + Windmill langgraph-cost-cap-watcher.ts |
Aggregates per-task / per-agent / global; Pushover alert on cap |
| Observability | Langfuse OTLP traces, Prometheus metrics, Loki logs | Per-task trace ID; spans for triager/specialist hops |
| Trace correlation | Each task has a trace_id propagated through langgraph + Windmill + Langfuse |
Cross-surface continuity |
| Surfaces (output) | Vault file, Zulip DM reply, ntfy push, Langfuse trace | Default depends on agent + source |
| Authentication / authorization | Per-surface (Authelia for Open WebUI/Khoj, Zulip bot tokens, ntfy app tokens, internal cluster network for Windmill→langgraph) | Already wired; Rob is the single user |
Net characterization: HomeAIOps is a multi-surface durable task pipeline with strong async/queue handling and weak interactive ergonomics. Excellent for "fire and forget" work; thin for the real-time turn-taking style Claude Code excels at.
3. Existing input-contract audit¶
Per the goal: "confirm: which existing inputs feed the pipeline today, what schema/contract they use, and where the CLI plugs into that same contract."
The langgraph-agents POST /inbox endpoint is the single
choke-point. Every bridge POSTs the same shape:
type InboxBody = {
task_id: string; // REQUIRED — server rejects without it
source: "voice" | "zulip" | "text" | "holmesgpt" | "test" | "openwebui";
content: string; // REQUIRED — the user's intent
user?: string; // defaults to "rob"
zulip_user_id?: number; // optional; if present, reply-back goes via triager-bot DM
};
Existing producers:
| Producer | Source value | task_id format | Reply-back path |
|---|---|---|---|
langgraph-inbox.ts (Windmill) |
passthrough from caller (default "voice") | passthrough or generated | varies by caller |
zulip-triager-webhook.ts |
"zulip" |
zulip-<ts>-<rand> |
triager-bot DM via zulip_user_id |
| Open WebUI (agent-as-model) | "openwebui" |
OpenAI-compat chat completion id | inline OpenAI-style response |
| HolmesGPT | n/a — separate path | n/a | n/a — HolmesGPT does not route through /inbox today |
| AlertManager → HolmesGPT | n/a — Windmill workflow calls holmesgpt/investigate, not /inbox |
n/a | Pushover + Zulip via separate workflow |
Important: HolmesGPT and AlertManager do not share the
/inbox contract. They have their own pipeline. That's by
design (alerts → diagnostic agent → no specialist routing
needed).
The contract is consistent across the surfaces that DO use
/inbox. Good — no implicit-contract refactor needed. The
CLI plugs in as a 6th producer:
// what the CLI will POST:
{
task_id: `cli-<ts>-<rand>`,
source: "cli", // NEW value — needs langgraph-agents enum update
content: <whatever Rob typed>,
user: "rob"
// no zulip_user_id; reply-back is the CLI's own polling loop
}
One small server-side change: langgraph-agents' Pydantic source
enum needs "cli" added. One-line PR in the langgraph-agents
repo; pairs with whatever-image-tag-bump in home-ops.
4. The gap, prioritized¶
What Claude Code has that HomeAIOps lacks — ranked by how much each blocks Rob's actual workflow when he switches to CLI-first.
Tier 0 — blockers; CLI cannot ship without these¶
-
No CLI surface exists. A terminal command that POSTs to
/inboxand shows the result. This is the headline Stage 2 deliverable. -
No
"cli"value in langgraph-agents' source enum. One- line langgraph-agents change. Trivial; pairs with the image bump renovate auto-generates. -
No result-streaming or fetch-by-task-id endpoint usable from CLI. Current state:
POST /inboxreturns{task_id, status, output?, paused_for?}synchronously, butoutputis null until the task completes async.- The CLI needs either (a) long-poll the queue for a specific task_id, (b) subscribe to a server-sent-events / websocket stream, or (c) wait for an inline output if the task is fast.
- Decision needed (see Open Questions).
Tier 1 — needed within Stage 2 DoD¶
-
Todo management. Goal explicitly: "Move todo management out of Claude Code into HomeAIOps' own store. Claude Code becomes a consumer, not the store." HomeAIOps has NO durable todo surface today. Claude Code's
TaskCreate/TaskUpdatetools are session-local. Build needed: a todo store in the pipeline (probably a new postgres table colocated withtask_queue), CRUD endpoints, and a CLI subcommand (hai todo add/hai todo ls/hai todo done). Claude Code consumes via either a new MCP server or by shelling out tohai. -
Conversation continuity / thread context. Claude Code sessions accumulate context turn by turn; each HomeAIOps task today is independent. For "ask follow-up about the task I just asked about," the CLI needs a thread / conversation_id primitive. Build needed: optional
conversation_idon/inboxbody; agents pull prior turns from checkpoint store on resume. -
Cross-session memory parity. Claude Code's
~/.claude-personal/projects/.../memory/is per-CWD and human-readable. HomeAIOps hasmemory-mcp(Postgres pgvector). For Rob's CLI workflow to feel continuous, either: - Bridge —
memory-mcpexposes a Claude-Code-style interface (read/write memory entries) the CLI can use, or - Sync — a periodic job reflects vault
/memory/*.mdintomemory-mcpand back.
Tier 2 — quality-of-life; nice for dogfooding to work¶
-
Per-turn tool-call visibility. Claude Code shows each tool call inline. HomeAIOps logs go to Langfuse asynchronously. For CLI dogfooding to not feel like a regression, the CLI should either (a) tail Langfuse spans for the task in near-real-time, or (b) langgraph-agents emits structured per-step events the CLI can subscribe to.
-
Cost / spend visibility surfaced at the CLI. Goal calls for "Cost/usage visibility — CLI subcommand showing local vs. escalated task counts and Claude spend over the last N days." The data exists at
/admin/costs/today; needs ahai costsubcommand that hits it. -
Routing policy file in repo. Stage 3 piece but flagged in Stage 2 because the gap analysis surfaces it: today the "local vs. escalate" decision lives in code in langgraph-agents (
requires_cloudtag + uncertainty threshold). Goal wants this as a version-controlled file that changes behavior when edited. Defer to Stage 3 unless trivial. -
Per-session naming / threading in HomeAIOps. Tasks today are anonymous to the user. Adding a friendly "session" / "thread" handle (
#stage2-cli-work) makes multi-task workflows browseable.
Tier 3 — out of Stage 2 scope; flag for Stage 3+¶
- Routing policy as version-controlled file (Tier 2 #9 fully fleshed out).
- Escalation client uniformity — Claude Code (headless) vs. Claude API choice spelled out as a policy.
- Provenance / per-task input-source recording — already in trace_id but not surfaced.
- Local-vs-escalate budget guardrail with auto-warning.
These are explicitly Stage 3 per goal.md and stay there.
5. Where the CLI plugs in¶
Concrete shape (subject to your review):
~/bin/hai task add "<intent>" → POST /inbox source=cli content=<intent>; print task_id; tail by default
~/bin/hai task tail <task_id> → poll/stream until done; print outputs
~/bin/hai task ls → GET /admin/tasks (today's); pretty-print
~/bin/hai task show <task_id> → GET /admin/tasks/<id> + Langfuse trace deep-link
~/bin/hai todo add "<note>" → new endpoint; persists to new todo table
~/bin/hai todo ls → list pending todos
~/bin/hai todo done <id> → mark complete
~/bin/hai cost → GET /admin/costs/today; pretty-print + recent history
~/bin/hai chat <task_id> "<follow-up>" → re-POST /inbox with conversation_id; show streamed output
Implementation language: probably Python (matches langgraph-agents) or Go (single static binary, no PYTHONPATH issues). Defer to your call.
Installation: $PATH per the goal's "installed on $PATH,
--help works" requirement.
Distribution: live in the langgraph-agents repo as a
subcommand (mirrors how the queue worker + agents are also
there), with pyproject.toml exposing hai as a console script
that pip-installs to ~/.local/bin.
Auth: cluster-internal access for the daemon (it talks to
langgraph-agents.ai.svc.cluster.local). For Rob's laptop:
either a cloudflared tunnel to the langgraph-agents service
(behind Authelia like Open WebUI is) OR kubectl port-forward
in the CLI's preamble. Either works; the former is more
elegant.
6. Open questions / decisions needed before I build¶
Per goal.md "When in doubt about scope, intent, or whether something counts as 'done' — ask."
Q1. Result delivery model¶
For hai task add to feel responsive at the terminal, how does
the CLI get the result?
- (a) Sync block —
/inboxalready returns immediately with astatus: "accepted". CLI then polls a new endpoint likeGET /admin/tasks/<id>/resultevery 1-2s until status flips todone. Simple; works today with a small server-side addition (/admin/tasks/<id>/resultreturning the queue row'sresultcolumn). What's printed to terminal is the final output blob. - (b) Server-sent events — langgraph-agents adds
GET /admin/tasks/<id>/streamreturning SSE: per-node events (node_start triager,node_end note-maker → output preview, etc.). CLI prints each event as it lands. Closer to the "Claude Code shows tool calls inline" experience but more server work. - (c) WebSocket — same as (b) but websocket. Probably overkill.
My recommendation: (a) for v1, (b) once we know v1 works. Locks the CLI design but doesn't bet the farm on the streaming infrastructure.
Q2. Todo store shape¶
Two ways to add durable todos:
- (a) Separate
todotable inpostgres-langgraph-checkpointsschema. Simple CRUD. Independent lifecycle from tasks. - (b) "Todo" as a special task type — todos are queued tasks that never get dequeued; they live in
task_queuewith a "type" tag and never reach a specialist. Closer to the cluster's existing primitives but conflates "intent to do later" with "intent to dispatch now."
My recommendation: (a) separate todo table. Cleaner semantics; doesn't pollute task queue counts; lets todos have their own constraints (no TTL, no idempotency, no agent assignment).
Q3. Auth path for laptop CLI¶
- (a) Cloudflared tunnel + Authelia + new HTTPRoute for
/admin/*(today only/approvalis exposed). Symmetric with Open WebUI / Khoj / etc. - (b)
kubectl port-forwardin the CLI's preamble. No new public surface; simpler; assumes Rob's laptop haskubectlconfigured (which it does).
My recommendation: (b) for v1. Avoids opening a new public ingress while we're still iterating on the API contract. Promote to (a) when the API is stable.
Q4. Where does the CLI source code live¶
Per the audit, langgraph-agents already hosts the queue worker
plus agents and is where the source enum needs updating.
- (a) Live in
langgraph-agentsrepo — single source of truth for the contract + the client; pyproject.toml exposeshaiconsole script. - (b) New
hai-clirepo — separation of concerns; client can release independently of agent fleet.
My recommendation: (a) live in langgraph-agents. The CLI and the API are coupled-by-design (CLI is a thin client over /inbox + /admin/*); separate repos add coordination overhead without separating actual concerns.
Q5. What's the bar for "dogfooded for a day"?¶
Per goal.md Stage 2 DoD: "One full day of dogfooding without falling back to Claude Code for the primary task flow."
Two interpretations:
- (a) Strict — any CLI session that worked through Claude Code for one full day. Hard to verify without instrumentation.
- (b) Pragmatic — Rob keeps a "what I did today" log; if he can credibly say "I used
haifor all the things I'd have asked Claude Code to do" at end of day, that's a pass.
My recommendation: (b). Stage 3 has stricter measurement (escalation-rate dashboards); Stage 2 dogfooding is the acceptance test, not the metrics.
7. Stage 2 build plan (proposed sequence, post-review)¶
Pending your answers to Q1-Q5, my proposed order:
- Update langgraph-agents
sourceenum to include"cli". One-line PR. - Add
/admin/tasks/<id>/resultendpoint to langgraph-agents (assumes Q1 = (a)). Small PR. - Add
todotable + CRUD endpoints to langgraph-agents (assumes Q2 = (a)). Migration + 4 endpoints. - Write the
haiCLI, living inlanggraph-agents/cli/(assumes Q4 = (a)). Subcommands:task add/tail/ls/show,todo add/ls/done,cost. - Image bump + helmrelease deploy through renovate.
- One day of dogfooding — Rob runs everyday work through
hai; I keep notes on rough edges. - Gate 2 evidence PR in home-ops with day-of-dogfooding log per goal.
Each step is one PR. Tier 2 items (#7, #8 from gap section) get added between #5 and #6 if dogfooding surfaces a need.
8. What I am NOT proposing in Stage 2¶
Per goal.md scope discipline:
- Routing policy file — Stage 3. Even though it's tempting because we're touching the routing layer in #1.
- Auto-escalation logic changes — Stage 3.
- Cost-cap watcher rewrite — works today; leave it.
- Per-step event streaming (Q1 option (b)) — defer until v1 sync polling proves the design.
- MCP server for memory parity — Tier 1 item #6 is real but I'd rather get the CLI shipped first and figure out memory in a v2 cycle. Flag for next gate.
9. Operator decisions (2026-05-21)¶
Rob answered Q1-Q5:
- Q1 = (a) sync poll. CLI polls
GET /admin/tasks/<id>/resultuntil terminal status. Defer SSE to v2. - Q2 = (a) separate
todotable. Distinct lifecycle fromtask_queue. - Q3 = (a) cloudflared / Authelia. Public surface from day one —
hai.${SECRET_DOMAIN}behind Authelia + oauth2-proxy, same pattern as Open WebUI / Khoj / etc. This changes the build plan: a new HTTPRoute for/admin/*, new oauth2-proxy instance (or reuse Open WebUI's), DNS via external-dns. The CLI uses a long-lived API token stored at~/.config/hai/token; minting flow TBD (likely a one-timehai auth logindevice flow, or operator-issued static token to start). - Q4 = (a) live in
langgraph-agentsrepo. CLI ships as a console_script inpyproject.toml. - Q5 = (b) pragmatic dogfood. Acceptance is Rob's credible end-of-day "I used
haifor everything I'd have asked Claude Code." No instrumented metric.
Build sequence (final, post-decisions)¶
- langgraph-agents PR-A —
"cli"source enum +GET /admin/tasks/<id>/result+todotable migration + CRUD endpoints + Pydantic models. One PR, one migration. - home-ops PR-B — new HTTPRoute
hai.${SECRET_DOMAIN}→langgraph-agents:8765, oauth2-proxy in front (reuselanggraph-agents-oauth2-proxypattern from existing apps), Authelia client config, external-dns A record, CNP egress. Self-contained infrastructure. - langgraph-agents PR-C — the
haiCLI:cli/subdir, console_script entry point in pyproject.toml,hai task add/tail/ls/show,hai todo add/ls/done,hai cost,hai chat, basic auth via token-file. Sync poll loop for results. - Image release — tag
v0.2.35after PR-A merges; laterv0.2.36after PR-C merges. (Or batch into one release.) - home-ops PR-D — image bump for the release(s) via renovate auto.
- Dogfood day — Rob installs
pip install langgraph-agents(or usespipx) on his laptop, runshai auth login, replaces normal Claude Code usage withhai. - home-ops PR-E (Gate 2) — evidence: dogfood log, transcript / sample tasks, confirmation that other non-CLI inputs still work (Zulip DM, voice, etc.), then your Gate 2 approval.
Open question for during build¶
Auth token issuance. Three flavors:
- Static operator-issued token — simplest. Generate once, put in 1Password under a new
hai-cliitem, ExternalSecret materializes it in the cluster, CLI reads~/.config/hai/tokenpopulated byhai auth set-token <value>. v1 ships with this. - Device flow —
hai auth loginopens a URL, Rob authorizes in browser, CLI receives refresh token. More user-friendly but more code; defer to v2. - Long-lived OAuth client credentials — overkill for a single-user CLI.
I'll go with (1) for v1 unless you say otherwise. The token can be replaced with device flow in a later iteration.
10. Status as of 2026-05-23 (post Stage-1 Gate signoff)¶
Build sequence steps 1-5 from §9 are complete. Verification:
| Step | Status | Evidence |
|---|---|---|
#1 lga PR-A — "cli" source enum + /admin/tasks/<id> + todo table |
✅ | src/agents/state.py::Source includes "cli"; src/agents/api/todos.py present; lga PRs #66-#68 merged |
#2 home-ops PR-B — hai.${SECRET_DOMAIN} route + auth + DNS |
✅ | hai.${SECRET_DOMAIN} resolves to the cluster gateway; CLI auth works against it |
#3 lga PR-C — hai CLI in cli/ subdir with subcommands |
✅ | which hai → /home/rwlove/.local/bin/hai; task add/tail/ls/show, todo add/ls/done, cost, auth all present |
| #4 Image release | ✅ | tags v0.2.34 through v0.2.42 published |
| #5 home-ops image bump (Renovate auto) | ✅ | cluster pod on v0.2.42 as of 2026-05-23 17:08Z |
Plus Stage-1.5 UX work landed 2026-05-23 (additive to the original plan — closing gaps that emerged from real Stage-1 dogfooding):
- Reporter agent → universal final hop renders raw specialist output
as rich-text Zulip DMs with clickable
obsidian://vault links + labeled URLs (lga#73/#75/#76/#77 + home-ops#11997). - Agent definitions migrated from vault into the lga repo — persona changes ship as PRs with image-tag traceability (lga#71/#73/#75).
ADMIN_NAMEinterpolated at the DM render boundary only — repo files stay generic per [[user_class_architecture]] (home-ops#11990).aihomeops-stateGrafana dashboard — single-pane view of queue depth, task state, escalation gates, cost (home-ops#11989).- Smart-home
device-intent-map.yaml+ drift-detect Windmill workflow (lga#78 + home-ops#11998).
What remains for Stage 2 DoD¶
Step 6: Dogfood day. Wall-clock day where ADMIN uses hai for
the work that previously would have been Claude Code sessions.
Per §9 Q5 decision: acceptance is ADMIN's credible end-of-day
"I used hai for everything I'd have asked Claude Code."
Step 7: Gate 2 evidence PR. Post-dogfood:
- Log of the day's
haiusage - Any fall-back-to-Claude-Code events (each is a P0/P1 gap the analysis missed)
- Confirmation that other non-CLI inputs still work (Zulip DM, Renovate-triage, AlertManager→HolmesGPT, daily-digest)
Gaps still recommended to close before the dogfood day¶
Re-reading §4 against current state, two items from earlier tiers remain actionable:
- Tier 1 #2 (conversation continuity /
conversation_id): still not built. For multi-turn workflows where ADMIN wants follow-ups on the same context, today eachhai task addis independent. Decide whether this is a v1 gap or punt to v2 based on what the dogfood day surfaces. - Tier 2 #6 (cost local-vs-escalated breakdown):
hai costexists; verify it shows local-vs-escalated split (data is in Langfuse +accumulated_cost_usd).
No new gaps surfaced from the Stage-1.5 work that aren't already in §4. The reporter agent / DM rendering / dashboard / intent map are all complementary to (not duplicative of) the CLI work.
End¶
Build starts after this PR merges.