Sentinel
How to read this doc
Sections 1–3 frame the goal and architecture. Section 4 lists the seven agents. Section 5 is the data model. Sections 6–8 cover models, the orchestrator, and how coverage auto-grows. Section 9 is the frontend specification; read it if you're touching the UI. Section 10 covers notifications. Sections 11–12 are the implementation map: files to create plus a phased build sequence. Section 13 is the end-to-end verification checklist. Markdown source: docs/devtasks/sentinel.md.
Contents
- Context
- Architecture: four layers
- Multi-step transparency
- v1 agent set (7 agents)
- Data model: Supabase schema
- Reasoner & model strategy
- Orchestrator on alpu.ca
- Coverage extension: crawler + LLM-on-merge
- Frontend: Sentinel intranet section
- Notifications: daily digest
- Files to create / modify
- Build sequence: phased & sequenced
- Verification: end-to-end checklist
- Out of scope for v1
- Open questions
1. Context
The Sponic monorepo has grown a wide surface with near-zero automated verification. That surface includes two deployed apps (apps/garden, apps/control), Supabase Postgres with 50+ migrations and ~15 critical tables, three pg_cron jobs, two Cloudflare Workers, ~15 external services, and a headless claude -p runner on Oracle Phoenix. CI runs only tsc + eslint + build + image-gen lint. There are no unit or e2e tests, no health endpoints, no uptime monitoring, no Sentry/Datadog, no dependency scanning, and no pre-commit hooks. The only smoke test is a manual smoke-image-gen.mjs.
The tasks system already has a queue + headless claude -p runner + cost tracking + activity log on Oracle Phoenix. Sentinel reuses concepts (queueing, persisted runs, cost telemetry) but runs on a separate orchestrator on alpu.ca so a Sentinel failure can't break the tasks UI and vice versa.
Goal
Ship a system that:
- Runs automated checks across every surface daily.
- Auto-extends coverage as new dev work lands (without manual setup per feature).
- Presents activity, agents, findings, and coverage in a dedicated intranet section called Sentinel, with first-class transparency into every multi-step run (prompts, models, handoff artifacts, costs).
- Generates structured recommendations without auto-acting on them. (Test-coverage findings can be approved & queued to the prompt-runner via a manual gate.)
2. Architecture: four layers
- Probes: deterministic checks. Cheap, scriptable, structured-output. They live in apps/control/src/lib/sentinel/probes/. Each probe is a TS function (ctx) => ProbeResult. Probes can attach artifacts (HTTP response bodies, screenshots, stack traces, log excerpts), which are stored either inline or in R2. Each probe is also exposed to the Agent SDK as a tool, so probes serve double duty: directly callable in deterministic agents, and tool-callable by LLM-driven agents.
- Reasoner: LLM execution layer with two modes. Per-agent config selects the mode + model; the mode is stored on the agent record and locked in at run time.
- One-shot (reasoner/oneshot.ts): direct Anthropic API call with a structured output schema. Used by deterministic agents that just need a brief summary + verdict over already-collected probe results. Cheap, fast, no tool calling.
- Agent-loop (reasoner/agent-loop.ts): Claude Agent SDK initialized with a custom tool catalog (the probes from layer 1, plus a few orchestrator-supplied tools like record_finding, query_supabase, read_file). Used by agents that need to navigate the codebase or chain probes based on what they find, primarily Test coverage and Security passive.
- Orchestrator: systemd service on alpu.ca, triggered by cron plus an HMAC-signed webhook for manual UI runs. It reads the agent registry from Supabase, dispatches each run to the configured execution mode, collects step events emitted by the SDK (or by the one-shot path), and persists every step (prompts, context, output, cost, tool calls, model rationale) to monitoring_run_steps. Source: infra/alpu/sentinel-orchestrator/.
- UI: new intranet section Sentinel in apps/control. Seven tabs: Overview, Coverage, Agents, Activity, Findings, Runs, Docs. Detailed in §9.
Layers are decoupled: probes are TS functions usable from either execution mode; the orchestrator can be moved (e.g. to Oracle Phoenix later) without touching probes/UI; the SDK version can be upgraded without touching probes or UI.
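A minimal sketch of the probe contract, assuming a ProbeResult shape loosely modelled on the monitoring_probes columns in §5. The field names, the injected fetch on ProbeContext, and the httpStatusProbe factory are hypothetical; the real types in apps/control/src/lib/sentinel/probes/ may differ. Injecting fetch keeps probes testable without network access, and the same factory could back the http_probe tool in the agent-loop path.

```typescript
// Hypothetical probe contract; field names loosely mirror monitoring_probes (§5).
type ProbeStatus = 'pass' | 'fail' | 'warn' | 'skip' | 'error';

interface ProbeArtifact {
  kind: 'http_response' | 'log_excerpt';
  contentType: string;
  data: string;
}

interface ProbeResult {
  probeName: string;
  targetRef: string;
  status: ProbeStatus;
  durationMs: number;
  output: Record<string, unknown>;
  artifacts: ProbeArtifact[];
}

// Minimal response shape so tests can pass a fake instead of the global fetch.
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
}

interface ProbeContext {
  runId: string;
  now: () => number;
  fetch: (url: string) => Promise<MinimalResponse>;
}

type Probe = (ctx: ProbeContext) => Promise<ProbeResult>;

// Hypothetical factory: passes when the route answers with the expected status.
function httpStatusProbe(url: string, expectedStatus = 200): Probe {
  return async (ctx) => {
    const started = ctx.now();
    const base = { probeName: 'http_status', targetRef: url };
    try {
      const res = await ctx.fetch(url);
      const ok = res.status === expectedStatus;
      // Attach a truncated body only on failure, per "record artifact on failure".
      const artifacts: ProbeArtifact[] = ok
        ? []
        : [{ kind: 'http_response', contentType: 'text/plain', data: (await res.text()).slice(0, 4096) }];
      return {
        ...base,
        status: ok ? 'pass' : 'fail',
        durationMs: ctx.now() - started,
        output: { status: res.status, expectedStatus },
        artifacts,
      };
    } catch (err) {
      // Network failures are 'error', distinct from an unexpected status ('fail').
      return { ...base, status: 'error', durationMs: ctx.now() - started, output: { error: String(err) }, artifacts: [] };
    }
  };
}
```

In production the context would simply pass fetch: (u) => fetch(u), since the global Response already has status and text().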
Why the Agent SDK and not Anthropic Managed Agents?
We considered hosted Managed Agents and rejected them for v1 because (a) deterministic probes shouldn't pay token costs to wrap answers we already have, (b) active security probes need a stable origin so we don't trip our own WAF, (c) gitleaks/OSV/diff-readers need direct repo access, and (d) the SDK preserves our v1.1 path to migrate select agents to a local Ollama model on alpu.ca. Using the SDK inside our own orchestrator gets the agentic-loop ergonomics without those tradeoffs.
3. Multi-step transparency
Every agent run is decomposed into discrete steps that are individually persisted. A step is anything with a clear input → output and an attributable model/cost. Examples:
- Probe execution (step_kind = probe): input = probe config; output = ProbeResult + artifacts. Used by both modes.
- One-shot reasoner call (step_kind = reasoner): input = collected probe outputs + diff context; output = structured findings. Used only by one-shot agents.
- Agent-loop turn (step_kind = agent_turn): one round of the SDK loop, in which the model produced a message and optionally invoked tools. Output includes model text + tool-use intents.
- Tool call (step_kind = tool_call): input = tool name + arguments; output = tool result. Emitted by the SDK whenever the agent calls one of the probes/utilities in its catalog.
- Aggregation (step_kind = aggregate): combining outputs across steps; rare, used for orchestrator-level deduping.
For agent-loop runs, the SDK emits per-turn and per-tool-call events natively; the orchestrator subscribes to those and writes them straight into monitoring_run_steps. We do not hand-roll the multi-step machinery; we adopt it.
Each step records: kind, model used (for steps where a model was invoked), model rationale (locked in at run time from the agent config), the prompt template id + rendered prompt text or system prompt, the full input context (jsonb), the output (jsonb), tool name + arguments + result for tool calls, references to artifacts produced, references to artifacts consumed (handoff lineage), duration, tokens, cost, status.
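A sketch of how a shared step recorder might turn that field list into rows, assuming an injected sink (in the orchestrator it would insert into monitoring_run_steps; here it can be any async callback). PersistableStep is a trimmed, camel-cased subset of the §5 columns, and StepRecorder is a hypothetical name for the step-recorder.ts helper listed in §11.

```typescript
type StepKind = 'probe' | 'reasoner' | 'agent_turn' | 'tool_call' | 'aggregate';
type StepStatus = 'success' | 'fail' | 'timeout' | 'error' | 'skipped';

// Hypothetical subset of monitoring_run_steps columns.
interface PersistableStep {
  runId: string;
  stepIndex: number;
  stepName: string;
  stepKind: StepKind;
  modelId?: string;
  modelRationale?: string;
  inputContext: unknown;
  output: unknown;
  startedAt: number; // epoch ms
  completedAt: number; // epoch ms
  durationMs: number;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  status: StepStatus;
  error?: string;
}

// Callers supply everything except the fields the recorder derives.
type StepInit = Omit<PersistableStep, 'runId' | 'stepIndex' | 'durationMs'>;

class StepRecorder {
  private nextIndex = 0;
  private totalCost = 0;
  private readonly runId: string;
  private readonly sink: (step: PersistableStep) => Promise<void>;

  constructor(runId: string, sink: (step: PersistableStep) => Promise<void>) {
    this.runId = runId;
    this.sink = sink;
  }

  // Assigns a monotonically increasing step_index so the UI stepper can replay order.
  async record(init: StepInit): Promise<PersistableStep> {
    const step: PersistableStep = {
      ...init,
      runId: this.runId,
      stepIndex: this.nextIndex++,
      durationMs: init.completedAt - init.startedAt,
    };
    this.totalCost += step.costUsd;
    await this.sink(step);
    return step;
  }

  // Rolled up into monitoring_runs.step_count / cost_usd when the run finishes.
  totals(): { stepCount: number; costUsd: number } {
    return { stepCount: this.nextIndex, costUsd: this.totalCost };
  }
}
```

Both execution modes can share this: the agent-loop path calls record() from its SDK event subscriber, and the one-shot path calls it directly after each probe and the reasoner call.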
In the UI, clicking into any run shows a stepper with all of this. For agent-loop runs, the stepper renders as alternating agent turns and tool calls, with arrows showing which turn invoked which tool. For one-shot runs, it's a flat list of probes followed by a single reasoner step. Both kinds open in the same run drawer; the underlying mode is just a chip on the run header.
4. v1 agent set (7 agents)
Each agent owns a domain, has its own model config + execution mode, and registers its probes in monitoring_manifest. Mode = how the orchestrator runs it: one-shot (collect probe results, summarize once) or agent-loop (Claude Agent SDK with tool catalog, iterates).
| # | Agent | Mode | Probes / behavior |
|---|---|---|---|
| 1 | Uptime | one-shot | HTTP 200 on every garden + control route (auto-discovered from apps/*/src/app/**/page.tsx); HTTPS cert expiry on both domains; claude-sessions.sponicgarden.workers.dev reachable. Reasoner summarizes failures + emits findings. |
| 2 | Deploy verifier | one-shot | Latest CF Pages deploy state (both projects) via CF API; build-log scan for warnings; commit SHA freshness vs origin/main; version-bump file consistency. |
| 3 | Database integrity | one-shot | Row counts ± delta on critical tables (app_users, tasks, images, image_gen_jobs, event_payments, rental_payments, stripe_payments); R2 ↔ public.images reconciliation; pg_cron last-run age (3 jobs); migrations applied vs apps/control/migrations/ files. |
| 4 | Edge function health | one-shot | Each Supabase edge fn returns expected status; auth-required ones return 401 without keys; runs smoke-image-gen.mjs as a probe. |
| 5 | Security passive | agent-loop | Tool catalog: run_gitleaks, query_osv, read_file, git_diff_since, record_finding. Agent decides what to scan, follows up on suspicious diffs, and can correlate (e.g. "this commit added an auth function; read it and check for bypass patterns"). |
| 6 | Security active | one-shot | Deterministic probes: anon-key calls that should fail on RLS-protected tables; public R2 bucket inventory check; unauth probe on every edge fn endpoint. Reasoner classifies any unexpected results. |
| 7 | Test coverage | agent-loop | Tool catalog: list_routes, list_edge_fns, list_critical_files, read_file, find_existing_tests, rank_criticality, draft_test_scaffold, record_finding. Agent inventories surfaces, decides what's worth testing, drafts scaffolds, and emits findings with metadata.draft_task_payload. UI shows an Approve & queue button per finding; on click, it inserts a row into the existing tasks table for the Oracle Phoenix prompt-runner. Manual gate stays in v1. |
Plus, outside alpu.ca, a tiny dead-man-switch Cloudflare Worker (apps/control/worker/sentinel-deadman/) that writes a finding if no orchestrator heartbeat has been recorded in the last 25h.
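The dead-man check itself reduces to comparing the newest heartbeat with the 25h threshold. A minimal sketch, assuming the Worker has already fetched the latest monitoring_heartbeats.recorded_at for source='orchestrator'; checkDeadman and DeadmanVerdict are hypothetical, and the Worker's scheduled handler and Supabase calls are omitted.

```typescript
const DEADMAN_THRESHOLD_MS = 25 * 60 * 60 * 1000; // 25h, per the spec above

interface DeadmanVerdict {
  stale: boolean;
  ageHours: number | null; // null when no heartbeat has ever been recorded
}

// lastHeartbeatIso: newest recorded_at for source='orchestrator', or null if none exists.
function checkDeadman(lastHeartbeatIso: string | null, nowMs: number): DeadmanVerdict {
  if (lastHeartbeatIso === null) return { stale: true, ageHours: null };
  const ageMs = nowMs - Date.parse(lastHeartbeatIso);
  return { stale: ageMs > DEADMAN_THRESHOLD_MS, ageHours: ageMs / 3_600_000 };
}
```

Treating "no heartbeat ever" as stale means a freshly provisioned environment alerts until the orchestrator's first heartbeat lands, which is usually the safer default.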
Test-framework prerequisite
The project has no test framework today. Adding Vitest (unit/integration) + Playwright (e2e) is a prerequisite, delivered in Phase 0 as a prompt-runner task: Claude opens the PR, an admin reviews and merges it, and the agent then operates against a real framework.
5. Data model: Supabase schema
New migration: apps/control/migrations/20260506_sentinel_schema.sql. All tables admin-only via RLS (role check against app_users.role IN ('oracle','admin')).
monitoring_agents
id, name, slug, description, surface, owner,
execution_mode text, -- 'one-shot' | 'agent-loop'
model_provider, model_id, model_params jsonb,
model_rationale text, -- short "why this mode + model" string
system_prompt_template_id uuid, -- agent-loop only
tool_catalog text[], -- agent-loop only; subset of registered tool names
max_turns int, -- agent-loop only; hard cap on SDK loop iterations
schedule_cron text, enabled bool,
daily_cost_ceiling_usd numeric, -- auto-pause if exceeded in 24h window
timeout_seconds int default 600,
last_run_id, last_status, last_run_at,
created_at, updated_at
monitoring_runs
id, agent_id, trigger ('cron'|'manual'|'deadman'),
triggered_by uuid (app_users.id, nullable for cron),
triggered_meta jsonb,
started_at, completed_at,
status ('queued'|'running'|'success'|'partial'|'failed'|'cancelled'|'timeout'),
summary text, cost_usd numeric, total_tokens int,
models_used text[], step_count int
monitoring_run_steps
id, run_id, agent_id, step_index int, step_name,
step_kind ('probe'|'reasoner'|'agent_turn'|'tool_call'|'aggregate'),
model_provider text, model_id text, model_rationale text,
prompt_template_id, prompt_text,
input_context jsonb, output jsonb,
artifact_ids uuid[], -- references monitoring_artifacts
consumed_artifact_ids uuid[], -- handoff lineage
started_at, completed_at, duration_ms int,
prompt_tokens int, completion_tokens int, cost_usd numeric,
status ('success'|'fail'|'timeout'|'error'|'skipped'),
error text
monitoring_prompt_templates
id, name, version int, agent_slug, step_kind,
description text, content text, variables text[],
created_at, deprecated_at,
unique (name, version)
monitoring_probes
id, run_id, run_step_id, agent_id, probe_name, target_kind, target_ref,
status ('pass'|'fail'|'warn'|'skip'|'error'),
output jsonb, duration_ms int, error text,
baseline_duration_ms int -- rolling p50 for regression hints
monitoring_artifacts
id, run_id, run_step_id (nullable), agent_id,
kind ('http_response'|'screenshot'|'stack_trace'|'log_excerpt'
|'json_blob'|'diff'|'test_scaffold'),
storage ('inline'|'r2'),
inline_data text (nullable), r2_key text (nullable),
size_bytes int, content_type text,
created_at
monitoring_findings
id, dedup_key (unique per agent), agent_id,
first_seen_run_id, last_seen_run_id, occurrence_count int,
consecutive_run_count int,
severity ('critical'|'high'|'medium'|'low'|'info'),
title text, description text, recommended_action text,
surface text, target_ref text,
metadata jsonb, -- carries draft_task_payload etc.
status ('open'|'acknowledged'|'dismissed'|'resolved'),
ack_by, ack_at, ack_notes,
resolved_at,
created_at, updated_at
monitoring_manifest
id, agent_id, target_kind ('route'|'table'|'edge_fn'|'package'
|'file'|'cron_job'|'service'),
target_ref text, probe_config jsonb,
source ('crawler'|'llm_merge'|'manual'|'bootstrap'),
approval_status ('active'|'pending_approval'|'rejected'),
added_at, added_by, last_seen_at, enabled bool
monitoring_heartbeats
id, source ('orchestrator'|'deadman'), recorded_at, meta jsonb
monitoring_audit_log
id, actor_id (app_users.id, nullable for system),
action text, -- 'agent_model_changed'|'agent_paused'|...
target_kind, target_id,
before jsonb, after jsonb,
notes text,
created_at
Dedup is keyed on (agent_id, dedup_key). Each probe computes a stable dedup key (e.g. uptime:route:/en/relations/fundraising:status_5xx). On each repeat occurrence, the existing finding's last_seen_run_id is updated and its occurrence_count is incremented instead of a new row being created.
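The dedup rule can be sketched as a pure merge step that runs before the upsert. This assumes the orchestrator has already loaded the existing finding (if any) for the same (agent_id, dedup_key); FindingRow is a trimmed, camel-cased subset of monitoring_findings, and buildDedupKey and mergeFinding are hypothetical helpers. Severity is left untouched here because bumps follow the separate regression rule.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

interface FindingRow {
  agentId: string;
  dedupKey: string;
  severity: Severity;
  title: string;
  firstSeenRunId: string;
  lastSeenRunId: string;
  occurrenceCount: number;
}

interface IncomingFinding {
  dedupKey: string;
  severity: Severity;
  title: string;
}

// Builds keys in the documented shape: <agent>:<target kind>:<target ref>:<condition>.
function buildDedupKey(agent: string, targetKind: string, targetRef: string, condition: string): string {
  return [agent, targetKind, targetRef, condition].join(':');
}

// Returns the row to upsert: a new finding on first sighting, otherwise the
// existing row with last_seen_run_id moved forward and occurrence_count + 1.
function mergeFinding(
  existing: FindingRow | undefined,
  incoming: IncomingFinding,
  agentId: string,
  runId: string,
): FindingRow {
  if (!existing) {
    return { agentId, ...incoming, firstSeenRunId: runId, lastSeenRunId: runId, occurrenceCount: 1 };
  }
  return { ...existing, lastSeenRunId: runId, occurrenceCount: existing.occurrenceCount + 1 };
}
```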
Regression-aware severity. When a finding first opens, severity comes from the reasoner. When the same dedup key reopens after N successful runs (default 30), severity is bumped one level. When a probe's duration exceeds 3× its baseline_duration_ms, a performance_regression finding is auto-generated.
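Both regression rules fit in small pure functions. A sketch, assuming "reopens after N successful runs" means at least N clean runs since the finding was last resolved; the helper names are hypothetical, and the defaults (30 runs, 3×) come from the paragraph above.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Ordered from least to most severe so "one level up" is index + 1.
const SEVERITY_LADDER: Severity[] = ['info', 'low', 'medium', 'high', 'critical'];

function bumpSeverity(s: Severity): Severity {
  const i = SEVERITY_LADDER.indexOf(s);
  return SEVERITY_LADDER[Math.min(i + 1, SEVERITY_LADDER.length - 1)]; // critical stays critical
}

// Severity for a reopened dedup key, given how many successful runs preceded the reopen.
function reopenSeverity(reasonerSeverity: Severity, successfulRunsBeforeReopen: number, threshold = 30): Severity {
  return successfulRunsBeforeReopen >= threshold ? bumpSeverity(reasonerSeverity) : reasonerSeverity;
}

// True when a probe ran more than `factor` times slower than its rolling p50 baseline.
function isPerformanceRegression(durationMs: number, baselineDurationMs: number | null, factor = 3): boolean {
  if (baselineDurationMs === null || baselineDurationMs <= 0) return false; // no baseline yet
  return durationMs > factor * baselineDurationMs;
}
```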
Retention. monitoring_run_steps, monitoring_probes, and monitoring_artifacts rows older than 90d are auto-deleted by a pg_cron job, and the corresponding R2 artifacts are pruned in the same pass. monitoring_runs, monitoring_findings, and monitoring_audit_log are retained indefinitely.
6. Reasoner & model strategy
The reasoner is the LLM execution layer. Two paths:
6.1 One-shot path (reasoner/oneshot.ts)
Direct Anthropic API call with a structured-output schema. The orchestrator collects probe results, hands them to the reasoner alongside diff context and manifest summary, and gets back a summary + findings array. No tool calling; one prompt, one response.
export interface OneShotInput {
agent: { name: string; surface: string; description: string };
runContext: { startedAt: Date; trigger: string; previousRunSummary?: string };
probes: ProbeResult[];
diffSinceLastRun?: { commits: string[]; files: string[]; patches: string };
manifestSummary: { covered: string[]; new: string[] };
}
export interface ReasonerOutput {
summary: string;
findings: Array<{
dedupKey: string;
severity: 'critical'|'high'|'medium'|'low'|'info';
title: string;
description: string;
recommendedAction: string;
surface: string;
targetRef?: string;
}>;
cost: { promptTokens: number; completionTokens: number; usd: number };
modelUsed: string;
}
export interface OneShotReasoner { ask(input: OneShotInput): Promise<ReasonerOutput>; }
Used by: Uptime, Deploy verifier, DB integrity, Edge function health, Security active.
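Even with a structured-output schema, it is worth validating the model's JSON before it reaches the dedup logic. A minimal hand-rolled sketch, assuming the response has already been JSON-parsed; parseFindings is hypothetical, and a Zod schema (as §6.3 uses for tool inputs) would be a natural replacement. Malformed entries are collected as rejections rather than failing the whole reasoner step.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';
const SEVERITIES: readonly string[] = ['critical', 'high', 'medium', 'low', 'info'];

interface ParsedFinding {
  dedupKey: string;
  severity: Severity;
  title: string;
  description: string;
  recommendedAction: string;
  surface: string;
  targetRef?: string;
}

const REQUIRED_STRINGS = ['dedupKey', 'title', 'description', 'recommendedAction', 'surface'] as const;

// Splits raw model output into valid findings and human-readable rejection reasons.
function parseFindings(raw: unknown): { findings: ParsedFinding[]; rejected: string[] } {
  const findings: ParsedFinding[] = [];
  const rejected: string[] = [];
  if (typeof raw !== 'object' || raw === null || !Array.isArray((raw as { findings?: unknown }).findings)) {
    return { findings, rejected: ['response has no findings array'] };
  }
  (raw as { findings: unknown[] }).findings.forEach((item, i) => {
    if (typeof item !== 'object' || item === null) {
      rejected.push(`#${i}: not an object`);
      return;
    }
    const f = item as Record<string, unknown>;
    const missing = REQUIRED_STRINGS.filter((k) => typeof f[k] !== 'string' || f[k] === '');
    if (missing.length > 0) {
      rejected.push(`#${i}: missing ${missing.join(', ')}`);
      return;
    }
    if (typeof f.severity !== 'string' || !SEVERITIES.includes(f.severity)) {
      rejected.push(`#${i}: bad severity ${String(f.severity)}`);
      return;
    }
    findings.push({
      dedupKey: f.dedupKey as string,
      severity: f.severity as Severity,
      title: f.title as string,
      description: f.description as string,
      recommendedAction: f.recommendedAction as string,
      surface: f.surface as string,
      ...(typeof f.targetRef === 'string' ? { targetRef: f.targetRef } : {}),
    });
  });
  return { findings, rejected };
}
```

Rejections can be stored on the reasoner step's output so a malformed response is visible in the run drawer instead of silently dropped.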
6.2 Agent-loop path (reasoner/agent-loop.ts)
Built on the Claude Agent SDK (@anthropic-ai/claude-agent-sdk, the same agent harness that powers Claude Code). The orchestrator instantiates an SDK session per agent run with:
- The agent's system prompt (loaded from monitoring_prompt_templates)
- A tool catalog: TS functions exposed as SDK tools (see §6.3)
- Model + max-turns from agent config
- An event subscriber that writes every turn + tool call to monitoring_run_steps in real time
Used by: Security passive, Test coverage.
The SDK handles the loop (model → tool call → tool result → model → …) until the model emits no more tool calls or max_turns is reached. Final findings are emitted via the record_finding tool; the orchestrator collects these and persists them after dedup.
export interface AgentLoopRunner {
run(input: {
agent: AgentConfig;
systemPrompt: string;
tools: ToolCatalog;
initialContext: { runId: string; diff?: GitDiff; manifest: ManifestSummary };
maxTurns: number;
onStep: (step: PersistableStep) => Promise<void>; // streamed to monitoring_run_steps
}): Promise<{ findings: Finding[]; cost: Cost; modelUsed: string; turns: number }>;
}
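The loop the SDK runs can be sketched in a few lines to make the stopping rules concrete. This is not the Agent SDK's API: callModel, ModelTurn, the tool map, and StepEvent are hypothetical stand-ins, and the real SDK also manages message formatting, streaming, and permissions. The sketch only shows the two exit conditions described above and where onStep-style events are emitted.

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface ModelTurn { text: string; toolCalls: ToolCall[]; }

type StepEvent =
  | { kind: 'agent_turn'; turn: number; text: string }
  | { kind: 'tool_call'; turn: number; name: string; result: unknown };

type Tool = (args: Record<string, unknown>) => Promise<unknown>;

async function runLoop(
  callModel: (history: unknown[]) => Promise<ModelTurn>,
  tools: Record<string, Tool>,
  maxTurns: number,
  onStep: (e: StepEvent) => Promise<void>,
): Promise<{ turns: number; stoppedBy: 'no_tool_calls' | 'max_turns' }> {
  const history: unknown[] = [];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await callModel(history);
    history.push(reply);
    await onStep({ kind: 'agent_turn', turn, text: reply.text });
    // Exit 1: the model asked for no tools, so it is done.
    if (reply.toolCalls.length === 0) return { turns: turn, stoppedBy: 'no_tool_calls' };
    for (const call of reply.toolCalls) {
      const known = Object.prototype.hasOwnProperty.call(tools, call.name);
      // Unknown tools produce an error result for the model instead of crashing the run.
      const result = known ? await tools[call.name](call.args) : { error: `unknown tool ${call.name}` };
      history.push({ tool: call.name, result });
      await onStep({ kind: 'tool_call', turn, name: call.name, result });
    }
  }
  // Exit 2: hard cap from monitoring_agents.max_turns.
  return { turns: maxTurns, stoppedBy: 'max_turns' };
}
```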
6.3 Tool catalog
Lives in apps/control/src/lib/sentinel/tools/. Each tool is a typed function with a Zod input schema, exposed to the SDK via the standard tool-definition shape. Tools delegate to the same probe library used by one-shot agents β so the capability is shared, only the calling style differs.
| Tool | Purpose | Used by |
|---|---|---|
| http_probe(url, expectedStatus?) | Fetch a URL; record artifact on failure | Uptime (one-shot calls directly) + agent-loop |
| query_supabase(sql, params) | Execute a read-only SQL query under service role | DB integrity, Test coverage, Security passive |
| read_file(path) | Read a file from the working tree | Test coverage, Security passive |
| git_diff_since(commit_sha) | Diff between commit and HEAD | Security passive |
| run_gitleaks(scope) | Run gitleaks against working tree; return JSON report | Security passive |
| query_osv(packageJsonPath) | Hit OSV.dev for advisories on a package.json | Security passive |
| list_routes() / list_edge_fns() / list_critical_files() | Inventory helpers built on the manifest | Test coverage |
| find_existing_tests(targetPath) | Look up Vitest/Playwright tests covering a target | Test coverage |
| draft_test_scaffold(target, framework) | Produce a starter test file body | Test coverage |
| cf_pages_status(project) | Latest deploy state via CF API | Deploy verifier (one-shot calls directly) + agent-loop |
| r2_list(bucket, prefix?) | List R2 objects | Security active, DB integrity |
| record_finding(payload) | Emit a finding (severity, title, description, recommendedAction, surface, dedupKey, metadata?) | All agent-loop agents |
Tools execute in the orchestrator process (not on Anthropic infra), so they have direct local network access to Supabase, R2, the working tree, and the CF API.
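Because tool_catalog on monitoring_agents is a text[] of tool names, it helps to resolve it against the registry at run start and fail fast on typos. A minimal sketch, assuming a registry keyed by tool name; defineTool, ToolDef, and resolveCatalog are hypothetical, the hand-written validate function stands in for the Zod input schema, and the read_file entry is a placeholder that does not touch disk.

```typescript
interface ToolDef<A> {
  name: string;
  description: string;
  // Stand-in for a Zod schema: returns parsed args or throws on bad input.
  validate: (raw: unknown) => A;
  execute: (args: A) => Promise<unknown>;
}

// eslint-disable-next-line @typescript-eslint/no-explicit-any
type AnyTool = ToolDef<any>;

function defineTool<A>(def: ToolDef<A>): ToolDef<A> {
  return def;
}

// Placeholder tool: the real read_file would read from the working tree.
const readFile = defineTool<{ path: string }>({
  name: 'read_file',
  description: 'Read a file from the working tree',
  validate: (raw) => {
    const path = (raw as { path?: unknown } | null)?.path;
    if (typeof path !== 'string' || path === '') throw new Error('read_file: path must be a non-empty string');
    return { path };
  },
  execute: async ({ path }) => `contents of ${path}`,
});

const REGISTRY: Record<string, AnyTool> = { [readFile.name]: readFile };

// Resolves an agent's tool_catalog, failing fast on names the registry doesn't know.
function resolveCatalog(registry: Record<string, AnyTool>, names: string[]): AnyTool[] {
  const unknown = names.filter((n) => !Object.prototype.hasOwnProperty.call(registry, n));
  if (unknown.length > 0) throw new Error(`unknown tools in catalog: ${unknown.join(', ')}`);
  return names.map((n) => registry[n]);
}
```

Failing at run start, rather than when the model first calls a missing tool, surfaces a misconfigured agent as a failed run with a clear error instead of a confusing mid-loop tool result.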
6.4 Default model + mode assignments for v1
| Agent | Mode | Model | Rationale stored on agent |
|---|---|---|---|
| Uptime | one-shot | Haiku 4.5 | "Low-complexity summarization of HTTP probe results β Haiku is sufficient." |
| Deploy verifier | one-shot | Haiku 4.5 | "Structured CF API output and build-log scan β Haiku handles deterministic formatting well." |
| DB integrity | one-shot | Haiku 4.5 | "Row-count delta interpretation; arithmetic + threshold judgment fits Haiku." |
| Edge function health | one-shot | Haiku 4.5 | "Status-code summarization; smoke-test pass/fail aggregation." |
| Security passive | agent-loop | Sonnet 4.6 | "Diff-driven security investigation needs multi-turn reasoning + targeted tool calls; agent-loop on Sonnet is the right shape." |
| Security active | one-shot | Sonnet 4.6 | "Probes are deterministic; reasoning is interpreting unexpected results. One-shot avoids agent-loop overhead." |
| Test coverage | agent-loop | Sonnet 4.6 | "Inventorying surfaces and ranking criticality is a navigation problem; agent-loop with file-reading tools fits." |
| LLM-on-merge (coverage extension) | one-shot | Haiku 4.5 | "Diff → manifest entry is structured-output; Haiku is fast + cheap." |
Both mode and model are mutable per agent from the UI. Mode changes are visible in the audit log alongside model changes. The rationale field surfaces in the activity log and run drill-down so anyone reviewing a run knows why this configuration was used at the time.
Migration intent: once we have ~4 weeks of steady-state runs + cost data, evaluate which agents could downshift to a local Ollama model on alpu.ca without quality loss. The Agent SDK supports custom model providers, so the agent-loop path will accept Ollama too.
7. Orchestrator on alpu.ca
Lives at infra/alpu/sentinel-orchestrator/ (committed; deployed via rsync + systemd). Stack: Node 20 + TypeScript, runs as sentinel.service.
Schedule
Single sweep nightly inside the 10pm–8am CT window. Cron at 0 22 * * * America/Chicago kicks off the orchestrator; agents run sequentially in default order (cheapest/fastest first, most-expensive last):
The whole sweep typically finishes by ~03:30 CT. Each agent records start/end times so the activity log shows actual cadence; sequencing is config-driven.
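Sequential execution with a per-agent timeout keeps one slow or crashing agent from stalling the rest of the sweep. A sketch, assuming a hypothetical sweepOrder field standing in for the config-driven sequencing and a runAgent callback for the per-agent flow; timeoutSeconds mirrors monitoring_agents.timeout_seconds. Promise.race only stops waiting, so a real orchestrator would also need to cancel the abandoned work (e.g. via an AbortController).

```typescript
interface SweepAgent {
  slug: string;
  enabled: boolean;
  sweepOrder: number; // hypothetical: lower runs earlier (cheapest first)
  timeoutSeconds: number; // monitoring_agents.timeout_seconds
}

type SweepOutcome = { slug: string; status: 'success' | 'failed' | 'timeout'; error?: string };

// Rejects with Error('timeout') if `p` has not settled within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

async function runSweep(
  agents: SweepAgent[],
  runAgent: (slug: string) => Promise<void>,
): Promise<SweepOutcome[]> {
  const queue = agents.filter((a) => a.enabled).sort((a, b) => a.sweepOrder - b.sweepOrder);
  const outcomes: SweepOutcome[] = [];
  for (const agent of queue) {
    try {
      await withTimeout(runAgent(agent.slug), agent.timeoutSeconds * 1000);
      outcomes.push({ slug: agent.slug, status: 'success' });
    } catch (err) {
      // Record and continue: a single failure must not abort the remaining agents.
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ slug: agent.slug, status: message === 'timeout' ? 'timeout' : 'failed', error: message });
    }
  }
  return outcomes;
}
```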
Entry points
- cron: system cron at 22:00 CT
- webhook: listener on localhost:8765 for manual "run now" requests forwarded by a Cloudflare Tunnel; HMAC-signed (shared secret in ~/.config/sentinel/.env)
- heartbeat: task every 15 min that POSTs to a Supabase function, which updates monitoring_heartbeats
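The webhook's HMAC check can be done with Node's built-in crypto module. A minimal sketch, assuming the signature is a hex-encoded HMAC-SHA256 over the string timestamp + "." + raw body, and that a millisecond timestamp travels with each request to bound replays; the signing string, header layout, and 5-minute window are assumptions, not the documented protocol.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

const MAX_SKEW_MS = 5 * 60 * 1000; // hypothetical replay window

function sign(secret: string, timestamp: string, rawBody: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

function verify(secret: string, timestamp: string, rawBody: string, signatureHex: string, nowMs: number): boolean {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts) > MAX_SKEW_MS) return false; // stale or malformed
  const expected = Buffer.from(sign(secret, timestamp, rawBody), 'hex');
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The sentinel-trigger Worker would compute the same signature before forwarding; inside a Worker that would go through the Web Crypto API (crypto.subtle) rather than node:crypto.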
Run flow per agent
- Insert monitoring_runs row (status=running)
- Gather manifest entries; execute probes; collect outputs + artifacts
- Compute diff since last successful run for this agent (commits via git log, file changes)
- Call reasoner via configured adapter; persist findings (with dedup logic)
- Update monitoring_runs (status, cost, summary)
Auth & secrets. The Bitwarden CLI (bw) unlocks at service start with a session token; the service-role Supabase key, Anthropic key, and R2 keys are stored in ~/.config/sentinel/.env. Bootstrap recipe added to infra/runbook.md.
8. Coverage extension: crawler + LLM-on-merge
Crawler (runs at the start of each nightly sweep)
- Walks apps/garden/** and apps/control/src/app/**/page.tsx to enumerate routes
- Walks apps/control/supabase/functions/ for edge fns
- Queries information_schema.tables filtered to the public schema
- Reads apps/*/package.json for dependency lists
- For each newly seen target, inserts a monitoring_manifest row with source='crawler', approval_status='active', and a default probe config
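For the control app, the route walk boils down to mapping each App Router page.tsx path to a URL pattern and diffing against what the manifest already knows. A sketch, assuming Next.js conventions where route groups such as (intranet) are dropped from the URL and dynamic segments such as [lang] are kept as placeholders for the probe config to expand; pageFileToRoute and diffTargets are hypothetical, the file walk itself is omitted, and other conventions (parallel routes, private folders) are not handled.

```typescript
// Converts an App Router page file path into a URL pattern, or null if it is not a page.
function pageFileToRoute(filePath: string, appDir: string): string | null {
  const normalized = filePath.replace(/\\/g, '/');
  const prefix = appDir.replace(/\/$/, '') + '/';
  if (!normalized.startsWith(prefix) || !normalized.endsWith('/page.tsx')) return null;
  const segments = normalized
    .slice(prefix.length, -'/page.tsx'.length) // empty for the root page
    .split('/')
    .filter((s) => s !== '' && !(s.startsWith('(') && s.endsWith(')'))); // drop route groups
  return '/' + segments.join('/');
}

// Targets the manifest has not seen yet become new rows with source='crawler'.
function diffTargets(discovered: string[], known: Set<string>): string[] {
  return [...new Set(discovered)].filter((t) => !known.has(t)).sort();
}
```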
LLM-on-merge (GitHub Action)
- Triggers on push: branches: [main]
- Reads commit range + diff
- Calls Anthropic API (Haiku) with the diff + current manifest summary
- Posts proposed manifest entries to a Supabase edge function (sentinel-llm-merge), which writes them to monitoring_manifest with approval_status='pending_approval'
- Admin sees them in Sentinel → Coverage → Pending and approves/rejects
Crawler = always-on default coverage; LLM = high-quality additions when code lands. Coverage grows automatically while the human stays in the loop on what gets monitored.
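On the receiving side, sentinel-llm-merge mostly has to normalize proposals and guarantee that nothing it writes skips review. A sketch of that normalization, assuming the Haiku call returns proposals shaped like { agentSlug, targetKind, targetRef, probeConfig }; the types and toPendingRows are hypothetical, and the Supabase insert is omitted. Keeping pending rows disabled until approval is also an assumption.

```typescript
type TargetKind = 'route' | 'table' | 'edge_fn' | 'package' | 'file' | 'cron_job' | 'service';
const TARGET_KINDS: readonly string[] = ['route', 'table', 'edge_fn', 'package', 'file', 'cron_job', 'service'];

interface Proposal {
  agentSlug: string;
  targetKind: string;
  targetRef: string;
  probeConfig?: Record<string, unknown>;
}

interface PendingManifestRow {
  agentSlug: string;
  targetKind: TargetKind;
  targetRef: string;
  probeConfig: Record<string, unknown>;
  source: 'llm_merge';
  approvalStatus: 'pending_approval';
  enabled: boolean;
}

// Drops proposals with unknown kinds, already-covered targets, and duplicates,
// and forces every surviving row into pending_approval.
function toPendingRows(proposals: Proposal[], covered: Set<string>): PendingManifestRow[] {
  const seen = new Set<string>();
  const rows: PendingManifestRow[] = [];
  for (const p of proposals) {
    const key = `${p.agentSlug}|${p.targetKind}|${p.targetRef}`;
    if (!TARGET_KINDS.includes(p.targetKind) || covered.has(key) || seen.has(key)) continue;
    seen.add(key);
    rows.push({
      agentSlug: p.agentSlug,
      targetKind: p.targetKind as TargetKind,
      targetRef: p.targetRef,
      probeConfig: p.probeConfig ?? {},
      source: 'llm_merge',
      approvalStatus: 'pending_approval',
      enabled: false, // assumption: pending rows stay disabled until an admin approves
    });
  }
  return rows;
}
```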
9. Frontend: Sentinel intranet section
This is where someone unfamiliar with the system has to be able to land and orient. The frontend carries the entire weight of the "dashboard only" stance.
9.1 Information architecture
Detail views are right-side drawers (mirroring the existing task-detail-drawer.tsx pattern). This keeps the static-export build workable: tab pages are pre-rendered, and drawer content is fetched client-side from Supabase. Drawer state is stored in the URL hash so detail views are linkable: /en/sentinel/runs#run=abc123.
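Because the section is a static export, drawer state can live entirely in location.hash. A sketch of parsing and serializing it, assuming the key=value form above (#run=…, #agent=…, #finding=…) and at most one open drawer; parseDrawerHash and drawerHash are hypothetical helpers, and URLSearchParams handles percent-encoding of ids.

```typescript
type DrawerKind = 'run' | 'agent' | 'finding' | 'probe';
const DRAWER_KINDS: DrawerKind[] = ['run', 'agent', 'finding', 'probe'];

interface DrawerState { kind: DrawerKind; id: string; }

// '#run=abc123' -> { kind: 'run', id: 'abc123' }; anything else -> null (no drawer open).
function parseDrawerHash(hash: string): DrawerState | null {
  const params = new URLSearchParams(hash.replace(/^#/, ''));
  for (const kind of DRAWER_KINDS) {
    const id = params.get(kind);
    if (id) return { kind, id };
  }
  return null;
}

// Inverse of parseDrawerHash; an empty string closes the drawer.
function drawerHash(state: DrawerState | null): string {
  if (!state) return '';
  return '#' + new URLSearchParams({ [state.kind]: state.id }).toString();
}
```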
9.2 Tab-by-tab spec
Overview
- Hero status card: pill (green/yellow/red), last sweep timestamp, "X agents passed / Y failed last night"
- Open critical + high findings count + a quick-list (top 3 with severity, title, surface)
- Today's activity excerpt (last 10 entries) with "View all" link
- Quick-link tiles to: All findings · All runs · Coverage matrix · Pending coverage approvals
- Polling: every 15s
Coverage
The "at-a-glance" surface map.
- Filters above the matrix: surface kind (all / routes / tables / edge fns / packages / services), agent (multi-select), status (all / has-failure / no-coverage)
- Search box for filtering by surface name
- Matrix table:
- Rows: surfaces grouped by kind, with kind heading rows (sticky)
- Columns: each agent that can cover this surface kind
- Cells: small status pill (pass = green dot, fail = red, warn = amber, never run = gray, not applicable = empty)
- Cell click → popover with last run for that probe + button "Open run drawer"
- "Surfaces without coverage" callout at the top if any surface has no agent covering it
- Sub-section below the matrix: Pending coverage approvals, i.e. manifest rows with approval_status='pending_approval' from LLM-on-merge, with Approve / Reject buttons inline
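Each cell's pill can be derived from three facts about the (agent, surface) pair. A sketch, assuming the UI has already joined the agent's coverable surface kinds, the active manifest rows, and the newest monitoring_probes status; coverageCell, isUncovered, and the mapping of 'error' and 'skip' (which the legend above does not mention) are hypothetical.

```typescript
type ProbeStatus = 'pass' | 'fail' | 'warn' | 'skip' | 'error';
type CellState = 'pass' | 'fail' | 'warn' | 'never' | 'na';

interface CellInput {
  agentCoversKind: boolean; // can this agent cover this surface kind at all?
  hasManifestEntry: boolean; // active monitoring_manifest row for (agent, target)
  latestProbeStatus: ProbeStatus | null; // newest monitoring_probes.status, null if never run
}

const STATUS_TO_CELL: Record<ProbeStatus, CellState> = {
  pass: 'pass',
  warn: 'warn',
  fail: 'fail',
  error: 'fail', // assumption: probe errors render as failures
  skip: 'never', // shares the slate-300 "never run" color from §9.6
};

function coverageCell(c: CellInput): CellState {
  if (!c.agentCoversKind) return 'na'; // rendered as an empty cell
  if (!c.hasManifestEntry || c.latestProbeStatus === null) return 'never';
  return STATUS_TO_CELL[c.latestProbeStatus];
}

// Drives the "Surfaces without coverage" callout: no agent has a manifest entry for the surface.
function isUncovered(cells: CellInput[]): boolean {
  return cells.every((c) => !c.agentCoversKind || !c.hasManifestEntry);
}
```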
Agents
- Card grid (3 cols desktop, 1 mobile)
- Each card: agent name + surface domain icon, status pill, last run (relative time), open findings badge, 30-run sparkline (plain SVG), model chip with rationale tooltip, schedule, "Run now" button (POST routed through the sentinel-trigger Worker, which HMAC-signs it and forwards it to the orchestrator)
- Click card → opens Agent drawer with tabs:
- About: markdown rendered from apps/control/src/components/sentinel/docs/{slug}.md. Covers purpose, surfaces covered, probes run, severity rules, default model + rationale, known limitations.
- Probes: manifest entries this agent owns
- Prompts: list of prompt templates with version history; click → render full prompt text + variable list
- Runs: last 30 runs with link to run drawer
- Findings: currently open findings for this agent
- Settings: execution mode, model + rationale (editable), cost ceiling, timeout, schedule. Edits go to monitoring_audit_log.
Activity
- Chronological feed, newest at top
- Each row: timestamp, event icon (run/finding/manifest/audit/heartbeat), actor (system / cron / user), summary, model chip + cost chip if applicable, "Open" button to relevant drawer
- Filters: event type, actor, agent, date range
- Each entry expandable to show details inline
- Live append: new entries fade in at top via 15s polling
- "Load more" pagination (50 at a time)
Findings
- Filters: severity (multi), status (open/ack/dismissed/resolved), agent (multi), surface (multi), recurrence count (≥1, ≥5, ≥30)
- Saved filter "presets": "Open critical/high", "Recurring (≥5)", "Acknowledged but not resolved", "Test coverage drafts"
- Table columns: severity badge, title, surface, agent, occurrences, last seen, status
- Sort: severity desc (default), last seen desc, recurrence desc, age asc
- Click row → opens Finding drawer
- Bulk actions toolbar: select multiple → Acknowledge / Dismiss with shared note
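The four sort orders can share one comparator table. A sketch, assuming camel-cased FindingListRow fields mapped from monitoring_findings with timestamps as epoch milliseconds; the SortKey names are hypothetical, ties fall back to last seen, and "age asc" is read as youngest first (an interpretation).

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';
const SEVERITY_RANK: Record<Severity, number> = { critical: 4, high: 3, medium: 2, low: 1, info: 0 };

interface FindingListRow {
  id: string;
  severity: Severity;
  occurrenceCount: number;
  lastSeenAt: number; // epoch ms
  createdAt: number; // epoch ms
}

type SortKey = 'severity_desc' | 'last_seen_desc' | 'recurrence_desc' | 'age_asc';
type Comparator = (a: FindingListRow, b: FindingListRow) => number;

const byLastSeenDesc: Comparator = (a, b) => b.lastSeenAt - a.lastSeenAt;

const COMPARATORS: Record<SortKey, Comparator> = {
  // Ties fall back to most recently seen, so fresh criticals float to the top.
  severity_desc: (a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity] || byLastSeenDesc(a, b),
  last_seen_desc: byLastSeenDesc,
  recurrence_desc: (a, b) => b.occurrenceCount - a.occurrenceCount || byLastSeenDesc(a, b),
  // Smallest age first, i.e. newest created_at first.
  age_asc: (a, b) => b.createdAt - a.createdAt,
};
```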
Finding drawer:
- Header: severity badge (large), title, status pill, surface, agent
- Description (full reasoner text)
- Recommended action (bordered box, highlighted)
- Occurrence history: list of run links with timestamps; sparkline of severity over time
- Related artifacts (collapsible viewers)
- For test-coverage findings: Approve & queue primary button + draft task preview (target file, suggested framework, prompt, scaffold). Clicking opens a confirmation modal; on confirm, it inserts a row into the tasks table and shows the task link.
- Actions footer: Acknowledge (with note), Dismiss (with reason), Resolve, View related runs
- Audit log of actions on this finding shown at bottom
Runs
- Filters: agent, trigger (cron/manual/deadman), status, date range
- Table columns: started_at, agent, trigger, status, duration, cost, models used, finding count
- Click row → opens Run drawer, the multi-step transparency surface
Run drawer:
- Header: agent name, execution-mode chip (one-shot or agent-loop), trigger + actor, status pill, total cost, total duration, models used (chips), turn count (agent-loop only)
- Stepper (vertical timeline): each step rendered as a card showing index + name, kind chip (probe / reasoner / agent_turn / tool_call / aggregate), model chip with rationale tooltip, duration + cost, status pill
- For agent-loop runs, the stepper visually groups consecutive agent_turn + child tool_call steps so it's obvious which turn invoked which tool. Tool-call cards show tool name + arguments (collapsed) and the result (collapsed).
- Card expands inline to reveal: Prompt (template name + version + collapsible rendered text; for agent-loop, this is the system prompt for the first turn and the running message history for subsequent turns), Input context (collapsible jsonb tree), Output (collapsible jsonb tree), Artifacts produced + artifacts consumed with handoff arrows
- Findings raised in this run (sidebar or bottom strip) with quick-jump to Finding drawer
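Since monitoring_run_steps records order (step_index) but no explicit parent link, the turn/tool-call grouping can be derived from order alone. A sketch, assuming steps arrive sorted by step_index and each tool_call belongs to the nearest preceding agent_turn; groupSteps and StepGroup are hypothetical.

```typescript
type StepKind = 'probe' | 'reasoner' | 'agent_turn' | 'tool_call' | 'aggregate';

interface StepRow { stepIndex: number; stepKind: StepKind; stepName: string; }

type StepGroup =
  | { type: 'turn'; turn: StepRow; toolCalls: StepRow[] }
  | { type: 'single'; step: StepRow };

function groupSteps(steps: StepRow[]): StepGroup[] {
  const groups: StepGroup[] = [];
  for (const step of steps) {
    const last = groups[groups.length - 1];
    if (step.stepKind === 'agent_turn') {
      groups.push({ type: 'turn', turn: step, toolCalls: [] });
    } else if (step.stepKind === 'tool_call' && last !== undefined && last.type === 'turn') {
      last.toolCalls.push(step); // child of the most recent turn
    } else {
      groups.push({ type: 'single', step }); // probes, reasoner, aggregate, orphan tool calls
    }
  }
  return groups;
}
```

One-shot runs contain no agent_turn steps, so they fall through to a flat list of single groups, which matches the flat stepper described in §3.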
Docs
System-level documentation hub. Designed for a new collaborator to onboard cold. Three subsections:
- Overview: renders infra/sentinel/README.md
- Per-agent docs: list of the seven agent docs with click-to-render
- Glossary: renders infra/sentinel/glossary.md (terms: agent, probe, finding, dedup key, manifest, etc.)
All rendered with react-markdown; no MDX runtime needed.
9.3 Component catalog
Lives in apps/control/src/components/sentinel/. New shared components:
| Component | Purpose |
|---|---|
| <SeverityBadge severity> | Colored pill with icon: critical / high / medium / low / info |
| <StatusPill status variant> | Green/red/amber/gray dot with label. Variants: run, probe, agent, manifest |
| <RunSummaryCard run> | Card for runs list / agent's recent runs |
| <RunStepper run steps artifacts> | Vertical step timeline with handoff arrows |
| <PromptViewer template renderedText variables> | Collapsible prompt with version-history link |
| <ArtifactViewer artifact> | Auto-routes by kind: image (R2 fetch), HTTP body (syntax highlight), JSON tree, screenshot, stack trace, diff, scaffold |
| <JsonTree data collapsedDepth> | Expandable jsonb viewer for input/output |
| <CoverageMatrix surfaces agents results> | CSS-grid surface×agent matrix with cell popovers |
| <AgentCard agent> | Card with sparkline, model chip, "Run now" |
| <FindingsTable findings filterable bulkActionable> | Findings list with bulk actions |
| <ActivityEntry event> | One feed row, expandable |
| <MarkdownDoc source> | react-markdown wrapper with project styling |
| <SeverityHeatmap occurrences> | Tiny inline heatmap for finding history |
| <CostBadge cost> | USD pill ($0.0008 / $1.20) |
| <ModelChip provider model rationale> | Provider icon + model name; hover tooltip = rationale |
| <Sparkline data status> | 30-point pass/fail SVG (reuses devcontrol/context-tab.tsx technique) |
| <DrawerShell title onClose> | Right-side drawer wrapper; hash-state aware |
9.4 Data layer
- Single composable hook useSentinel() in apps/control/src/hooks/use-sentinel.ts
- Sub-hooks: useAgents(), useRun(runId), useRuns({ filter }), useFindings({ filter }), useFinding(id), useCoverage(), useActivity({ filter, page }), useManifest({ status })
- All read-side hooks share a small in-memory cache (15s TTL); polling tabs (Overview/Activity) refetch on interval; detail drawers refetch on open
- Mutations go through useSentinelMutations(): acknowledgeFinding, dismissFinding, resolveFinding, approveManifestEntry, rejectManifestEntry, updateAgentSettings, triggerAgentRun (POST to the sentinel-trigger Worker, which HMAC-signs it and forwards it to the orchestrator), approveAndQueueTask (creates a row in tasks from a finding's metadata.draft_task_payload)
- Every mutation invalidates relevant cached queries
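The shared read cache can stay small. A sketch of a 15s TTL cache keyed by query string, with prefix invalidation for mutations; QueryCache, the key format, and the injectable clock are hypothetical. Caching the promise rather than the value also de-duplicates concurrent reads of the same key, and failed fetches are evicted so the next read retries.

```typescript
class QueryCache {
  private entries = new Map<string, { expiresAt: number; value: Promise<unknown> }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 15_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached promise while fresh; otherwise calls fetcher once and caches it.
  get<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value as Promise<T>;
    const value = fetcher();
    this.entries.set(key, { expiresAt: this.now() + this.ttlMs, value });
    value.catch(() => {
      // Evict only if this failed promise is still the cached one.
      if (this.entries.get(key)?.value === value) this.entries.delete(key);
    });
    return value;
  }

  // e.g. after acknowledgeFinding(), invalidate('findings') drops every 'findings…' key.
  invalidate(prefix: string): void {
    for (const key of [...this.entries.keys()]) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```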
9.5 Loading / empty / error
- Skeleton placeholders for all card grids and tables (one row of pulse-animated bars)
- Empty: "No findings open" → emerald-tinted card with green checkmark; "No agents have run yet" → guidance to trigger the first run; "No coverage for this surface" → call-to-action to add a manual manifest entry
- Error: red-bordered card with error message + Retry button + "Report issue" link
9.6 Visual design
- Inherits intranet design: slate-on-white, Tailwind 4, no shadcn
- Severity colors: critical=rose-700, high=red-600, medium=amber-600, low=blue-500, info=slate-400
- Status colors: pass=emerald-500, fail=rose-600, warn=amber-500, skip/never=slate-300, paused=slate-400
- Surface accent colors (thin left border on agent cards / matrix column headers): uptime=blue, deploy=purple, db=teal, edge-fn=indigo, security-passive=amber, security-active=rose, test-coverage=emerald
- Typography: Inter for body, JetBrains Mono for code/IDs
- Spacing: cards 5-unit padding; section headings 6-unit margin; matrix cells dense (1.5-unit padding)
9.7 Accessibility
- Keyboard navigable: all rows/cards focusable; arrow keys navigate matrix; Enter to open drawer; Esc to close drawer
- Screen reader friendly: severity badges have aria-label; status pills include text in addition to color
- Color is never the only signal: every status/severity pairs color with text or icon
9.8 Routing
Static-export-compatible:
- /[lang]/sentinel/[tab]/page.tsx for the seven tab pages; uses <TabContent section="sentinel" /> like existing sections
- All detail views are right-side drawers; drawer state is in the URL hash (e.g. #run=abc-123, #agent=uptime, #finding=xyz) so they're linkable + bookmarkable
- Drawers fetch their data client-side on open
9.9 Auth gating
- Wrap the section in <AdminGuard> (new file apps/control/src/components/auth/admin-guard.tsx) checking appUser.role IN ('oracle','admin')
- Demo-role users see the section hidden from nav and get a 403 if they navigate by URL
- <AdminGuard> is created in this work but not retrofitted onto the existing admin section; that's a separate decision
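Below is a sketch of the decision <AdminGuard> makes, kept separate from the React component so it can be unit-tested. AppUserLike and the three outcomes are hypothetical names. The role list comes from this section.

```typescript
// Roles allowed into Sentinel, per §9.9.
const SENTINEL_ROLES: ReadonlySet<string> = new Set(["oracle", "admin"]);

// Hypothetical minimal shape of the signed-in user.
interface AppUserLike {
  role: string;
}

type GuardOutcome = "loading" | "render" | "forbidden";

// What <AdminGuard> should render for the current session.
function sentinelAccess(appUser: AppUserLike | null, authLoading: boolean): GuardOutcome {
  if (authLoading) return "loading"; // avoid flashing a 403 before auth resolves
  return appUser !== null && SENTINEL_ROLES.has(appUser.role) ? "render" : "forbidden";
}

// Nav uses the same predicate so the hidden-tab and 403 cases never disagree.
function showSentinelInNav(appUser: AppUserLike | null): boolean {
  return appUser !== null && SENTINEL_ROLES.has(appUser.role);
}
```

There are two caveats. First, under static export there is no server to send an HTTP 403, so the "403" is a rendered page, not a status code. Second, this guard is cosmetic. The admin-only RLS policies from Phase 1 are what actually keep monitoring_* rows away from demo-role users.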
10. Notifications: daily morning digest
A single Cloudflare Worker (apps/control/worker/sentinel-digest/) on a 7am CT cron. Queries last 24h of runs/findings/audit log via Supabase service role. Renders an HTML email via Resend.
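A "7am CT cron" glosses over one detail: Cloudflare Workers cron triggers are evaluated in UTC, while Central Time shifts between UTC-6 (CST) and UTC-5 (CDT). One common workaround is to register two triggers (0 12 * * * and 0 13 * * *) and have the handler send only when the Chicago-local hour is 7. The sketch below shows that approach as an assumption, not the planned implementation.

```typescript
// Hour of day (0-23) in America/Chicago for a given instant.
function chicagoHour(at: Date): number {
  const hourPart = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Chicago",
    hour: "numeric",
    hourCycle: "h23",
  })
    .formatToParts(at)
    .find((part) => part.type === "hour");
  return Number(hourPart?.value);
}

// With triggers at 12:00 and 13:00 UTC, exactly one of the two lands on
// 07:00 Chicago time on any given day, whether CST or CDT is in effect.
function shouldSendDigest(at: Date): boolean {
  return chicagoHour(at) === 7;
}
```

In the Worker's scheduled handler this would be called as shouldSendDigest(new Date(controller.scheduledTime)). The other invocation exits without sending mail.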
Recipients: app_users with role IN ('oracle','admin').
Content:
- Headline: green/yellow/red status + count of failed agents
- New high/critical findings (title, surface, recommended action β link to Finding drawer)
- Recurring findings whose severity bumped overnight
- Cost summary: total, by agent
- Pending coverage approvals count
- Failed runs (if any) with link
- Footer: link to Sentinel dashboard
11. Files to create / modify
Create
- apps/control/migrations/20260506_sentinel_schema.sql: schema (all monitoring_* tables)
- apps/control/migrations/20260506_sentinel_rls.sql: RLS policies + admin-only access
- apps/control/migrations/20260506_sentinel_seed_prompts.sql: seed monitoring_prompt_templates
- apps/control/src/lib/sentinel/probes/{uptime,deploy,db-integrity,edge-fn,security-passive,security-active,test-coverage}.ts
- apps/control/src/lib/sentinel/reasoner/oneshot.ts: direct Anthropic API path (used by deterministic agents)
- apps/control/src/lib/sentinel/reasoner/agent-loop.ts: Claude Agent SDK runner (used by Test coverage + Security passive)
- apps/control/src/lib/sentinel/reasoner/step-recorder.ts: common step persistence; subscribes to SDK events for agent-loop, called manually for one-shot
- apps/control/src/lib/sentinel/tools/index.ts: tool registry
- apps/control/src/lib/sentinel/tools/{http-probe,query-supabase,read-file,git-diff,run-gitleaks,query-osv,inventory,find-existing-tests,draft-test-scaffold,cf-pages,r2-list,record-finding}.ts: individual tool implementations
- apps/control/src/lib/sentinel/manifest.ts: crawler logic
- apps/control/src/lib/sentinel/artifacts.ts: inline-vs-R2 storage helper
- apps/control/src/components/sentinel/{overview,coverage,agents,activity,findings,runs,docs}/: one folder per tab
- apps/control/src/components/sentinel/shared/: shared components
- apps/control/src/components/sentinel/docs/{uptime,deploy,db-integrity,edge-fn,security-passive,security-active,test-coverage}.md: per-agent in-UI docs
- apps/control/src/components/sentinel/drawers/{agent,run,finding,probe}-drawer.tsx
- apps/control/src/components/auth/admin-guard.tsx
- apps/control/src/hooks/use-sentinel.ts
- apps/control/src/hooks/use-sentinel-mutations.ts
- apps/control/src/app/[lang]/(intranet)/sentinel/[tab]/page.tsx
- apps/control/worker/sentinel-deadman/: CF Worker dead-man switch
- apps/control/worker/sentinel-trigger/: receives "Run now" UI clicks, HMAC-signs and POSTs to alpu.ca via Tunnel
- apps/control/worker/sentinel-digest/: daily morning digest at 7am CT via Resend
- apps/control/supabase/functions/sentinel-llm-merge/: edge function called by GitHub Action
- infra/alpu/sentinel-orchestrator/: Node service + systemd unit + README
- infra/sentinel/README.md: system-level documentation
- infra/sentinel/glossary.md
- .github/workflows/sentinel-coverage.yml: LLM-on-merge coverage extension
- vitest.config.ts + Playwright config + sample tests per app: bootstrap delivered as a prompt-runner task in Phase 0
Modify
- apps/control/src/types/intranet.ts: add sentinel section + tabs
- apps/control/package.json: add react-markdown (UI), @anthropic-ai/agent-sdk (server-side, only used by orchestrator imports), zod for tool schemas; Vitest + Playwright added by the Phase 0 bootstrap task
- apps/garden/package.json: add Vitest + Playwright via bootstrap task
- infra/runbook.md: Sentinel orchestrator bootstrap section + alpu.ca recipe + Bitwarden recipe
- CLAUDE.md (root): short "Monitoring" section pointing to this system
- apps/control/scripts/lint-image-gen.sh: extend whitelist for sentinel probes if they touch the image catalog
12. Build sequence: phased & sequenced
Five phases. Each item is a discrete, mergeable PR. Items within a phase are mostly parallelizable; items across phases have dependencies (called out where they cross).
Phase 0: Prerequisites (~2 days)
Goal: unblock everything else. All items must complete before Phase 3's Test coverage agent can produce useful findings.
| # | Item | Notes |
|---|---|---|
| 0.1 | Create alpu.ca user account + ssh key + Bitwarden secret recipe | Documented in infra/runbook.md |
| 0.2 | Reserve a Cloudflare Tunnel hostname for the alpu.ca webhook that control calls | Used by the sentinel-trigger Worker in Phase 1 |
| 0.3 | Generate HMAC shared secret; add to ~/.config/sentinel/.env on alpu.ca + as a Cloudflare Worker secret | |
| 0.4 | Seed task in tasks table: "Bootstrap Vitest + Playwright + sample tests in apps/garden + apps/control + CI hooks" | Manually inserted via SQL. Headless prompt-runner picks it up; opens PR. Admin reviews + merges. Dogfood moment. |
| 0.5 | Confirm Bitwarden CLI install + unlock-on-boot recipe on alpu.ca | |
Exit criteria: alpu.ca reachable; tunnel up; HMAC secret in place; Vitest + Playwright PR merged with at least one passing test per app; CI runs them.
Phase 1: Foundation (~1 week, depends on Phase 0)
Goal: a single agent runs end-to-end with full step transparency.
| # | Item | Depends on |
|---|---|---|
| 1.1 | Migration: monitoring_* schema (all tables in §5, including execution_mode + tool/turn fields on monitoring_agents) | 0.1 |
| 1.2 | Migration: RLS policies (admin-only) | 1.1 |
| 1.3 | Migration: seed monitoring_prompt_templates (placeholders; refined in Phase 2) | 1.1 |
| 1.4 | Add <AdminGuard> component | None |
| 1.5 | One-shot reasoner (reasoner/oneshot.ts) with Anthropic adapter + mock adapter | None |
| 1.6 | Artifact storage helper (inline vs R2 routing) | 1.1 |
| 1.7 | Step recorder (writes monitoring_run_steps + artifacts during execution; same module used by both modes) | 1.5, 1.6 |
| 1.8 | Probes: Uptime + Deploy verifier (deterministic, both produce artifacts on failure) | 1.6 |
| 1.9 | Orchestrator skeleton on alpu.ca: cron + heartbeat + HMAC webhook + per-step persistence + per-agent timeout + cost ceiling. Dispatches by execution_mode (only one-shot wired in this phase). | 0.2, 0.3, 1.7, 1.8 |
| 1.10 | sentinel-trigger Worker (HMAC-signs UI clicks → alpu.ca tunnel) | 0.2, 0.3 |
| 1.11 | Add sentinel section to intranet.ts + route shell + admin gating | 1.4 |
| 1.12 | Sentinel UI: Overview tab (skeleton) + Runs tab + Run drawer with stepper (renders probe + reasoner step kinds; agent_turn + tool_call rendering arrives in Phase 3) | 1.9, 1.11 |
| 1.13 | Sentinel UI: Agents tab (cards) + manual "Run now" wiring | 1.10, 1.12 |
| 1.14 | Shared components: SeverityBadge, StatusPill, ModelChip, CostBadge, JsonTree, PromptViewer, ArtifactViewer (image + http_response + json), DrawerShell, Sparkline | 1.11 |
Exit criteria: A nightly cron run on alpu.ca produces monitoring_runs + monitoring_run_steps rows for Uptime and Deploy verifier. UI Run drawer shows every step with model + rationale + prompt + input + output + artifacts. Manual "Run now" works from UI.
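The Depends-on column of the Phase 1 table forms a DAG. A layered variant of Kahn's algorithm turns it into waves of PRs that can proceed in parallel. The sketch below uses the Phase 1 rows and treats the Phase 0 items as already done. buildWaves and PHASE1_DEPS are illustrative names.

```typescript
// "Depends on" column from the Phase 1 table.
const PHASE1_DEPS: Record<string, string[]> = {
  "1.1": ["0.1"],
  "1.2": ["1.1"],
  "1.3": ["1.1"],
  "1.4": [],
  "1.5": [],
  "1.6": ["1.1"],
  "1.7": ["1.5", "1.6"],
  "1.8": ["1.6"],
  "1.9": ["0.2", "0.3", "1.7", "1.8"],
  "1.10": ["0.2", "0.3"],
  "1.11": ["1.4"],
  "1.12": ["1.9", "1.11"],
  "1.13": ["1.10", "1.12"],
  "1.14": ["1.11"],
};

// Layered Kahn's algorithm: each wave holds items whose dependencies are all
// satisfied by earlier waves or by items already in `done`.
function buildWaves(deps: Record<string, string[]>, done: Set<string>): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const [item, itemDeps] of Object.entries(deps)) {
    pending.set(item, new Set(itemDeps.filter((d) => !done.has(d))));
  }
  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready = [...pending].filter(([, rest]) => rest.size === 0).map(([item]) => item);
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency that is neither listed nor done.
      throw new Error(`stuck on: ${[...pending.keys()].join(", ")}`);
    }
    for (const item of ready) pending.delete(item);
    for (const rest of pending.values()) {
      for (const item of ready) rest.delete(item);
    }
    waves.push(ready);
  }
  return waves;
}
```

With Phase 0 complete this yields six waves. The longest chain (1.1 → 1.6 → 1.7 → 1.9 → 1.12 → 1.13) bounds the phase, no matter how many items run in parallel.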
Phase 2: Full agent set + Coverage view (~1.5 weeks, depends on Phase 1)
Goal: the five one-shot agents firing nightly (Security passive waits for the agent-loop path in Phase 3); Coverage matrix lit up; severity bumps and audit log working.
| # | Item | Depends on |
|---|---|---|
| 2.1 | Probes (deterministic, also exposed as tools): DB integrity (row counts, R2 reconciliation, pg_cron freshness, migrations applied) | 1.7 |
| 2.2 | Probes: Edge function health (incl. existing smoke-image-gen.mjs as a probe) | 1.7 |
| 2.3 | Probes: Security active (RLS bypass attempts, public R2 inventory, unauth edge-fn probes) | 1.7 |
| 2.4 | Tool registry (tools/index.ts) + tool implementations: http_probe, query_supabase, read_file, git_diff_since, cf_pages_status, r2_list, record_finding. Each tool has a Zod input schema and shares code with the probe library. | 1.7 |
| 2.5 | Refine monitoring_prompt_templates per agent (versioned; v1 production prompts including system prompts for the future agent-loop agents) | 1.3 |
| 2.6 | Findings dedup logic (orchestrator-side: increment occurrence vs new row; also called from the record_finding tool) | 1.7 |
| 2.7 | Regression-aware severity bumps (bump one level when a probe fails after 30 consecutive passes; flag a performance regression at 3× baseline) | 2.6 |
| 2.8 | Audit log emission for ack/dismiss/resolve, model change, execution-mode change, agent pause, manifest approval | 1.1 |
| 2.9 | Per-agent cost ceiling + timeout enforcement + auto-pause-with-finding (applies to both modes) | 1.9 |
| 2.10 | Sentinel UI: Findings tab + Finding drawer + bulk-action toolbar | 1.12 |
| 2.11 | Sentinel UI: Activity tab (chronological feed, filters, polling) | 1.12 |
| 2.12 | Sentinel UI: Coverage tab (matrix view + filters + Pending approvals section) | 2.10 |
| 2.13 | Per-agent in-UI docs: write six sentinel/docs/*.md files (Test coverage lands with that agent in Phase 3); render in Agent drawer About tab | 2.10 |
| 2.14 | Shared components: CoverageMatrix, FindingsTable, ActivityEntry, AgentCard, SeverityHeatmap, MarkdownDoc | 1.14 |
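Item 2.6 hinges on deciding when two failures count as "the same finding". Below is a minimal sketch. It assumes a fingerprint over agent, surface, and a stable check key (all hypothetical field names), with an in-memory Map standing in for the findings table.

```typescript
import { createHash } from "node:crypto";

// Hypothetical field names; "check" is a stable key for what was tested.
interface FindingInput {
  agent: string;
  surface: string;
  check: string;
  title: string;
}

interface StoredFinding extends FindingInput {
  fingerprint: string;
  occurrence_count: number;
  status: "open" | "acknowledged" | "dismissed" | "resolved";
}

// Title is left out on purpose: LLM-written titles vary run to run.
function fingerprintOf(f: FindingInput): string {
  return createHash("sha256").update(`${f.agent}\u0000${f.surface}\u0000${f.check}`).digest("hex");
}

// Increment an unresolved match; otherwise start a new finding. The Map
// stands in for the findings table (here a new row replaces a closed one).
function recordFinding(store: Map<string, StoredFinding>, f: FindingInput): StoredFinding {
  const fingerprint = fingerprintOf(f);
  const existing = store.get(fingerprint);
  if (existing && (existing.status === "open" || existing.status === "acknowledged")) {
    existing.occurrence_count += 1;
    return existing;
  }
  const fresh: StoredFinding = { ...f, fingerprint, occurrence_count: 1, status: "open" };
  store.set(fingerprint, fresh);
  return fresh;
}
```

The title is excluded from the fingerprint because reworded titles would otherwise defeat dedup. In Postgres, the lookup-and-increment would need to be atomic so that two concurrent calls cannot both insert. One option is a unique index on the fingerprint for unresolved rows plus an upsert.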
Exit criteria: Five of seven agents fire nightly in one-shot mode (Uptime, Deploy verifier, DB integrity, Edge function health, Security active). Findings dedupe correctly. A failure that follows 30 consecutive passes gets a one-level severity bump. Coverage matrix is populated. Audit log captures admin actions. Cost ceiling auto-pauses an agent in a test run. Tool registry exists and is consumable.
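The regression rule in item 2.7 can be read this way: a failure that follows at least 30 consecutive passes moves one level up the severity ladder, and a latency of at least 3× baseline counts as a performance regression. The sketch below works under that reading. The ladder order follows the §9.6 severity list, the function names are hypothetical, and treating exactly 3× as a regression (rather than strictly more) is an assumption.

```typescript
// Lowest to highest, matching the severity list in §9.6.
const SEVERITY_LADDER = ["info", "low", "medium", "high", "critical"] as const;
type Sev = (typeof SEVERITY_LADDER)[number];

const STABLE_STREAK = 30; // consecutive passes before a failure counts as a regression
const PERF_REGRESSION_FACTOR = 3; // latency multiple over baseline

function bumpOne(sev: Sev): Sev {
  const i = SEVERITY_LADDER.indexOf(sev);
  return SEVERITY_LADDER[Math.min(i + 1, SEVERITY_LADDER.length - 1)];
}

// priorRuns: pass (true) / fail (false) for earlier runs of this probe,
// oldest first. Called when the current run has failed.
function effectiveSeverity(base: Sev, priorRuns: boolean[]): Sev {
  let streak = 0;
  for (let i = priorRuns.length - 1; i >= 0 && priorRuns[i]; i--) streak++;
  return streak >= STABLE_STREAK ? bumpOne(base) : base;
}

function isPerfRegression(latencyMs: number, baselineMs: number): boolean {
  return baselineMs > 0 && latencyMs >= PERF_REGRESSION_FACTOR * baselineMs;
}
```

This lines up with the §13 verification step: 30 passes followed by a fail should yield a finding one level above an otherwise identical fresh failure.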
Phase 3: Agent-loop path + Coverage extension + Test agent + Notifications (~1.5 weeks, depends on Phase 2)
Goal: Claude Agent SDK wired into the orchestrator; the two agent-loop agents (Security passive, Test coverage) running; coverage auto-grows; daily digest goes out.
| # | Item | Depends on |
|---|---|---|
| 3.1 | Add @anthropic-ai/agent-sdk dependency; build reasoner/agent-loop.ts runner that initializes an SDK session per run with system prompt, tool catalog, model, max-turns; subscribes to step events and forwards to the step recorder | 1.7, 2.4, 2.5 |
| 3.2 | Orchestrator dispatcher: route runs by execution_mode; both modes fully wired | 1.9, 3.1 |
| 3.3 | Run drawer rendering: agent_turn + tool_call step kinds; visual grouping of turns and their tool calls; tool name/args/result display | 1.12, 3.1 |
| 3.4 | Agent-loop tool implementations specific to advanced agents: run_gitleaks, query_osv, list_routes, list_edge_fns, list_critical_files, find_existing_tests, draft_test_scaffold | 2.4 |
| 3.5 | Wire Security passive agent (agent-loop, Sonnet 4.6); write its system prompt; smoke-test on a known-safe diff | 3.1, 3.4 |
| 3.6 | Wire Test coverage agent (agent-loop, Sonnet 4.6); write its system prompt; smoke-test against the Vitest+Playwright setup from Phase 0; write Test-coverage in-UI doc | 3.1, 3.4, 0.4 |
| 3.7 | Finding drawer: Approve & queue button + draft task preview + confirmation modal | 2.10 |
| 3.8 | approveAndQueueTask mutation (inserts row into existing tasks table from finding metadata) | 3.7 |
| 3.9 | Crawler logic + manifest auto-population at start of each sweep | 1.9, 2.12 |
| 3.10 | LLM-on-merge GitHub Action (.github/workflows/sentinel-coverage.yml) | 3.9 |
| 3.11 | Supabase edge function sentinel-llm-merge (signature verify; writes pending rows) | 3.10 |
| 3.12 | Coverage tab: Pending approvals UI (Approve / Reject buttons inline) | 2.12, 3.11 |
| 3.13 | Dead-man-switch CF Worker (sentinel-deadman): flags a finding if no heartbeat in 25h | 1.9 |
| 3.14 | Daily digest CF Worker (sentinel-digest): 7am CT cron → Resend | 2.10 |
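Item 3.3 asks the Run drawer to group each agent_turn visually with the tool calls it issued. The sketch below shows one way to do that. It assumes steps carry a sequence number (seq is a hypothetical column name) and that tool calls are recorded right after the turn that requested them.

```typescript
// Step kinds named in this doc; seq is a hypothetical ordering column.
type StepKind = "probe" | "reasoner" | "agent_turn" | "tool_call";

interface RunStep {
  seq: number;
  kind: StepKind;
  label: string;
}

interface StepGroup {
  head: RunStep; // an agent_turn, or a standalone probe/reasoner step
  toolCalls: RunStep[]; // tool calls issued by that turn, in order
}

function groupSteps(steps: RunStep[]): StepGroup[] {
  const groups: StepGroup[] = [];
  for (const step of [...steps].sort((a, b) => a.seq - b.seq)) {
    const last = groups[groups.length - 1];
    if (step.kind === "tool_call" && last !== undefined && last.head.kind === "agent_turn") {
      last.toolCalls.push(step); // attach to the turn that requested it
    } else {
      groups.push({ head: step, toolCalls: [] });
    }
  }
  return groups;
}
```

One-shot runs contain no agent_turn steps. Every probe and reasoner step therefore becomes its own group, and the stepper falls back to the flat list expected for one-shot runs.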
Exit criteria: A new route added in a feature branch shows up in monitoring_manifest after merge. Test coverage agent flags a sample uncovered file with a recommended action and a draft task; clicking Approve & queue creates a tasks row that the prompt-runner picks up. Security passive agent runs against a sample diff and emits findings via tool calls. Daily digest email arrives at 7am CT. Dead-man flagged in <25h when service stopped.
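The dead-man check in item 3.13 comes down to comparing the latest heartbeat against a 25-hour window without duplicating an already-open finding. The sketch below assumes the Worker can read the last heartbeat time and whether a deadman finding is already open. This doc doesn't pin down whether the resolve branch lives here or in the orchestrator's heartbeat path.

```typescript
const DEADMAN_WINDOW_MS = 25 * 60 * 60 * 1000; // 25h, per item 3.13

type DeadmanAction = "raise" | "resolve" | "none";

// lastHeartbeat: most recent heartbeat seen (null if none ever recorded).
// openFinding: whether an agent='deadman' finding is currently open.
function deadmanAction(lastHeartbeat: Date | null, now: Date, openFinding: boolean): DeadmanAction {
  const stale = lastHeartbeat === null || now.getTime() - lastHeartbeat.getTime() > DEADMAN_WINDOW_MS;
  if (stale && !openFinding) return "raise"; // write one finding, not one per check
  if (!stale && openFinding) return "resolve"; // heartbeat is back
  return "none";
}
```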
Phase 4: Documentation & polish (~3 days, depends on Phase 3)
Goal: an unfamiliar collaborator can land in the UI and onboard cold.
| # | Item | Depends on |
|---|---|---|
| 4.1 | Write infra/sentinel/README.md (system-level "how Sentinel works") | All prior |
| 4.2 | Write infra/sentinel/glossary.md | 4.1 |
| 4.3 | Sentinel UI: Docs tab (render README + per-agent docs + glossary) | 2.10, 2.13, 4.1, 4.2 |
| 4.4 | Update infra/runbook.md with Sentinel orchestrator ops recipe | 1.9 |
| 4.5 | Update root CLAUDE.md with Monitoring section | 4.1 |
| 4.6 | Update apps/control/scripts/lint-image-gen.sh whitelist for sentinel probes | 2.2 |
| 4.7 | Visual polish pass on all tabs (skeleton, empty, error states) | 2.12 |
| 4.8 | Accessibility audit (keyboard nav, screen reader, focus) | 4.7 |
| 4.9 | Cost-and-performance review of the first week of runs; tune defaults | post-launch |
Exit criteria: Docs tab renders cleanly. Visual polish complete. Accessibility audit passes. Runbook updated.
Total estimate
Phases 0 to 4 take roughly 4 to 4.5 weeks of focused work (slightly longer than the pre-SDK plan because Phase 3 now also covers the agent-loop runtime + tool catalog). Mostly sequential because the orchestrator, UI, and agents stack on each other. Phase 0 should start immediately so the prompt-runner has time to deliver the Vitest/Playwright bootstrap before Phase 3.
13. Verification: end-to-end checklist
- Manual UI trigger. Click "Run now" on the Uptime agent in Sentinel → Agents. The run appears in Activity within 30s. The Run drawer shows every step with model, rationale, prompt, input context, output, and artifacts.
- Nightly cron. Run journalctl -u sentinel.service -f on alpu.ca. Confirm the 22:00 CT kickoff, sequential agent execution, full sweep complete by ~03:30 CT, exit 0.
- Coverage matrix. Open Sentinel → Coverage. Confirm every garden + control route, every critical Supabase table, every edge function, and every external service is listed. Each row shows which agents cover it.
- Dedup. Introduce a deliberate failure (e.g. delete a route temporarily). Run twice. Confirm the finding has occurrence_count=2, not two rows.
- Regression severity. Simulate 30 consecutive passes on a probe, then a fail; confirm severity bumps one level vs a fresh-fail finding.
- Coverage extension. Create a branch with a new route + new edge function, merge to main. Within 2 min, the GitHub Action posts manifest entries. Approve in UI. The next nightly run probes them.
- Dead-man. Stop sentinel.service on alpu.ca. Within 25h, the dead-man Worker writes a finding with agent='deadman'. Restart the service; the next heartbeat resolves the finding.
- Model swap with audit. Change the Uptime agent's model_id from Haiku to Sonnet in the UI; provide a rationale. Confirm monitoring_audit_log has the change with before/after. The next run uses Sonnet and cost telemetry reflects it.
- Cost ceiling. Lower a test agent's daily_cost_ceiling_usd to $0.001 and trigger a run. Confirm the agent auto-pauses and a finding is generated.
- Daily digest. Confirm the 7am CT email arrives at oracle/admin addresses with a summary of the last 24h.
- Docs tab. Open Sentinel → Docs. Confirm the README renders, each agent's doc renders, and the glossary is visible.
- Multi-step transparency (agent-loop). Open a Test coverage or Security passive run. Confirm the stepper shows alternating agent_turn + tool_call steps, with each tool call's name, arguments, and result visible. The execution-mode chip on the run header reads agent-loop.
- Multi-step transparency (one-shot). Open an Uptime run. Confirm the stepper shows a flat list of probe steps followed by a single reasoner step. The execution-mode chip reads one-shot.
- Approve & queue. Approve a test-write finding. Confirm a row appears in tasks with the draft payload; confirm the headless prompt-runner picks it up.
- RLS. Sign in as a demo-role user ([email protected]). Confirm the Sentinel section is hidden from nav and returns a 403 when opened by URL.
- Webhook auth. POST to the alpu.ca trigger endpoint without a valid HMAC signature; confirm rejection.
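For the webhook-auth check, here is a minimal sketch of verification on the Node orchestrator using node:crypto. It assumes the sentinel-trigger Worker signs timestamp + "." + raw body with the shared secret from Phase 0 item 0.3, then sends the timestamp and hex signature in two headers. The header names are not specified in this doc. The 5-minute skew window is an assumption meant to blunt replay.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 300; // assumption: reject requests older than 5 minutes

// Hex HMAC-SHA256 over "<timestamp>.<raw body>".
function signTrigger(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

function verifyTrigger(
  secret: string,
  timestampHeader: string | undefined,
  signatureHeader: string | undefined,
  rawBody: string,
  nowSeconds: number,
): boolean {
  if (!timestampHeader || !signatureHeader) return false;
  const ts = Number(timestampHeader);
  if (!Number.isInteger(ts) || Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;
  const expected = Buffer.from(signTrigger(secret, ts, rawBody), "hex");
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verify against the raw request body exactly as received, not a re-serialized JSON object; otherwise key order and whitespace will break signatures. On the Worker side, Web Crypto can produce the same hex digest: crypto.subtle.importKey with { name: "HMAC", hash: "SHA-256" }, then crypto.subtle.sign.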
14. Out of scope for v1
- Auto-remediation (agents fixing things). Recommendations only.
- Public status page.
- Real-user metrics, Core Web Vitals, performance monitoring.
- Distributed tracing / OpenTelemetry.
- PagerDuty / on-call escalation.
- Slack notifications. (Daily morning email digest IS in v1; Slack is v1.5.)
- Multi-agent coordination / debate.
- Replacing CI's existing checks. CI stays; Sentinel sits above it.
- Auto-creation of GitHub issues from findings. (v1.5 candidate.)
- Pre-commit / pre-push hooks.
- Local-model (Ollama) reasoner adapter. Deferred to v1.1 evaluation pass once steady-state cost data is available.
- Trend analysis / week-over-week reports. (Data is captured; analytics view is v1.5.)
- Self-meta-monitoring (Sentinel observing Sentinel). Daily digest covers the basics.
- Ingesting external service status feeds (CF, Supabase status pages). v1.5.
- Retrofitting <AdminGuard> onto the existing admin section.
- Browser-based smoke tests via Playwright in Uptime. Revisit in v1.5 once Playwright is bootstrapped.
- Slow-query monitoring via pg_stat_statements.
15. Open questions to revisit during build
- Heartbeat frequency. 15min is a first guess. If monitoring runs ever need >15min uninterrupted, raise to 30min and bump dead-man window to 90min.
- Manifest approval UX. The pending_approval flow can become tedious. If the approval rate is ~100% in the first month, switch the LLM-on-merge default to active and reserve pending_approval for security-touching surfaces.
- Test framework on a static-export Next.js setup. Confirm Vitest + Playwright play nicely with the next-intl-wrapped App Router. May need a small dance with the [lang] segment in test fixtures. Surface in Phase 0 task prompt.
- Sequential agent timing. The 8-step schedule assumes typical run durations; if Security passive or Test coverage routinely overruns, reorder or parallelize. Schedule is config-driven, so tunable.
- R2 cost. Artifact storage in R2 is cheap but not free. Set a per-agent artifact size cap; large bodies get truncated with a "view full body via R2" link.
- Local model picks for v1.1. Validate Qwen2.5 vs Llama 3.3 on alpu.ca with a representative probe-output sample before committing in code.
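The R2 cost question could be handled in a single function that combines the inline-vs-R2 routing from Phase 1 item 1.6 with the per-agent cap. Below is a sketch with hypothetical limits and key layout. Sizes are measured in UTF-8 bytes rather than string length, since bytes are what storage bills.

```typescript
interface ArtifactLimits {
  inlineMaxBytes: number; // hypothetical: bodies at or under this stay inline
  perAgentCapBytes: number; // hypothetical: hard cap on what gets written to R2
}

type ArtifactPlacement =
  | { where: "inline"; body: string }
  | { where: "r2"; key: string; payload: Uint8Array; preview: string; truncated: boolean };

function placeArtifact(runId: string, stepSeq: number, body: string, limits: ArtifactLimits): ArtifactPlacement {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= limits.inlineMaxBytes) {
    return { where: "inline", body };
  }
  return {
    where: "r2",
    key: `sentinel/runs/${runId}/steps/${stepSeq}.txt`, // hypothetical key layout
    payload: bytes.subarray(0, Math.min(bytes.length, limits.perAgentCapBytes)),
    // Inline preview shown next to the "view full body via R2" link.
    preview: new TextDecoder().decode(bytes.subarray(0, limits.inlineMaxBytes)),
    truncated: bytes.length > limits.perAgentCapBytes,
  };
}
```

The truncated flag lets the UI label the R2 copy honestly when the cap clipped it. Slicing at a byte boundary can split a multi-byte character. TextDecoder replaces the fragment with U+FFFD rather than throwing.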
Doc owner: Haydn. Drafted 2026-05-06. Markdown source: docs/devtasks/sentinel.md. Operational secrets in BW folder devops-sponic.