
Sentinel

An automated testing & monitoring system that verifies every Sponic surface nightly, auto-extends coverage as new dev work lands, and surfaces activity, agents, and findings in a dedicated intranet section.

2026-05-06 · Build plan / RFC · Revised: Claude Agent SDK · ~4–4.5 weeks
Touches: Supabase · alpu.ca · Cloudflare · Anthropic Agent SDK · GitHub Actions · prompt-runner

How to read this doc

Sections 1–3 frame the goal and architecture. Section 4 lists the seven agents. Section 5 is the data model. Sections 6–8 cover models, the orchestrator, and how coverage auto-grows. Section 9 is the frontend specification; read it if you're touching UI. Section 10 covers the daily digest email. Sections 11–12 are the implementation map: files to create + a phased build sequence. Section 13 is the end-to-end verification checklist. Markdown source: docs/devtasks/sentinel.md.

Context
Architecture — four layers
Multi-step transparency
v1 agent set (7 agents)
Data model — Supabase schema
Reasoner & model strategy
Orchestrator on alpu.ca
Coverage extension — crawler + LLM-on-merge
Frontend — Sentinel intranet section
Notifications: daily digest
Files to create / modify
Build sequence — phased & sequenced
Verification — end-to-end checklist
Out of scope for v1
Open questions

1. Context

The Sponic monorepo has grown a wide surface — two deployed apps (apps/garden, apps/control), Supabase Postgres with 50+ migrations and ~15 critical tables, three pg_cron jobs, two Cloudflare Workers, ~15 external services, a headless claude -p runner on Oracle Phoenix — with near-zero automated verification. CI runs only tsc + eslint + build + image-gen lint. There are no unit or e2e tests, no health endpoints, no uptime monitoring, no Sentry/Datadog, no dependency scanning, no pre-commit hooks. The only smoke test is a manual smoke-image-gen.mjs.

The tasks system already has a queue + headless claude -p runner + cost tracking + activity log on Oracle Phoenix. Sentinel reuses concepts (queueing, persisted runs, cost telemetry) but runs on a separate orchestrator on alpu.ca so a Sentinel failure can't break the tasks UI and vice versa.

Goal

Ship a system that:

Runs automated checks across every surface daily.
Auto-extends coverage as new dev work lands (without manual setup per feature).
Presents activity, agents, findings, coverage in a dedicated intranet section called Sentinel, with first-class transparency into every multi-step run (prompts, models, handoff artifacts, costs).
Generates structured recommendations without auto-acting on them. (Test-coverage findings can be approved & queued to the prompt-runner via a manual gate.)

2. Architecture — four layers

Probes — deterministic checks. Cheap, scriptable, structured-output. Live in apps/control/src/lib/sentinel/probes/. Each probe is a TS function (ctx) => ProbeResult. Probes can attach artifacts (HTTP response bodies, screenshots, stack traces, log excerpts) which are stored either inline or in R2. Each probe is also exposed to the Agent SDK as a tool — so probes serve double duty: directly callable in deterministic agents, and tool-callable by LLM-driven agents.
Reasoner — LLM execution layer with two modes:
- One-shot (reasoner/oneshot.ts): direct Anthropic API call with structured output schema. Used by deterministic agents that just need a brief summary + verdict over already-collected probe results. Cheap, fast, no tool calling.
- Agent-loop (reasoner/agent-loop.ts): Claude Agent SDK initialized with a custom tool catalog (the probes from layer 1, plus a few orchestrator-supplied tools like record_finding, query_supabase, read_file). Used by agents that need to navigate the codebase or chain probes based on what they find — Test coverage and Security passive primarily.
- Per-agent config selects which mode + model. The mode is stored on the agent record and locked in at run time.
Orchestrator — systemd service on alpu.ca. Triggers from cron + an HMAC-signed webhook for manual UI runs. Reads the agent registry from Supabase, dispatches each run to the configured execution mode, collects step events emitted by the SDK (or by the one-shot path), persists every step (prompts, context, output, cost, tool calls, model rationale) to monitoring_run_steps. Source: infra/alpu/sentinel-orchestrator/.
UI — new intranet section Sentinel in apps/control. Seven tabs: Overview, Coverage, Agents, Activity, Findings, Runs, Docs. Detailed in §9.

Layers are decoupled: probes are TS functions usable from either execution mode; the orchestrator can be moved (e.g. to Oracle Phoenix later) without touching probes/UI; the SDK version can be upgraded without touching probes or UI.
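To make the dual-use idea concrete, here is a minimal TypeScript sketch of the probe contract. `ProbeDeps`, `ProbeContext`, `httpStatusProbe`, and `asTool` are hypothetical names, the `ProbeResult` fields are assumptions, and probes return a `Promise` here because they do I/O. The HTTP client is injected so the sketch runs without network access.

```typescript
// Hypothetical shapes for illustration; the real ProbeResult may carry more fields.
type ProbeStatus = "pass" | "fail" | "warn" | "skip" | "error";

interface ProbeArtifact {
  kind: "http_response" | "screenshot" | "stack_trace" | "log_excerpt" | "json_blob";
  contentType: string;
  data: string; // small payloads stay inline; large ones would be uploaded to R2
}

interface ProbeResult {
  status: ProbeStatus;
  output: Record<string, unknown>;
  durationMs: number;
  artifacts: ProbeArtifact[];
}

// Injected dependencies keep probes testable offline.
interface ProbeDeps {
  fetchStatus: (url: string) => Promise<{ status: number; body: string }>;
}

// ctx = deps + the target (from monitoring_manifest.probe_config or tool-call arguments).
interface ProbeContext<Target> extends ProbeDeps {
  target: Target;
}

type Probe<Target> = (ctx: ProbeContext<Target>) => Promise<ProbeResult>;

// Example probe: expects a status code and attaches the response body on failure.
const httpStatusProbe: Probe<{ url: string; expectedStatus?: number }> = async (ctx) => {
  const { url, expectedStatus = 200 } = ctx.target;
  const started = Date.now();
  const res = await ctx.fetchStatus(url);
  const ok = res.status === expectedStatus;
  return {
    status: ok ? "pass" : "fail",
    output: { url, status: res.status, expectedStatus },
    durationMs: Date.now() - started,
    artifacts: ok
      ? []
      : [{ kind: "http_response", contentType: "text/plain", data: res.body.slice(0, 4096) }],
  };
};

// Double duty: the same probe exposed as a tool for agent-loop agents.
interface ToolDef<Target> {
  name: string;
  description: string;
  handler: (args: Target) => Promise<ProbeResult>;
}

function asTool<Target>(name: string, description: string, probe: Probe<Target>, deps: ProbeDeps): ToolDef<Target> {
  return { name, description, handler: (args) => probe({ ...deps, target: args }) };
}
```

Deterministic agents would call `httpStatusProbe` directly with targets read from monitoring_manifest, while agent-loop agents receive the wrapped tool in their catalog; both paths yield the same `ProbeResult`.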

Why the Agent SDK and not Anthropic Managed Agents?

We considered hosted Managed Agents and rejected them for v1 because (a) deterministic probes shouldn't pay token costs to wrap answers we already have, (b) active security probes need a stable origin so we don't trip our own WAF, (c) gitleaks/OSV/diff-readers need direct repo access, and (d) the SDK preserves our v1.1 path to migrate select agents to a local Ollama model on alpu.ca. Running the SDK inside our own orchestrator gives us the agentic-loop ergonomics without those tradeoffs.

3. Multi-step transparency

Every agent run is decomposed into discrete steps that are individually persisted. A step is anything with a clear input → output and an attributable model/cost. Examples:

Probe execution (step_kind = probe) — input = probe config; output = ProbeResult + artifacts. Used by both modes.
One-shot reasoner call (step_kind = reasoner) — input = collected probe outputs + diff context; output = structured findings. Used only by one-shot agents.
Agent-loop turn (step_kind = agent_turn) — one round of the SDK loop: model produced a message and optionally invoked tools. Output includes model text + tool-use intents.
Tool call (step_kind = tool_call) — input = tool name + arguments; output = tool result. Emitted by the SDK whenever the agent calls one of the probes/utilities in its catalog.
Aggregation (step_kind = aggregate) — combining outputs across steps; rare, used for orchestrator-level deduping.

For agent-loop runs, the SDK emits per-turn and per-tool-call events natively; the orchestrator subscribes to those and writes them straight into monitoring_run_steps. We do not hand-roll the multi-step machinery — we adopt it.

Each step records: kind, model used (for steps where a model was invoked), model rationale (locked in at run time from the agent config), the prompt template id + rendered prompt text or system prompt, the full input context (jsonb), the output (jsonb), tool name + arguments + result for tool calls, references to artifacts produced, references to artifacts consumed (handoff lineage), duration, tokens, cost, status.

In the UI, clicking into any run opens a stepper showing all of this. For agent-loop runs, the stepper alternates agent turns and tool calls, with arrows showing which turn invoked which tool. For one-shot runs, it is a flat list of probes followed by a single reasoner step. Both render in the same drawer; the execution mode is shown as a chip on the run header.
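A minimal sketch of what one persisted step might look like in TypeScript, mirroring the fields listed above. `PersistableStep` and `StepRecorder` are hypothetical names; the camelCase fields would map onto the snake_case columns of monitoring_run_steps, and tool name, arguments, and result are assumed to live inside `inputContext` and `output` because the schema in §5 has no dedicated tool columns.

```typescript
type StepKind = "probe" | "reasoner" | "agent_turn" | "tool_call" | "aggregate";
type StepStatus = "success" | "fail" | "timeout" | "error" | "skipped";

interface PersistableStep {
  runId: string;
  stepIndex: number;
  stepName: string;
  stepKind: StepKind;
  model?: { provider: string; id: string; rationale: string }; // only when a model was invoked
  promptTemplateId?: string;
  promptText?: string;
  inputContext: unknown; // for tool_call steps: tool name + arguments
  output: unknown; // for tool_call steps: tool result
  artifactIds: string[];
  consumedArtifactIds: string[]; // handoff lineage
  startedAt: Date;
  completedAt: Date;
  durationMs: number;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  status: StepStatus;
  error?: string;
}

// Assigns a gap-free step_index within one run and derives duration_ms,
// so callers only report what they observed.
class StepRecorder {
  private nextIndex = 0;
  readonly steps: PersistableStep[] = [];

  constructor(private readonly runId: string) {}

  record(step: Omit<PersistableStep, "runId" | "stepIndex" | "durationMs">): PersistableStep {
    const full: PersistableStep = {
      ...step,
      runId: this.runId,
      stepIndex: this.nextIndex++,
      durationMs: step.completedAt.getTime() - step.startedAt.getTime(),
    };
    this.steps.push(full);
    return full;
  }
}
```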

4. v1 agent set (7 agents)

Each agent owns a domain, has its own model config + execution mode, and registers its probes in monitoring_manifest. Mode = how the orchestrator runs it: one-shot (collect probe results, summarize once) or agent-loop (Claude Agent SDK with tool catalog, iterates).

#	Agent	Mode	Probes / behavior
1	Uptime	`one-shot`	HTTP 200 on every garden + control route (auto-discovered from `apps//src/app/*/page.tsx`); HTTPS cert expiry on both domains; `claude-sessions.sponicgarden.workers.dev` reachable. Reasoner summarizes failures + emits findings.
2	Deploy verifier	`one-shot`	Latest CF Pages deploy state (both projects) via CF API; build-log scan for warnings; commit SHA freshness vs `origin/main`; version-bump file consistency.
3	Database integrity	`one-shot`	Row counts ± delta on critical tables (`app_users`, `tasks`, `images`, `image_gen_jobs`, `event_payments`, `rental_payments`, `stripe_payments`); R2 ↔ `public.images` reconciliation; pg_cron last-run age (3 jobs); migrations applied vs `apps/control/migrations/` files.
4	Edge function health	`one-shot`	Each Supabase edge fn returns expected status; auth-required ones return 401 without keys; runs `smoke-image-gen.mjs` as a probe.
5	Security passive	`agent-loop`	Tool catalog: `run_gitleaks`, `query_osv`, `read_file`, `git_diff_since`, `record_finding`. Agent decides what to scan, follows up on suspicious diffs, can correlate (e.g. "this commit added an auth function — read it and check for bypass patterns").
6	Security active	`one-shot`	Deterministic probes: anon-key calls that should fail on RLS-protected tables; public R2 bucket inventory check; unauth probe on every edge fn endpoint. Reasoner classifies any unexpected results.
7	Test coverage	`agent-loop`	Tool catalog: `list_routes`, `list_edge_fns`, `list_critical_files`, `read_file`, `find_existing_tests`, `rank_criticality`, `draft_test_scaffold`, `record_finding`. Agent inventories surfaces, decides what's worth testing, drafts scaffolds, emits findings with `metadata.draft_task_payload`. UI shows an Approve & queue button per finding — on click, inserts a row into the existing `tasks` table for the Oracle Phoenix prompt-runner. Manual gate stays in v1.

Plus, outside alpu.ca, a tiny dead-man-switch Cloudflare Worker (apps/control/worker/sentinel-deadman/) writes a finding row if no orchestrator heartbeat has been recorded in the last 25h.
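The Worker's check reduces to a staleness test over monitoring_heartbeats. This is a sketch under assumptions: the Worker is assumed to fetch recent heartbeat rows itself (e.g. through the Supabase REST API), and the dedup key, title, and `critical` severity in `deadmanFinding` are hypothetical choices. A fixed dedup key lets repeated checks increment one finding rather than open new ones.

```typescript
interface Heartbeat {
  source: "orchestrator" | "deadman";
  recordedAt: Date;
}

const DEADMAN_THRESHOLD_MS = 25 * 60 * 60 * 1000; // 25h, per the plan

// True when the newest orchestrator heartbeat is older than the threshold,
// or when none exists. Heartbeats written by the dead-man Worker itself are ignored.
function orchestratorIsSilent(heartbeats: Heartbeat[], now: Date, thresholdMs = DEADMAN_THRESHOLD_MS): boolean {
  let latest = Number.NEGATIVE_INFINITY;
  for (const hb of heartbeats) {
    if (hb.source === "orchestrator") latest = Math.max(latest, hb.recordedAt.getTime());
  }
  return now.getTime() - latest > thresholdMs;
}

// Hypothetical finding payload; the stable dedupKey is what makes repeats collapse.
function deadmanFinding(now: Date) {
  return {
    dedupKey: "deadman:orchestrator:no_heartbeat",
    severity: "critical" as const,
    title: "Sentinel orchestrator has not sent a heartbeat in over 25h",
    surface: "sentinel",
    description: `Checked at ${now.toISOString()}.`,
  };
}
```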

Test-framework prerequisite

The project has no test framework today. Adding Vitest (unit/integration) + Playwright (e2e) is a prerequisite, delivered in Phase 0 as a prompt-runner task — Claude opens the PR; admin reviews and merges; the agent then operates against a real framework.

5. Data model — Supabase schema

New migration: apps/control/migrations/20260506_sentinel_schema.sql. All tables admin-only via RLS (role check against app_users.role IN ('oracle','admin')).

monitoring_agents
  id, name, slug, description, surface, owner,
  execution_mode text,            -- 'one-shot' | 'agent-loop'
  model_provider, model_id, model_params jsonb,
  model_rationale text,           -- short "why this mode + model" string
  system_prompt_template_id uuid, -- agent-loop only
  tool_catalog text[],            -- agent-loop only; subset of registered tool names
  max_turns int,                  -- agent-loop only; hard cap on SDK loop iterations
  schedule_cron text, enabled bool,
  daily_cost_ceiling_usd numeric, -- auto-pause if exceeded in 24h window
  timeout_seconds int default 600,
  last_run_id, last_status, last_run_at,
  created_at, updated_at

monitoring_runs
  id, agent_id, trigger ('cron'|'manual'|'deadman'),
  triggered_by uuid (app_users.id, nullable for cron),
  triggered_meta jsonb,
  started_at, completed_at,
  status ('queued'|'running'|'success'|'partial'|'failed'|'cancelled'|'timeout'),
  summary text, cost_usd numeric, total_tokens int,
  models_used text[], step_count int

monitoring_run_steps
  id, run_id, agent_id, step_index int, step_name,
  step_kind ('probe'|'reasoner'|'agent_turn'|'tool_call'|'aggregate'),
  model_provider text, model_id text, model_rationale text,
  prompt_template_id, prompt_text,
  input_context jsonb, output jsonb,
  artifact_ids uuid[],            -- references monitoring_artifacts
  consumed_artifact_ids uuid[],   -- handoff lineage
  started_at, completed_at, duration_ms int,
  prompt_tokens int, completion_tokens int, cost_usd numeric,
  status ('success'|'fail'|'timeout'|'error'|'skipped'),
  error text

monitoring_prompt_templates
  id, name, version int, agent_slug, step_kind,
  description text, content text, variables text[],
  created_at, deprecated_at,
  unique (name, version)

monitoring_probes
  id, run_id, run_step_id, agent_id, probe_name, target_kind, target_ref,
  status ('pass'|'fail'|'warn'|'skip'|'error'),
  output jsonb, duration_ms int, error text,
  baseline_duration_ms int        -- rolling p50 for regression hints

monitoring_artifacts
  id, run_id, run_step_id (nullable), agent_id,
  kind ('http_response'|'screenshot'|'stack_trace'|'log_excerpt'
       |'json_blob'|'diff'|'test_scaffold'),
  storage ('inline'|'r2'),
  inline_data text (nullable), r2_key text (nullable),
  size_bytes int, content_type text,
  created_at

monitoring_findings
  id, dedup_key (unique per agent), agent_id,
  first_seen_run_id, last_seen_run_id, occurrence_count int,
  consecutive_run_count int,
  severity ('critical'|'high'|'medium'|'low'|'info'),
  title text, description text, recommended_action text,
  surface text, target_ref text,
  metadata jsonb,                 -- carries draft_task_payload etc.
  status ('open'|'acknowledged'|'dismissed'|'resolved'),
  ack_by, ack_at, ack_notes,
  resolved_at,
  created_at, updated_at

monitoring_manifest
  id, agent_id, target_kind ('route'|'table'|'edge_fn'|'package'
                            |'file'|'cron_job'|'service'),
  target_ref text, probe_config jsonb,
  source ('crawler'|'llm_merge'|'manual'|'bootstrap'),
  approval_status ('active'|'pending_approval'|'rejected'),
  added_at, added_by, last_seen_at, enabled bool

monitoring_heartbeats
  id, source ('orchestrator'|'deadman'), recorded_at, meta jsonb

monitoring_audit_log
  id, actor_id (app_users.id, nullable for system),
  action text,                    -- 'agent_model_changed'|'agent_paused'|...
  target_kind, target_id,
  before jsonb, after jsonb,
  notes text,
  created_at

Dedup is keyed on (agent_id, dedup_key). Each probe computes a stable dedup key (e.g. uptime:route:/en/relations/fundraising:status_5xx). When a key is seen again, the existing finding's last_seen_run_id is updated and its occurrence_count incremented instead of a new row being created.
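The dedup rule can be sketched as a pure merge. In Postgres it maps naturally onto `INSERT ... ON CONFLICT (agent_id, dedup_key) DO UPDATE`; the in-memory version below only makes the semantics explicit. Type and function names are hypothetical, and how status, title, and description change on repeat occurrences is left out.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

interface IncomingFinding {
  agentId: string;
  dedupKey: string;
  severity: Severity;
  title: string;
  description: string;
}

interface StoredFinding extends IncomingFinding {
  firstSeenRunId: string;
  lastSeenRunId: string;
  occurrenceCount: number;
  status: "open" | "acknowledged" | "dismissed" | "resolved";
}

// Composite key mirroring the (agent_id, dedup_key) uniqueness rule.
const keyOf = (f: { agentId: string; dedupKey: string }): string => `${f.agentId}::${f.dedupKey}`;

// Returns a new map; the input map and its rows are never mutated.
function mergeFindings(
  existing: Map<string, StoredFinding>,
  incoming: IncomingFinding[],
  runId: string,
): Map<string, StoredFinding> {
  const out = new Map(existing);
  for (const f of incoming) {
    const key = keyOf(f);
    const prev = out.get(key);
    if (prev) {
      // Repeat occurrence: keep first_seen, move last_seen, count it.
      out.set(key, { ...prev, lastSeenRunId: runId, occurrenceCount: prev.occurrenceCount + 1 });
    } else {
      out.set(key, { ...f, firstSeenRunId: runId, lastSeenRunId: runId, occurrenceCount: 1, status: "open" });
    }
  }
  return out;
}
```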

Regression-aware severity. When a finding first opens, its severity comes from the reasoner. When the same dedup key fires again after N successful runs in which it did not fire (default 30), severity is bumped one level (e.g. medium → high). When a probe's duration exceeds 3× its baseline_duration_ms, a performance_regression finding is auto-generated.
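A sketch of the two severity rules plus the rolling p50 behind baseline_duration_ms. It reads "reopens after N successful runs" as "fires again after at least N successful runs of the agent in which the key did not fire"; that interpretation, the ladder order, and all function names are assumptions.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
const LADDER: readonly Severity[] = ["info", "low", "medium", "high", "critical"];

// One level up the ladder, capped at critical.
function bumpSeverity(s: Severity): Severity {
  const i = LADDER.indexOf(s);
  return LADDER[Math.min(i + 1, LADDER.length - 1)] ?? s;
}

// cleanRuns: successful runs of this agent since the key last fired.
function severityOnReopen(reasonerSeverity: Severity, cleanRuns: number, threshold = 30): Severity {
  return cleanRuns >= threshold ? bumpSeverity(reasonerSeverity) : reasonerSeverity;
}

// Median of recent durations; null until there is any history.
function rollingP50(samples: number[]): number | null {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Strictly more than `factor` times the baseline counts as a regression.
function isPerformanceRegression(durationMs: number, baselineMs: number | null, factor = 3): boolean {
  if (baselineMs === null || baselineMs <= 0) return false; // no baseline yet, no signal
  return durationMs > factor * baselineMs;
}
```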

Retention. Rows in monitoring_run_steps, monitoring_probes, and monitoring_artifacts older than 90d are deleted by a pg_cron job, and the corresponding R2 artifact objects are pruned in lockstep. monitoring_runs, monitoring_findings, and monitoring_audit_log are retained indefinitely.

6. Reasoner & model strategy

The reasoner is the LLM execution layer. Two paths:

6.1 One-shot path (`reasoner/oneshot.ts`)

Direct Anthropic API call with a structured-output schema. The orchestrator collects probe results, hands them to the reasoner alongside diff context and manifest summary, and gets back a summary + findings array. No tool calling; one prompt, one response.

export interface OneShotInput {
  agent: { name: string; surface: string; description: string };
  runContext: { startedAt: Date; trigger: string; previousRunSummary?: string };
  probes: ProbeResult[];
  diffSinceLastRun?: { commits: string[]; files: string[]; patches: string };
  manifestSummary: { covered: string[]; new: string[] };
}

export interface ReasonerOutput {
  summary: string;
  findings: Array<{
    dedupKey: string;
    severity: 'critical'|'high'|'medium'|'low'|'info';
    title: string;
    description: string;
    recommendedAction: string;
    surface: string;
    targetRef?: string;
  }>;
  cost: { promptTokens: number; completionTokens: number; usd: number };
  modelUsed: string;
}

export interface OneShotReasoner { ask(input: OneShotInput): Promise<ReasonerOutput>; }

Used by: Uptime, Deploy verifier, DB integrity, Edge function health, Security active.

6.2 Agent-loop path (`reasoner/agent-loop.ts`)

Built on the Claude Agent SDK (@anthropic-ai/claude-agent-sdk, the same agent harness that powers Claude Code). The orchestrator instantiates an SDK session per agent run with:

The agent's system prompt (loaded from monitoring_prompt_templates)
A tool catalog — TS functions exposed as SDK tools (see §6.3)
Model + max-turns from agent config
An event subscriber that writes every turn + tool call to monitoring_run_steps in real time

Used by: Security passive, Test coverage.

The SDK handles the loop (model → tool call → tool result → model → …) until the model emits no more tool calls or max_turns is reached. Final emitted findings come via the record_finding tool — the orchestrator collects these and persists them after dedup.

export interface AgentLoopRunner {
  run(input: {
    agent: AgentConfig;
    systemPrompt: string;
    tools: ToolCatalog;
    initialContext: { runId: string; diff?: GitDiff; manifest: ManifestSummary };
    maxTurns: number;
    onStep: (step: PersistableStep) => Promise<void>;  // streamed to monitoring_run_steps
  }): Promise<{ findings: Finding[]; cost: Cost; modelUsed: string; turns: number }>;
}
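The SDK owns the real loop, so the following is not its API. It is a self-contained stand-in driver that pins down the contract described above: a hard `maxTurns` cap, every turn and tool call streamed through `onStep`, termination when the model stops calling tools, and findings collected only via `record_finding`. `ModelFn`, `ToolCall`, `ModelReply`, and `runAgentLoop` are hypothetical.

```typescript
// Hypothetical stand-in types; these are NOT the Agent SDK's message types.
interface ToolCall { id: string; name: string; args: Record<string, unknown> }
interface ModelReply { text: string; toolCalls: ToolCall[] }
type ModelFn = (history: string[]) => Promise<ModelReply>;
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

interface Finding { dedupKey: string; severity: string; title: string }
interface LoopStep { kind: "agent_turn" | "tool_call"; turn: number; payload: unknown }

async function runAgentLoop(
  model: ModelFn,
  tools: Record<string, ToolFn>,
  maxTurns: number,
  onStep: (step: LoopStep) => Promise<void>, // would persist to monitoring_run_steps
): Promise<{ findings: Finding[]; turns: number }> {
  const findings: Finding[] = [];
  const history: string[] = [];
  let turns = 0;
  while (turns < maxTurns) {
    turns++;
    const reply = await model(history);
    await onStep({ kind: "agent_turn", turn: turns, payload: reply });
    history.push(reply.text);
    if (reply.toolCalls.length === 0) break; // no tool calls: the agent is done
    for (const call of reply.toolCalls) {
      let result: unknown;
      if (call.name === "record_finding") {
        // Findings leave the loop only through this tool; dedup happens after the run.
        // Unvalidated here; the real tool would check args against its schema.
        findings.push(call.args as unknown as Finding);
        result = { recorded: true };
      } else {
        const fn = tools[call.name];
        result = fn ? await fn(call.args) : { error: `tool not in catalog: ${call.name}` };
      }
      await onStep({ kind: "tool_call", turn: turns, payload: { call, result } });
      history.push(JSON.stringify({ tool: call.name, result }));
    }
  }
  return { findings, turns };
}
```

Injecting a scripted `ModelFn` makes it possible to exercise step persistence and the turn cap without spending tokens.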

6.3 Tool catalog

Lives in apps/control/src/lib/sentinel/tools/. Each tool is a typed function with a Zod input schema, exposed to the SDK via the standard tool-definition shape. Tools delegate to the same probe library used by one-shot agents — so the capability is shared, only the calling style differs.

Tool	Purpose	Used by
`http_probe(url, expectedStatus?)`	Fetch a URL; record artifact on failure	Uptime (one-shot calls directly) + agent-loop
`query_supabase(sql, params)`	Execute a read-only SQL query under service role	DB integrity, Test coverage, Security passive
`read_file(path)`	Read a file from the working tree	Test coverage, Security passive
`git_diff_since(commit_sha)`	Diff between commit and HEAD	Security passive
`run_gitleaks(scope)`	Run gitleaks against working tree; return JSON report	Security passive
`query_osv(packageJsonPath)`	Hit OSV.dev for advisories on a package.json	Security passive
`list_routes()` / `list_edge_fns()` / `list_critical_files()`	Inventory helpers built on the manifest	Test coverage
`find_existing_tests(targetPath)`	Look up Vitest/Playwright tests covering a target	Test coverage
`draft_test_scaffold(target, framework)`	Produce a starter test file body	Test coverage
`cf_pages_status(project)`	Latest deploy state via CF API	Deploy verifier (one-shot calls directly) + agent-loop
`r2_list(bucket, prefix?)`	List R2 objects	Security active, DB integrity
`record_finding(payload)`	Emit a finding (severity, title, description, recommendedAction, surface, dedupKey, metadata?)	All agent-loop agents

Tools execute on the orchestrator process (not on Anthropic infra) — they have direct local network access to Supabase, R2, the working tree, and CF API.
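Because tools run with local access, two checks are worth doing before the SDK ever sees a catalog: resolving an agent's `tool_catalog` names against the registry, and validating arguments before execution. In this sketch a hand-written `validate` function stands in for the Zod schema the plan calls for, and `REPO_ROOT`, `ToolSpec`, `selectTools`, `invoke`, and `readFileTool` are hypothetical. The `read_file` validator shows one way to keep paths inside the working tree.

```typescript
import * as path from "node:path";

interface ToolSpec {
  name: string;
  description: string;
  validate: (args: unknown) => string | null; // stand-in for a Zod schema: error message or null
  run: (args: unknown) => Promise<unknown>;
}

const REPO_ROOT = "/srv/sponic"; // hypothetical checkout location on alpu.ca

// Resolve an agent's tool_catalog (text[]) against the registry. Unknown names
// fail at run start instead of surfacing halfway through an agent loop.
function selectTools(registry: ToolSpec[], catalog: string[]): ToolSpec[] {
  const byName = new Map(registry.map((t) => [t.name, t] as const));
  const missing = catalog.filter((n) => !byName.has(n));
  if (missing.length > 0) throw new Error(`unknown tools in catalog: ${missing.join(", ")}`);
  return catalog.map((n) => byName.get(n)!);
}

// Validate before executing; validation errors go back to the model as a result.
async function invoke(tool: ToolSpec, args: unknown): Promise<unknown> {
  const err = tool.validate(args);
  if (err !== null) return { error: `invalid arguments for ${tool.name}: ${err}` };
  return tool.run(args);
}

const readFileTool: ToolSpec = {
  name: "read_file",
  description: "Read a file from the working tree",
  validate: (args) => {
    const p = (args as { path?: unknown } | null)?.path;
    if (typeof p !== "string") return "path must be a string";
    const abs = path.resolve(REPO_ROOT, p);
    // Reject anything that resolves outside the repository (e.g. "../../etc/passwd").
    return abs === REPO_ROOT || abs.startsWith(REPO_ROOT + path.sep) ? null : "path escapes the repository";
  },
  // Placeholder: a real implementation would read and return the file contents.
  run: async (args) => ({ path: (args as { path: string }).path }),
};
```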

6.4 Default model + mode assignments for v1

Agent	Mode	Model	Rationale stored on agent
Uptime	one-shot	Haiku 4.5	"Low-complexity summarization of HTTP probe results — Haiku is sufficient."
Deploy verifier	one-shot	Haiku 4.5	"Structured CF API output and build-log scan — Haiku handles deterministic formatting well."
DB integrity	one-shot	Haiku 4.5	"Row-count delta interpretation; arithmetic + threshold judgment fits Haiku."
Edge function health	one-shot	Haiku 4.5	"Status-code summarization; smoke-test pass/fail aggregation."
Security passive	agent-loop	Sonnet 4.6	"Diff-driven security investigation needs multi-turn reasoning + targeted tool calls; agent-loop on Sonnet is the right shape."
Security active	one-shot	Sonnet 4.6	"Probes are deterministic; reasoning is interpreting unexpected results. One-shot avoids agent-loop overhead."
Test coverage	agent-loop	Sonnet 4.6	"Inventorying surfaces and ranking criticality is a navigation problem; agent-loop with file-reading tools fits."
LLM-on-merge (coverage extension)	one-shot	Haiku 4.5	"Diff → manifest entry is structured-output; Haiku is fast + cheap."

Both mode and model are mutable per agent from the UI. Mode changes are visible in the audit log alongside model changes. The rationale field surfaces in the activity log and run drill-down so anyone reviewing a run knows why this configuration was used at the time.

Migration intent: once we have ~4 weeks of steady-state runs + cost data, evaluate which agents could downshift to a local Ollama model on alpu.ca without quality loss. The Agent SDK supports custom model providers, so the agent-loop path will accept Ollama too.

7. Orchestrator on alpu.ca

Lives at infra/alpu/sentinel-orchestrator/ (committed; deployed via rsync + systemd). Stack: Node 20 + TypeScript, runs as sentinel.service.

Schedule

Single sweep nightly inside the 10pm–8am CT window. Cron at 0 22 * * * America/Chicago kicks off the orchestrator; agents run sequentially in the default order (cheapest/fastest first, most expensive last):

22:00 Crawler refreshes monitoring_manifest
22:15 Deploy verifier
22:30 Uptime
23:00 Edge function health
23:30 Database integrity
00:00 Security passive (LLM-on-diff)
01:00 Security active
02:00 Test coverage

Whole sweep typically finishes by ~03:30 CT. Each agent records start/end so the activity log shows actual cadence; sequencing is config-driven.
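The sequential sweep with per-agent `timeout_seconds` and the `daily_cost_ceiling_usd` auto-pause might look like the sketch below. `runAgent` and `spentLast24h` are hypothetical hooks into the run flow and into monitoring_runs, and `skipped` is an in-memory outcome only (it is not one of the monitoring_runs statuses). `Promise.race` stops waiting but does not cancel the agent's work; a real implementation would also pass an `AbortSignal`.

```typescript
interface AgentConfig {
  slug: string;
  enabled: boolean;
  timeoutSeconds: number;
  dailyCostCeilingUsd: number;
}

type AgentResult = { status: "success" | "partial" | "failed"; costUsd: number };
type SweepOutcome = { slug: string; status: AgentResult["status"] | "timeout" | "skipped"; costUsd: number };

class TimeoutError extends Error {}

// Reject if `work` has not settled within `ms`; the timer is cleared either way.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function runSweep(
  agents: AgentConfig[], // already in configured order
  spentLast24h: (slug: string) => Promise<number>, // e.g. sum of monitoring_runs.cost_usd
  runAgent: (agent: AgentConfig) => Promise<AgentResult>,
): Promise<SweepOutcome[]> {
  const outcomes: SweepOutcome[] = [];
  for (const agent of agents) {
    // Sequential: the next agent starts only after this one settles.
    if (!agent.enabled || (await spentLast24h(agent.slug)) >= agent.dailyCostCeilingUsd) {
      outcomes.push({ slug: agent.slug, status: "skipped", costUsd: 0 }); // paused or ceiling reached
      continue;
    }
    try {
      const r = await withTimeout(runAgent(agent), agent.timeoutSeconds * 1000);
      outcomes.push({ slug: agent.slug, ...r });
    } catch (e) {
      // One agent's failure never aborts the rest of the sweep.
      outcomes.push({ slug: agent.slug, status: e instanceof TimeoutError ? "timeout" : "failed", costUsd: 0 });
    }
  }
  return outcomes;
}
```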

Entry points

cron — system cron at 22:00 CT
webhook listener on localhost:8765 for manual "run now" requests forwarded by a Cloudflare Tunnel; HMAC-signed (shared secret in ~/.config/sentinel/.env)
heartbeat task every 15min — POSTs to a Supabase function that updates monitoring_heartbeats
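A minimal sketch of signing and verifying the manual-trigger webhook with Node's built-in `crypto`. The signed string (`timestamp.body`), the hex encoding, and the 300-second replay window are assumptions, not a settled wire format. Whatever produces the signature has to keep the shared secret private; a statically exported page cannot, so the signing step presumably sits behind an authenticated server-side hop.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_SECONDS = 300; // hypothetical replay window

// Hex HMAC-SHA256 over `${timestamp}.${rawBody}` (hypothetical signing scheme).
function sign(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

function verify(secret: string, timestamp: number, rawBody: string, signatureHex: string, nowSeconds: number): boolean {
  // Reject stale or future-dated requests so a captured request cannot be replayed later.
  if (!Number.isFinite(timestamp) || Math.abs(nowSeconds - timestamp) > MAX_SKEW_SECONDS) return false;
  const expected = Buffer.from(sign(secret, timestamp, rawBody), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The listener should verify against the raw request body exactly as received, not a re-serialized JSON object, or byte-level differences will break the comparison.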

Run flow per agent

Insert monitoring_runs row (status=running)
Gather manifest entries; execute probes; collect outputs + artifacts
Compute diff since last successful run for this agent (commits via git log, file changes)
Call reasoner via configured adapter; persist findings (with dedup logic)
Update monitoring_runs (status, cost, summary)

Auth & secrets. Bitwarden CLI (bw) unlocks at service start with a session token; service-role Supabase key + Anthropic key + R2 keys stored in ~/.config/sentinel/.env. Bootstrap recipe added to infra/runbook.md.

8. Coverage extension — crawler + LLM-on-merge

Crawler (runs at the start of each nightly sweep)

Walks apps/garden/** and apps/control/src/app/**/page.tsx to enumerate routes
Walks apps/control/supabase/functions/ for edge fns
Queries information_schema.tables filtered to public schema
Reads apps/*/package.json for dependency lists
For each newly-seen target, inserts a monitoring_manifest row with source='crawler', approval_status='active', and a default probe config
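Two pieces of the crawler, sketched under assumptions: mapping an App Router `page.tsx` path to its URL pattern (route groups such as `(admin)` are dropped and dynamic segments like `[lang]` kept; parallel-route `@slot` folders are not handled), and diffing discovered targets against existing manifest rows. All names are hypothetical, and assigning each new row to an agent (`agent_id`) is omitted.

```typescript
type TargetKind = "route" | "table" | "edge_fn" | "package" | "file" | "cron_job" | "service";
interface Target { targetKind: TargetKind; targetRef: string }
interface NewManifestRow extends Target {
  source: "crawler";
  approvalStatus: "active";
  probeConfig: Record<string, unknown>;
}

// "apps/control/src/app/[lang]/sentinel/[tab]/page.tsx" -> "/[lang]/sentinel/[tab]"
function pagePathToRoute(filePath: string): string | null {
  const parts = filePath.split("/");
  const appIdx = parts.indexOf("app");
  if (appIdx === -1 || parts[parts.length - 1] !== "page.tsx") return null;
  const segments = parts
    .slice(appIdx + 1, -1)
    .filter((s) => !(s.startsWith("(") && s.endsWith(")"))); // route groups are not part of the URL
  return "/" + segments.join("/");
}

function newManifestRows(
  known: Target[],
  discovered: Target[],
  defaultConfig: (kind: TargetKind) => Record<string, unknown>,
): NewManifestRow[] {
  const seen = new Set(known.map((t) => `${t.targetKind}:${t.targetRef}`));
  const rows: NewManifestRow[] = [];
  for (const d of discovered) {
    const key = `${d.targetKind}:${d.targetRef}`;
    if (seen.has(key)) continue;
    seen.add(key); // also collapses repeats within one crawl
    rows.push({ ...d, source: "crawler", approvalStatus: "active", probeConfig: defaultConfig(d.targetKind) });
  }
  return rows;
}
```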

LLM-on-merge (GitHub Action)

Triggers on push to main (on: push: branches: [main])
Reads commit range + diff
Calls Anthropic API (Haiku) with the diff + current manifest summary
Posts proposed manifest entries to a Supabase edge function (sentinel-llm-merge) which writes them to monitoring_manifest with approval_status='pending_approval'
Admin sees them in Sentinel → Coverage → Pending and approves/rejects
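Since proposals come from a model, the `sentinel-llm-merge` function should treat them as untrusted. A sketch of that normalization step, assuming the Action posts an array of `{ targetKind, targetRef, probeConfig }` objects (a hypothetical payload shape): unknown kinds and duplicates are dropped, and `source` and `approval_status` are pinned server-side so a proposal can never activate itself.

```typescript
const TARGET_KINDS = ["route", "table", "edge_fn", "package", "file", "cron_job", "service"] as const;
type TargetKind = (typeof TARGET_KINDS)[number];

interface PendingRow {
  targetKind: TargetKind;
  targetRef: string;
  probeConfig: Record<string, unknown>;
  source: "llm_merge";
  approvalStatus: "pending_approval";
}

function isTargetKind(v: unknown): v is TargetKind {
  return typeof v === "string" && (TARGET_KINDS as readonly string[]).includes(v);
}

// existingKeys holds "kind:ref" for rows already in monitoring_manifest; it is copied, not mutated.
function normalizeProposals(raw: unknown, existingKeys: ReadonlySet<string>): PendingRow[] {
  if (!Array.isArray(raw)) return [];
  const seen = new Set(existingKeys);
  const rows: PendingRow[] = [];
  for (const item of raw) {
    if (typeof item !== "object" || item === null) continue;
    const { targetKind, targetRef, probeConfig } = item as Record<string, unknown>;
    if (!isTargetKind(targetKind) || typeof targetRef !== "string" || targetRef.trim() === "") continue;
    const key = `${targetKind}:${targetRef}`;
    if (seen.has(key)) continue; // already monitored, or duplicated within this payload
    seen.add(key);
    rows.push({
      targetKind,
      targetRef,
      probeConfig:
        typeof probeConfig === "object" && probeConfig !== null && !Array.isArray(probeConfig)
          ? (probeConfig as Record<string, unknown>)
          : {},
      // Pinned server-side, whatever the model proposed.
      source: "llm_merge",
      approvalStatus: "pending_approval",
    });
  }
  return rows;
}
```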

Crawler = always-on default coverage; LLM = high-quality additions when code lands. Coverage grows automatically while the human stays in the loop on what gets monitored.

9. Frontend — Sentinel intranet section

This is where someone unfamiliar with the system has to be able to land and orient. The frontend carries the entire weight of the "dashboard only" stance.

9.1 Information architecture

Sentinel (top-level intranet section)
├── Overview ← landing
├── Coverage ← surface × agent matrix
├── Agents ← list of 7 agents (cards)
│   └─ Agent drawer ← About / Probes / Prompts / Runs / Findings / Settings
├── Activity ← chronological event feed
├── Findings ← filterable list
│   └─ Finding drawer ← description / recommended action / artifacts / actions
├── Runs ← list of runs
│   └─ Run drawer ← stepper with prompts, models, context, outputs, handoffs
└── Docs ← system README + per-agent docs + glossary

Detail views are right-side drawers (mirroring the existing task-detail-drawer.tsx pattern). This keeps the static-export build workable — tab pages are pre-rendered, drawer content is fetched client-side from Supabase. Drawer state is stored in URL hash so detail views are linkable: /en/sentinel/runs#run=abc123.

9.2 Tab-by-tab spec

Overview

Hero status card: pill (green/yellow/red), last sweep timestamp, "X agents passed / Y failed last night"
Open critical + high findings count + a quick-list (top 3 with severity, title, surface)
Today's activity excerpt (last 10 entries) with "View all" link
Quick-link tiles to: All findings · All runs · Coverage matrix · Pending coverage approvals
Polling: every 15s

Coverage

The "at-a-glance" surface map.

Filters above the matrix: surface kind (all / routes / tables / edge fns / packages / services), agent (multi-select), status (all / has-failure / no-coverage)
Search box for filtering by surface name
Matrix table:
- Rows: surfaces grouped by kind, with kind heading rows (sticky)
- Columns: each agent that can cover this surface kind
- Cells: small status pill (pass = green dot, fail = red, warn = amber, never run = gray, not applicable = empty)
- Cell click → popover with last run for that probe + button "Open run drawer"
"Surfaces without coverage" callout at the top if any surface has no agent covering it
Sub-section below the matrix: Pending coverage approvals — manifest rows with approval_status='pending_approval' from LLM-on-merge, with Approve / Reject buttons inline
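The matrix and the "Surfaces without coverage" callout can both be derived from manifest rows plus the latest probe status per (agent, surface). A sketch assuming a surface counts as covered when at least one agent has an active manifest row for it; the names and the mapping of `skip`/`error` probe statuses onto cell states are assumptions.

```typescript
type ProbeStatus = "pass" | "fail" | "warn" | "skip" | "error";
type CellStatus = "pass" | "fail" | "warn" | "never" | "n/a";

interface Surface { kind: string; ref: string }

const cellKey = (agent: string, s: Surface): string => `${agent}|${s.kind}:${s.ref}`;

// manifest: keys of enabled, active manifest rows; latest: last probe status per key.
function buildMatrix(
  surfaces: Surface[],
  agents: string[],
  manifest: Set<string>,
  latest: Map<string, ProbeStatus>,
): { cells: Map<string, CellStatus>; uncovered: Surface[] } {
  const cells = new Map<string, CellStatus>();
  const uncovered: Surface[] = [];
  for (const s of surfaces) {
    let covered = false;
    for (const agent of agents) {
      const key = cellKey(agent, s);
      if (!manifest.has(key)) {
        cells.set(key, "n/a"); // rendered as an empty cell
        continue;
      }
      covered = true;
      const st = latest.get(key);
      // Assumed mapping: skip counts as "never run", error renders as a failure.
      cells.set(key, st === undefined || st === "skip" ? "never" : st === "error" ? "fail" : st);
    }
    if (!covered) uncovered.push(s); // feeds the "Surfaces without coverage" callout
  }
  return { cells, uncovered };
}
```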

Agents

Card grid (3 cols desktop, 1 mobile)
Each card: agent name + surface domain icon, status pill, last run (relative time), open findings badge, 30-run sparkline (plain SVG), model chip with rationale tooltip, schedule, "Run now" button (HMAC-signed POST → orchestrator)
Click card → opens Agent drawer with tabs:
- About — markdown rendered from apps/control/src/components/sentinel/docs/{slug}.md. Covers purpose, surfaces covered, probes run, severity rules, default model + rationale, known limitations.
- Probes — manifest entries this agent owns
- Prompts — list of prompt templates with version history; click → render full prompt text + variable list
- Runs — last 30 runs with link to run drawer
- Findings — currently open findings for this agent
- Settings: execution mode, model + rationale (editable), cost ceiling, timeout, schedule. Every edit is recorded in monitoring_audit_log.

Activity

Chronological feed, newest at top
Each row: timestamp, event icon (run/finding/manifest/audit/heartbeat), actor (system / cron / user), summary, model chip + cost chip if applicable, "Open" button to relevant drawer
Filters: event type, actor, agent, date range
Each entry expandable to show details inline
Live append: new entries fade in at top via 15s polling
"Load more" pagination (50 at a time)

Findings

Filters: severity (multi), status (open/ack/dismissed/resolved), agent (multi), surface (multi), recurrence count (≥1, ≥5, ≥30)
Saved filter "presets": "Open critical/high", "Recurring (≥5)", "Acknowledged but not resolved", "Test coverage drafts"
Table columns: severity badge, title, surface, agent, occurrences, last seen, status
Sort: severity desc (default), last seen desc, recurrence desc, age asc
Click row → opens Finding drawer
Bulk actions toolbar: select multiple → Acknowledge / Dismiss with shared note

Finding drawer:

Header: severity badge (large), title, status pill, surface, agent
Description (full reasoner text)
Recommended action (bordered box, highlighted)
Occurrence history: list of run links with timestamps; sparkline of severity over time
Related artifacts (collapsible viewers)
For test-coverage findings: Approve & queue primary button + draft task preview (target file, suggested framework, prompt, scaffold). Clicking opens a confirmation modal; on confirm, inserts into tasks table and shows the task link.
Actions footer: Acknowledge (with note), Dismiss (with reason), Resolve, View related runs
Audit log of actions on this finding shown at bottom

Runs

Filters: agent, trigger (cron/manual/deadman), status, date range
Table columns: started_at, agent, trigger, status, duration, cost, models used, finding count
Click row → opens Run drawer — the multi-step transparency surface

Run drawer:

Header: agent name, execution-mode chip (one-shot or agent-loop), trigger + actor, status pill, total cost, total duration, models used (chips), turn count (agent-loop only)
Stepper (vertical timeline): each step rendered as a card showing index + name, kind chip (probe / reasoner / agent_turn / tool_call / aggregate), model chip with rationale tooltip, duration + cost, status pill
For agent-loop runs, the stepper visually groups consecutive agent_turn + child tool_call steps so it's obvious which turn invoked which tool. Tool-call cards show tool name + arguments (collapsed) and the result (collapsed).
Card expands inline to reveal: Prompt (template name + version + collapsible rendered text — for agent-loop, this is the system prompt for the first turn and the running message history for subsequent turns), Input context (collapsible jsonb tree), Output (collapsible jsonb tree), Artifacts produced + artifacts consumed with handoff arrows
Findings raised in this run (sidebar or bottom strip) with quick-jump to Finding drawer

Docs

System-level documentation hub. Designed for a new collaborator to onboard cold. Three subsections:

Overview — renders infra/sentinel/README.md
Per-agent docs — list of seven agent docs with click-to-render
Glossary — renders infra/sentinel/glossary.md (terms: agent, probe, finding, dedup key, manifest, etc.)

All rendered with react-markdown; no MDX runtime needed.

9.3 Component catalog

Lives in apps/control/src/components/sentinel/. New shared components:

Component	Purpose
`<SeverityBadge severity>`	Colored pill with icon: critical / high / medium / low / info
`<StatusPill status variant>`	Green/red/amber/gray dot with label. Variants: `run` `probe` `agent` `manifest`
`<RunSummaryCard run>`	Card for runs list / agent's recent runs
`<RunStepper run steps artifacts>`	Vertical step timeline with handoff arrows
`<PromptViewer template renderedText variables>`	Collapsible prompt with version-history link
`<ArtifactViewer artifact>`	Auto-routes by `kind`: screenshot (R2 fetch), http_response (syntax-highlighted body), json_blob (JSON tree), stack_trace, log_excerpt, diff, test_scaffold
`<JsonTree data collapsedDepth>`	Expandable jsonb viewer for input/output
`<CoverageMatrix surfaces agents results>`	CSS-grid surface×agent matrix with cell popovers
`<AgentCard agent>`	Card with sparkline, model chip, "Run now"
`<FindingsTable findings filterable bulkActionable>`	Findings list with bulk actions
`<ActivityEntry event>`	One feed row, expandable
`<MarkdownDoc source>`	`react-markdown` wrapper with project styling
`<SeverityHeatmap occurrences>`	Tiny inline heatmap for finding history
`<CostBadge cost>`	USD pill (`$0.0008` / `$1.20`)
`<ModelChip provider model rationale>`	Provider icon + model name; hover tooltip = rationale
`<Sparkline data status>`	30-point pass/fail SVG (reuses devcontrol/context-tab.tsx technique)
`<DrawerShell title onClose>`	Right-side drawer wrapper; hash-state aware
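`<CostBadge>` needs a formatting rule that produces both `$0.0008` and `$1.20`. One rule consistent with those two examples, offered as an assumption: sub-cent amounts keep four decimals so cheap Haiku calls don't all collapse to `$0.00`, and everything else is shown to the cent.

```typescript
// Hypothetical formatter for <CostBadge>; thresholds inferred from the two examples only.
function formatCost(usd: number): string {
  if (!Number.isFinite(usd) || usd < 0) return "n/a";
  if (usd === 0) return "$0.00";
  if (usd < 0.0001) return "<$0.0001"; // below four-decimal resolution
  return usd < 0.01 ? `$${usd.toFixed(4)}` : `$${usd.toFixed(2)}`;
}
```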

9.4 Data layer

Single composable hook useSentinel() in apps/control/src/hooks/use-sentinel.ts
Sub-hooks: useAgents(), useRun(runId), useRuns({ filter }), useFindings({ filter }), useFinding(id), useCoverage(), useActivity({ filter, page }), useManifest({ status })
All read-side hooks share a small in-memory cache (15s TTL); polling tabs (Overview/Activity) refetch on interval; detail drawers refetch on open
Mutations go through useSentinelMutations(): acknowledgeFinding, dismissFinding, resolveFinding, approveManifestEntry, rejectManifestEntry, updateAgentSettings, triggerAgentRun (HMAC-signed POST), approveAndQueueTask (creates a row in tasks from a finding's metadata.draft_task_payload)
Every mutation invalidates relevant cached queries
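A sketch of the shared read cache with the 15s TTL and the prefix-based invalidation that mutations would call. `QueryCache` and the `findings:`-style key prefixes are hypothetical; the clock is injectable for testing, and concurrent misses for the same key are not coalesced.

```typescript
type Entry = { value: unknown; expiresAt: number };

class QueryCache {
  private entries = new Map<string, Entry>();

  constructor(private ttlMs = 15_000, private now: () => number = Date.now) {}

  // Return a fresh cached value, or fetch and cache a new one.
  async get<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value as T;
    const value = await fetcher();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Drop every key starting with one of the prefixes, e.g. "findings" after acknowledgeFinding.
  invalidate(...prefixes: string[]): void {
    for (const key of [...this.entries.keys()]) {
      if (prefixes.some((p) => key.startsWith(p))) this.entries.delete(key);
    }
  }
}
```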

9.5 Loading / empty / error

Skeleton placeholders for all card grids and tables (one row of pulse-animated bars)
Empty states: "No findings open" shows an emerald-tinted card with a green checkmark; "No agents have run yet" shows guidance on triggering the first run; "No coverage for this surface" shows a call-to-action to add a manual manifest entry
Error: red-bordered card with error message + Retry button + "Report issue" link

9.6 Visual design

Inherits intranet design: slate-on-white, Tailwind 4, no shadcn
Severity colors: critical=rose-700, high=red-600, medium=amber-600, low=blue-500, info=slate-400
Status colors: pass=emerald-500, fail=rose-600, warn=amber-500, skip/never=slate-300, paused=slate-400
Surface accent colors (thin left border on agent cards / matrix column headers): uptime=blue, deploy=purple, db=teal, edge-fn=indigo, security-passive=amber, security-active=rose, test-coverage=emerald
Typography: Inter for body, JetBrains Mono for code/IDs
Spacing: cards 5-unit padding; section headings 6-unit margin; matrix cells dense (1.5-unit padding)
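A single typed token map is one way to keep these colors consistent across badges, pills, and matrix headers. The sketch is illustrative. The `text-white` foregrounds and label strings are assumptions, and `SEVERITY_STYLE`, `STATUS_STYLE`, and `severityBadgeProps` are hypothetical names. Class strings are written out in full because Tailwind finds utilities by scanning source files for complete class names. Classes assembled at runtime by string concatenation would not be generated.

```typescript
// Hypothetical design-token map for the colors listed above.
export type Severity = "critical" | "high" | "medium" | "low" | "info";
export type Status = "pass" | "fail" | "warn" | "skip" | "never" | "paused";

type Token = { className: string; label: string };

// Literal class strings so Tailwind's source scan picks them up.
export const SEVERITY_STYLE: Record<Severity, Token> = {
  critical: { className: "bg-rose-700 text-white", label: "Critical" },
  high: { className: "bg-red-600 text-white", label: "High" },
  medium: { className: "bg-amber-600 text-white", label: "Medium" },
  low: { className: "bg-blue-500 text-white", label: "Low" },
  info: { className: "bg-slate-400 text-white", label: "Info" },
};

export const STATUS_STYLE: Record<Status, Token> = {
  pass: { className: "bg-emerald-500", label: "Pass" },
  fail: { className: "bg-rose-600", label: "Fail" },
  warn: { className: "bg-amber-500", label: "Warning" },
  skip: { className: "bg-slate-300", label: "Skipped" },
  never: { className: "bg-slate-300", label: "Never run" },
  paused: { className: "bg-slate-400", label: "Paused" },
};

// Badge props pair the color with visible text and an aria-label,
// so color is never the only signal.
export function severityBadgeProps(severity: Severity) {
  const { className, label } = SEVERITY_STYLE[severity];
  return { className, children: label, "aria-label": `Severity: ${label}` };
}
```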

9.7 Accessibility

Keyboard navigable: all rows/cards focusable; arrow keys navigate matrix; Enter to open drawer; Esc to close drawer
Screen reader friendly: severity badges have aria-label; status pills include text in addition to color
Color is never the only signal: every status/severity pairs color with text or icon
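Arrow-key movement in the coverage matrix reduces to a pure function over (row, column). This sketch makes three assumptions: rows are surfaces and columns are agents, movement clamps at the edges instead of wrapping, and Home/End jump within a row as in the WAI-ARIA grid pattern. `nextCell` is a hypothetical helper. The component would pair it with a roving `tabIndex`, where only the focused cell has `tabIndex={0}`.

```typescript
// Hypothetical keyboard helper for the coverage matrix.
export type Cell = { row: number; col: number };

export function nextCell(current: Cell, key: string, rows: number, cols: number): Cell {
  // Keep an index inside [0, max - 1].
  const clamp = (n: number, max: number) => Math.min(Math.max(n, 0), max - 1);
  switch (key) {
    case "ArrowUp":
      return { row: clamp(current.row - 1, rows), col: current.col };
    case "ArrowDown":
      return { row: clamp(current.row + 1, rows), col: current.col };
    case "ArrowLeft":
      return { row: current.row, col: clamp(current.col - 1, cols) };
    case "ArrowRight":
      return { row: current.row, col: clamp(current.col + 1, cols) };
    case "Home":
      return { row: current.row, col: 0 };
    case "End":
      return { row: current.row, col: cols - 1 };
    default:
      // Enter / Esc and other keys are handled by the drawer logic, not here.
      return current;
  }
}
```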

9.8 Routing

Static-export-compatible:

/[lang]/sentinel/[tab]/page.tsx for the seven tab pages — uses <TabContent section="sentinel" /> like existing sections
All detail views are right-side drawers; drawer state is in URL hash (e.g. #run=abc-123, #agent=uptime, #finding=xyz) so they're linkable + bookmarkable
Drawers fetch their data client-side on open
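Drawer state in the hash can be parsed and written with the standard `URLSearchParams` API. A minimal sketch, assuming only one drawer is open at a time and that the four drawer kinds match the drawer files listed in §11. `parseDrawerHash` and `drawerHash` are hypothetical names.

```typescript
// Hypothetical parser/serializer for drawer state kept in the URL hash,
// e.g. "#run=abc-123", "#agent=uptime", "#finding=xyz".
export type DrawerKind = "run" | "agent" | "finding" | "probe";
export type DrawerState = { kind: DrawerKind; id: string } | null;

const KINDS: readonly DrawerKind[] = ["run", "agent", "finding", "probe"];

export function parseDrawerHash(hash: string): DrawerState {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  for (const kind of KINDS) {
    const id = params.get(kind);
    if (id) return { kind, id };
  }
  return null; // no recognised drawer key: drawer closed
}

export function drawerHash(state: DrawerState): string {
  if (!state) return "";
  // URLSearchParams handles encoding, so ids with spaces or "&" round-trip safely.
  return "#" + new URLSearchParams({ [state.kind]: state.id }).toString();
}
```

In the browser, the Sentinel shell would call `parseDrawerHash(window.location.hash)` on load and on `hashchange`, and assign `window.location.hash = drawerHash(state)` when a drawer opens or closes.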

9.9 Auth gating

Wrap section in <AdminGuard> (new file apps/control/src/components/auth/admin-guard.tsx) checking appUser.role IN ('oracle','admin')
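Demo-role users do not see the section in nav and get a client-rendered 403 view if they navigate by URL (a static export cannot return an HTTP 403; data access is enforced by the RLS policies)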
<AdminGuard> is created in this work but not retrofitted onto the existing admin section — separate decision
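The guard's decision logic is small enough to keep pure and share with nav filtering. This sketch rests on assumptions: `adminGuardDecision`, `showSentinelInNav`, and the `authLoading` flag are hypothetical, and the real component would read `appUser` from the existing auth context. Because the app is a static export, the guard is a UX layer only. The RLS policies are what actually keep `monitoring_*` data admin-only.

```typescript
// Hypothetical pure decision function behind <AdminGuard>.
export type AppUser = { role: string } | null | undefined;
export type GuardDecision = "loading" | "allow" | "forbidden";

const SENTINEL_ROLES = new Set(["oracle", "admin"]);

export function adminGuardDecision(user: AppUser, authLoading: boolean): GuardDecision {
  if (authLoading) return "loading"; // render a skeleton rather than flashing the 403 view
  if (user && SENTINEL_ROLES.has(user.role)) return "allow";
  return "forbidden"; // demo role, unknown role, or signed out
}

// Nav filtering applies the same rule, so the section is hidden for non-admins.
export function showSentinelInNav(user: AppUser): boolean {
  return adminGuardDecision(user, false) === "allow";
}
```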

10. Notifications: daily morning digest

A single Cloudflare Worker (apps/control/worker/sentinel-digest/) on a 7am CT cron. Queries last 24h of runs/findings/audit log via Supabase service role. Renders an HTML email via Resend.

Recipients: app_users with role IN ('oracle','admin').

Content:

Headline: green/yellow/red status + count of failed agents
New high/critical findings (title, surface, recommended action — link to Finding drawer)
Recurring findings whose severity bumped overnight
Cost summary: total, by agent
Pending coverage approvals count
Failed runs (if any) with link
Footer: link to Sentinel dashboard
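Cloudflare Cron Triggers are evaluated in UTC. That puts "7am CT" at 12:00 UTC while daylight saving time is in effect (CDT) and 13:00 UTC otherwise (CST). One approach is to register both UTC slots (`crons = ["0 12 * * *", "0 13 * * *"]` in the Worker's Wrangler config) and have the handler skip whichever slot is not 07:00 in `America/Chicago`. `isSevenAmCentral` and `sendDigest` are hypothetical names. The `controller` parameter is typed structurally, rather than with the Workers `ScheduledController` type, so the block stands alone.

```typescript
// Hypothetical DST guard for the sentinel-digest Worker. The Worker is assumed to be
// scheduled at both 12:00 and 13:00 UTC; only the slot that is 07:00 in Chicago sends.
export function isSevenAmCentral(at: Date): boolean {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "America/Chicago",
    hour: "numeric",
    hourCycle: "h23", // 0-23 hour numbering
  }).formatToParts(at);
  const hour = parts.find((p) => p.type === "hour")?.value;
  return Number(hour) === 7;
}

async function sendDigest(_env: unknown): Promise<void> {
  // Placeholder: query the last 24h of runs/findings/audit log with the service role,
  // render the HTML digest, and send it through Resend.
}

// Worker entry point (module syntax).
export default {
  async scheduled(controller: { scheduledTime: number }, env: unknown): Promise<void> {
    if (!isSevenAmCentral(new Date(controller.scheduledTime))) return; // the other UTC slot
    await sendDigest(env);
  },
};
```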

11. Files to create / modify

Create

apps/control/migrations/20260506_sentinel_schema.sql — schema (all monitoring_* tables)
apps/control/migrations/20260506_sentinel_rls.sql — RLS policies + admin-only access
apps/control/migrations/20260506_sentinel_seed_prompts.sql — seed monitoring_prompt_templates
apps/control/src/lib/sentinel/probes/{uptime,deploy,db-integrity,edge-fn,security-passive,security-active,test-coverage}.ts
apps/control/src/lib/sentinel/reasoner/oneshot.ts — direct Anthropic API path (used by deterministic agents)
apps/control/src/lib/sentinel/reasoner/agent-loop.ts — Claude Agent SDK runner (used by Test coverage + Security passive)
apps/control/src/lib/sentinel/reasoner/step-recorder.ts — common step persistence; subscribes to SDK events for agent-loop, called manually for one-shot
apps/control/src/lib/sentinel/tools/index.ts — tool registry
apps/control/src/lib/sentinel/tools/{http-probe,query-supabase,read-file,git-diff,run-gitleaks,query-osv,inventory,find-existing-tests,draft-test-scaffold,cf-pages,r2-list,record-finding}.ts — individual tool implementations
apps/control/src/lib/sentinel/manifest.ts — crawler logic
apps/control/src/lib/sentinel/artifacts.ts — inline-vs-R2 storage helper
apps/control/src/components/sentinel/{overview,coverage,agents,activity,findings,runs,docs}/ — one folder per tab
apps/control/src/components/sentinel/shared/ — shared components
apps/control/src/components/sentinel/docs/{uptime,deploy,db-integrity,edge-fn,security-passive,security-active,test-coverage}.md — per-agent in-UI docs
apps/control/src/components/sentinel/drawers/{agent,run,finding,probe}-drawer.tsx
apps/control/src/components/auth/admin-guard.tsx
apps/control/src/hooks/use-sentinel.ts
apps/control/src/hooks/use-sentinel-mutations.ts
apps/control/src/app/[lang]/(intranet)/sentinel/[tab]/page.tsx
apps/control/worker/sentinel-deadman/ — CF Worker dead-man switch
apps/control/worker/sentinel-trigger/ — receives "Run now" UI clicks, HMAC-signs and POSTs to alpu.ca via Tunnel
apps/control/worker/sentinel-digest/ — daily morning digest at 7am CT via Resend
apps/control/supabase/functions/sentinel-llm-merge/ — edge function called by GitHub Action
infra/alpu/sentinel-orchestrator/ — Node service + systemd unit + README
infra/sentinel/README.md — system-level documentation
infra/sentinel/glossary.md
.github/workflows/sentinel-coverage.yml — LLM-on-merge coverage extension
vitest.config.ts + Playwright config + sample tests per app — bootstrap delivered as a prompt-runner task in Phase 0

Modify

apps/control/src/types/intranet.ts — add sentinel section + tabs
apps/control/package.json — add react-markdown (UI), @anthropic-ai/claude-agent-sdk (server-side; imported only by the orchestrator), zod for tool schemas; Vitest + Playwright are added by the Phase 0 bootstrap task
apps/garden/package.json — add Vitest + Playwright via bootstrap task
infra/runbook.md — Sentinel orchestrator bootstrap section + alpu.ca recipe + Bitwarden recipe
CLAUDE.md (root) — short "Monitoring" section pointing to this system
apps/control/scripts/lint-image-gen.sh — extend whitelist for sentinel probes if they touch the image catalog

12. Build sequence — phased & sequenced

Five phases. Each item is a discrete, mergeable PR. Items within a phase are mostly parallelizable; items across phases have dependencies (called out where they cross).

Phase 0 — Prerequisites (~2 days)

Goal: unblock everything else. All items must complete before Phase 3's test-coverage agent can produce useful findings.

#	Item	Notes
0.1	Create alpu.ca user account + ssh key + Bitwarden secret recipe	Documented in `infra/runbook.md`
0.2	Reserve Cloudflare Tunnel hostname for alpu.ca → control webhook	Used by `sentinel-trigger` Worker in Phase 1
0.3	Generate HMAC shared secret; add to `~/.config/sentinel/.env` on alpu.ca + as a Cloudflare Worker secret
0.4	Seed task in `tasks` table: "Bootstrap Vitest + Playwright + sample tests in apps/garden + apps/control + CI hooks"	Manually inserted via SQL. Headless prompt-runner picks it up; opens PR. Admin reviews + merges. Dogfood moment.
0.5	Confirm Bitwarden CLI install + unlock-on-boot recipe on alpu.ca

Exit criteria: alpu.ca reachable; tunnel up; HMAC secret in place; Vitest + Playwright PR merged with at least one passing test per app; CI runs them.

Phase 1 — Foundation (~1 week, depends on Phase 0)

Goal: a single agent runs end-to-end with full step transparency.

#	Item	Depends on
1.1	Migration: `monitoring_*` schema (all tables in §5, including `execution_mode` + tool/turn fields on `monitoring_agents`)	0.1
1.2	Migration: RLS policies (admin-only)	1.1
1.3	Migration: seed `monitoring_prompt_templates` (placeholders; refined in Phase 2)	1.1
1.4	Add `<AdminGuard>` component	—
1.5	One-shot reasoner (`reasoner/oneshot.ts`) with Anthropic adapter + mock adapter	—
1.6	Artifact storage helper (inline vs R2 routing)	1.1
1.7	Step recorder (writes `monitoring_run_steps` + artifacts during execution; same module used by both modes)	1.5, 1.6
1.8	Probes: Uptime + Deploy verifier (deterministic, both produce artifacts on failure)	1.6
1.9	Orchestrator skeleton on alpu.ca: cron + heartbeat + HMAC webhook + per-step persistence + per-agent timeout + cost ceiling. Dispatches by `execution_mode` (only `one-shot` wired in this phase).	0.2, 0.3, 1.7, 1.8
1.10	`sentinel-trigger` Worker (HMAC-signs UI clicks → alpu.ca tunnel)	0.2, 0.3
1.11	Add `sentinel` section to `intranet.ts` + route shell + admin gating	1.4
1.12	Sentinel UI: Overview tab (skeleton) + Runs tab + Run drawer with stepper (renders `probe` + `reasoner` step kinds; `agent_turn` + `tool_call` rendering arrives in Phase 3)	1.9, 1.11
1.13	Sentinel UI: Agents tab (cards) + manual "Run now" wiring	1.10, 1.12
1.14	Shared components: SeverityBadge, StatusPill, ModelChip, CostBadge, JsonTree, PromptViewer, ArtifactViewer (image + http_response + json), DrawerShell, Sparkline	1.11
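Items 1.9 and 1.10 share an HMAC contract between the `sentinel-trigger` Worker and the orchestrator. Below is a minimal Node sketch of the orchestrator side. It assumes HMAC-SHA256 over `"<timestamp>.<body>"`, hex encoding, and a 5-minute replay window. The message format, the window, and the function names are assumptions, not a settled protocol. The Worker would compute the same signature with Web Crypto's `crypto.subtle.sign` using an HMAC key.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical signing scheme: HMAC-SHA256 over `${timestamp}.${body}`, hex-encoded.
export function sign(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

export function verify(
  secret: string,
  timestamp: number,
  body: string,
  signatureHex: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  maxSkewSeconds = 300,
): boolean {
  // Reject stale or future-dated requests to limit replay.
  if (!Number.isFinite(timestamp) || Math.abs(nowSeconds - timestamp) > maxSkewSeconds) {
    return false;
  }
  const expected = Buffer.from(sign(secret, timestamp, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The length check before `timingSafeEqual` matters because that function throws when its inputs differ in length. With the check in place, a malformed signature fails closed instead of crashing the handler.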

Exit criteria: A nightly cron run on alpu.ca produces monitoring_runs + monitoring_run_steps rows for Uptime and Deploy verifier. UI Run drawer shows every step with model + rationale + prompt + input + output + artifacts. Manual "Run now" works from UI.

Phase 2 — Full agent set + Coverage view (~1.5 weeks, depends on Phase 1)

Goal: the five one-shot agents (everything except Security passive and Test coverage) firing nightly; Coverage matrix lit up; severity bumps and audit log working.

#	Item	Depends on
2.1	Probes (deterministic, also exposed as tools): DB integrity (row counts, R2 reconciliation, pg_cron freshness, migrations applied)	1.7
2.2	Probes: Edge function health (incl. existing `smoke-image-gen.mjs` as a probe)	1.7
2.3	Probes: Security active (RLS bypass attempts, public R2 inventory, unauth edge-fn probes)	1.7
2.4	Tool registry (`tools/index.ts`) + tool implementations: `http_probe`, `query_supabase`, `read_file`, `git_diff_since`, `cf_pages_status`, `r2_list`, `record_finding`. Each tool has a Zod input schema and shares code with the probe library.	1.7
2.5	Refine `monitoring_prompt_templates` per agent (versioned; v1 production prompts including system prompts for the future agent-loop agents)	1.3
2.6	Findings dedup logic (orchestrator-side: increment occurrence vs new row; also called from the `record_finding` tool)	1.7
2.7	Regression-aware severity bumps (bump one severity level when a probe fails after 30 consecutive passes; flag a performance regression at 3× baseline)	2.6
2.8	Audit log emission for ack/dismiss/resolve, model change, execution-mode change, agent pause, manifest approval	1.1
2.9	Per-agent cost ceiling + timeout enforcement + auto-pause-with-finding (applies to both modes)	1.9
2.10	Sentinel UI: Findings tab + Finding drawer + bulk-action toolbar	1.12
2.11	Sentinel UI: Activity tab (chronological feed, filters, polling)	1.12
2.12	Sentinel UI: Coverage tab (matrix view + filters + Pending approvals section)	2.10
2.13	Per-agent in-UI docs: write six `sentinel/docs/*.md` files (Test coverage lands with that agent in Phase 3); render in Agent drawer About tab	2.10
2.14	Shared components: CoverageMatrix, FindingsTable, ActivityEntry, AgentCard, SeverityHeatmap, MarkdownDoc	1.14

Exit criteria: Five of seven agents fire nightly in one-shot mode (Uptime, Deploy verifier, DB integrity, Edge function health, Security active). Findings dedupe correctly. Severity bumps when a probe fails after 30 consecutive passes. Coverage matrix is populated. Audit log captures admin actions. Cost ceiling auto-pauses an agent when tested with a deliberately low ceiling. Tool registry exists and can be consumed by the Phase 3 agent-loop runner.
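The cost-ceiling and timeout rules in item 2.9 might look like this in the orchestrator. The sketch assumes `checkCostCeiling` is called before a run and again after each step with the day's running total, including the in-flight run, so a run that crosses the ceiling partway through is still caught. `withTimeout` races the run against a timer and aborts through an `AbortSignal` that the run is expected to honor. Both names are hypothetical.

```typescript
// Hypothetical per-agent guards for item 2.9.
export type CeilingDecision = { action: "run" } | { action: "pause"; reason: string };

// spentTodayUsd includes the in-flight run when called between steps.
export function checkCostCeiling(spentTodayUsd: number, ceilingUsd: number): CeilingDecision {
  if (spentTodayUsd >= ceilingUsd) {
    return {
      action: "pause",
      reason: `Spent $${spentTodayUsd.toFixed(4)} today; ceiling is $${ceilingUsd.toFixed(4)}`,
    };
  }
  return { action: "run" };
}

// Per-agent timeout: reject after timeoutMs and signal the work to stop.
export async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Agent run exceeded ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // no stray rejection once the work has settled
  }
}
```

On a `pause` decision the orchestrator would flip the agent to paused and record a finding, which is the behavior the §13 cost-ceiling check exercises.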

Phase 3 — Agent-loop path + Coverage extension + Test agent + Notifications (~1.5 weeks, depends on Phase 2)

Goal: Claude Agent SDK wired into the orchestrator; the two agent-loop agents (Security passive, Test coverage) running; coverage auto-grows; daily digest goes out.

#	Item	Depends on
3.1	Add `@anthropic-ai/claude-agent-sdk` dependency; build `reasoner/agent-loop.ts` runner that initializes an SDK session per run with system prompt, tool catalog, model, max-turns; subscribes to step events and forwards to the step recorder	1.7, 2.4, 2.5
3.2	Orchestrator dispatcher: route runs by `execution_mode`; both modes fully wired	1.9, 3.1
3.3	Run drawer rendering: `agent_turn` + `tool_call` step kinds; visual grouping of turns and their tool calls; tool name/args/result display	1.12, 3.1
3.4	Agent-loop tool implementations specific to advanced agents: `run_gitleaks`, `query_osv`, `list_routes`, `list_edge_fns`, `list_critical_files`, `find_existing_tests`, `draft_test_scaffold`	2.4
3.5	Wire Security passive agent (`agent-loop`, Sonnet 4.6); write its system prompt; smoke-test on a known-safe diff	3.1, 3.4
3.6	Wire Test coverage agent (`agent-loop`, Sonnet 4.6); write its system prompt; smoke-test against the Vitest+Playwright setup from Phase 0; write Test-coverage in-UI doc	3.1, 3.4, 0.4
3.7	Finding drawer: Approve & queue button + draft task preview + confirmation modal	2.10
3.8	`approveAndQueueTask` mutation (inserts row into existing `tasks` table from finding metadata)	3.7
3.9	Crawler logic + manifest auto-population at start of each sweep	1.9, 2.12
3.10	LLM-on-merge GitHub Action (`.github/workflows/sentinel-coverage.yml`)	3.9
3.11	Supabase edge function `sentinel-llm-merge` (signature verify; writes pending rows)	3.10
3.12	Coverage tab: Pending approvals UI (Approve / Reject buttons inline)	2.12, 3.11
3.13	Dead-man-switch CF Worker (`sentinel-deadman`) — flags finding if no heartbeat in 25h	1.9
3.14	Daily digest CF Worker (`sentinel-digest`) — 7am CT cron → Resend	2.10

Exit criteria: A new route added in a feature branch shows up in monitoring_manifest after merge. Test coverage agent flags a sample uncovered file with a recommended action and a draft task; clicking Approve & queue creates a tasks row that the prompt-runner picks up. Security passive agent runs against a sample diff and emits findings via tool calls. Daily digest email arrives at 7am CT. Dead-man flagged in <25h when service stopped.
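The crawler half of coverage extension (item 3.9) is essentially a set difference between what the crawler finds and what `monitoring_manifest` already holds. The sketch assumes surfaces are identified by a string key and that the manifest has `active`, `pending_approval`, and `rejected` statuses. The `rejected` status is an assumption, and `reconcileManifest` is a hypothetical name. Surfaces already present in any status, including rejected ones, are not proposed again.

```typescript
// Hypothetical manifest reconciliation at the start of each sweep.
export type SurfaceKey = string; // e.g. "route:garden:/en/shop" (format is an assumption)
export type ManifestStatus = "active" | "pending_approval" | "rejected";

export function reconcileManifest(
  discovered: Iterable<SurfaceKey>,
  manifest: Map<SurfaceKey, ManifestStatus>,
): { added: SurfaceKey[]; missing: SurfaceKey[] } {
  const seen = new Set(discovered);
  // Surfaces the manifest has never seen: candidates for new rows.
  const added = Array.from(seen)
    .filter((key) => !manifest.has(key))
    .sort();
  // Active surfaces the crawler no longer finds: possibly removed from the codebase.
  const missing = Array.from(manifest.entries())
    .filter(([key, status]) => status === "active" && !seen.has(key))
    .map(([key]) => key)
    .sort();
  return { added, missing };
}
```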

Phase 4 — Documentation & polish (~3 days, depends on Phase 3)

Goal: an unfamiliar collaborator can land in the UI and onboard cold.

#	Item	Depends on
4.1	Write `infra/sentinel/README.md` (system-level "how Sentinel works")	All prior
4.2	Write `infra/sentinel/glossary.md`	4.1
4.3	Sentinel UI: Docs tab — render README + per-agent docs + glossary	2.10, 2.13, 4.1, 4.2
4.4	Update `infra/runbook.md` with Sentinel orchestrator ops recipe	1.9
4.5	Update root `CLAUDE.md` with Monitoring section	4.1
4.6	Update `apps/control/scripts/lint-image-gen.sh` whitelist for sentinel probes	2.2
4.7	Visual polish pass on all tabs (skeleton, empty, error states)	2.12
4.8	Accessibility audit (keyboard nav, screen reader, focus)	4.7
4.9	Cost-and-performance review of the first week of runs; tune defaults	post-launch

Exit criteria: Docs tab renders cleanly. Visual polish complete. Accessibility audit passes. Runbook updated.

Total estimate

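Phases 0–4 ≈ 4–4.5 weeks of focused work (slightly longer than the pre-SDK plan because Phase 3 now also covers the agent-loop runtime + tool catalog). Mostly sequential because the orchestrator + UI + agents stack on each other. Run strictly back to back, the per-phase estimates above add up to about 5 working weeks (2 days + 1 + 1.5 + 1.5 weeks + 3 days). The 4–4.5-week figure therefore relies on overlap, such as the Phase 0 bootstrap task (0.4, needed only by 3.6) running alongside Phases 1–2, and parallel items within each phase. Phase 0 should start immediately so the prompt-runner has time to deliver Vitest/Playwright bootstrap before Phase 3.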

13. Verification — end-to-end checklist

Manual UI trigger. Click "Run now" on the Uptime agent in Sentinel → Agents. Run appears in Activity within 30s. Run drawer shows every step with model, rationale, prompt, input context, output, artifacts.
Nightly cron. journalctl -u sentinel.service -f on alpu.ca. Confirm 22:00 CT kickoff, sequential agent execution, full sweep complete by ~03:30 CT, exit 0.
Coverage matrix. Open Sentinel → Coverage. Confirm every garden + control route, every critical Supabase table, every edge function, and every external service is listed. Each row shows which agents cover it.
Dedup. Introduce a deliberate failure (e.g. delete a route temporarily). Run twice. Confirm finding has occurrence_count=2, not two rows.
Regression severity. Simulate 30 consecutive passes on a probe, then a fail; confirm severity bumps one level vs a fresh-fail finding.
Coverage extension. Create a branch with a new route + new edge function, merge to main. Within 2 min, GitHub Action posts manifest entries. Approve in UI. Next nightly run probes them.
Dead-man. Stop sentinel.service on alpu.ca. Within 25h, the dead-man Worker writes a finding with agent='deadman'. Restart service; next heartbeat resolves the finding.
Model swap with audit. Change Uptime agent's model_id from Haiku to Sonnet in the UI; provide a rationale. Confirm monitoring_audit_log has the change with before/after. Next run uses Sonnet and cost telemetry reflects it.
Cost ceiling. Lower a test agent's daily_cost_ceiling_usd to $0.001 and trigger a run. Confirm the agent auto-pauses and a finding is generated.
Daily digest. Confirm 7am CT email arrives at oracle/admin addresses with summary of last 24h.
Docs tab. Open Sentinel → Docs. Confirm README renders; each agent's doc renders; glossary visible.
Multi-step transparency (agent-loop). Open a Test coverage or Security passive run. Confirm the stepper shows alternating agent_turn + tool_call steps, with each tool call's name, arguments, and result visible. The execution-mode chip on the run header reads agent-loop.
Multi-step transparency (one-shot). Open an Uptime run. Confirm the stepper shows a flat list of probe steps followed by a single reasoner step. Execution-mode chip reads one-shot.
Approve & queue. Approve a test-write finding. Confirm a row appears in tasks with the draft payload; confirm the headless prompt-runner picks it up.
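RLS. Sign in as a demo-role user. Confirm the Sentinel section is hidden from nav and shows the 403 view when opened by URL, and that direct Supabase queries against monitoring_* tables from that session return no rows.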
Webhook auth. POST to the alpu.ca trigger endpoint without a valid HMAC signature; confirm rejection.

14. Out of scope for v1

Auto-remediation (agents fixing things). Recommendations only.
Public status page.
Real-user metrics, Core Web Vitals, performance monitoring.
Distributed tracing / OpenTelemetry.
PagerDuty / on-call escalation.
Slack notifications. (Daily morning email digest IS in v1; Slack is v1.5.)
Multi-agent coordination / debate.
Replacing CI's existing checks. CI stays; Sentinel sits above it.
Auto-creation of GitHub issues from findings. (v1.5 candidate.)
Pre-commit / pre-push hooks.
Local-model (Ollama) reasoner adapter. Deferred to v1.1 evaluation pass once steady-state cost data is available.
Trend analysis / week-over-week reports. (Data is captured; analytics view is v1.5.)
Self-meta-monitoring (Sentinel observing Sentinel). Daily digest covers the basics.
Ingesting external service status feeds (CF, Supabase status pages). v1.5.
Retrofitting <AdminGuard> onto the existing admin section.
Browser-based smoke tests via Playwright in Uptime. Revisit in v1.5 once Playwright is bootstrapped.
Slow-query monitoring via pg_stat_statements.

15. Open questions to revisit during build

Heartbeat frequency. 15min is a first guess. If monitoring runs ever need >15min uninterrupted, raise to 30min and bump dead-man window to 90min.
Manifest approval UX. The pending_approval flow can become tedious. If approval rate is ~100% in the first month, switch the LLM-on-merge default to active and reserve pending_approval for security-touching surfaces.
Test framework on a static-export Next.js setup. Confirm Vitest + Playwright play nicely with the next-intl-wrapped App Router. May need a small dance with the [lang] segment in test fixtures. Surface in Phase 0 task prompt.
Sequential agent timing. The 8-step schedule assumes typical run durations; if Security passive or Test coverage routinely overruns, reorder or parallelize. Schedule is config-driven, so tunable.
R2 cost. Artifact storage in R2 is cheap but not free. Set a per-agent artifact size cap; large bodies get truncated with a "view full body via R2" link.
Local model picks for v1.1. Validate Qwen2.5 vs Llama 3.3 on alpu.ca with a representative probe-output sample before committing in code.
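The inline-vs-R2 routing in `artifacts.ts`, together with the per-agent cap raised above, could be sketched as follows. All thresholds and the R2 key layout are placeholders, not decided values, and `placeArtifact` is a hypothetical name. The caller would upload `uploadBytes` to R2 and store `preview` plus the key on the step row.

```typescript
// Hypothetical inline-vs-R2 routing for artifacts.
const INLINE_MAX_BYTES = 16 * 1024; // placeholder inline threshold
const PREVIEW_BYTES = 2 * 1024; // placeholder preview size

export type ArtifactPlacement =
  | { storage: "inline"; body: string }
  | { storage: "r2"; key: string; uploadBytes: Uint8Array; preview: string; cappedAt: number | null };

export function placeArtifact(
  runId: string,
  stepSeq: number,
  name: string,
  body: string,
  agentCapBytes: number, // per-agent artifact size cap
): ArtifactPlacement {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= INLINE_MAX_BYTES) return { storage: "inline", body };
  // Bodies over the agent's cap are cut before upload.
  const capped = bytes.length > agentCapBytes;
  const uploadBytes = capped ? bytes.slice(0, agentCapBytes) : bytes;
  // Byte-level cut: a multi-byte character split at the boundary decodes as U+FFFD.
  const preview = new TextDecoder().decode(bytes.slice(0, PREVIEW_BYTES));
  return {
    storage: "r2",
    key: `sentinel/${runId}/${stepSeq}/${name}`, // hypothetical key layout
    uploadBytes,
    preview,
    cappedAt: capped ? agentCapBytes : null,
  };
}
```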

Doc owner: Haydn. Drafted 2026-05-06. Markdown source: docs/devtasks/sentinel.md. Operational secrets in BW folder devops-sponic.