Agentic Gathering Host
A speaking AI companion that joins Sponic Agentic Gatherings as the host's wing-friend — listens, replies in the speaker's language, and runs the conversational glue. Production now uses LiveKit Agents + Gemini Live native audio, orchestrated by the self-hosted Phoenix worker.
The realtime model is livekit.plugins.google.realtime.RealtimeModel with gemini-3.1-flash-live-preview, which handles native audio in and out. The older Deepgram → Claude → ElevenLabs/Gemini-TTS cascade is historical; apps/ai-host/gemini_tts.py remains in the tree but is not the active runtime.
What it is & what it does
The Agentic Gathering Host is a real-time voice agent designed to sit at the head of a Sponic Gardens gathering and play the role of a warm, curious, slightly playful host. It joins a LiveKit room, receives participant audio, sends it through Gemini Live native audio, and publishes spoken responses back into the same room. An optional Simli session renders the host as a live avatar on the venue display.
It is not a generic assistant. The persona is constrained to the Agentic Gathering register: 1–2 sentence replies, no medical/legal/financial advice, no politics or religion unless the table clearly invites it, no pretending to eat or drink.
The active production worker is apps/ai-host/agent.py on Oracle Phoenix, registered as the named LiveKit worker sponic-dinner-host. The historical dev task still documents how the stack evolved, but current cost/model planning should use Gemini Live.
How it works (Phase 1 architecture)
┌──────────────────────────────────────────────────────────────────┐
│                       LiveKit Cloud (SFU)                        │
│        wss://sponic-XXXX.livekit.cloud · WebRTC transport        │
└──────┬──────────────────────────────────────────────┬────────────┘
       │ guest mic audio                              │ agent voice
       ▼                                              │
┌──────────────────────┐                              │
│ Browser / phone      │  (LiveKit Playground for     │
│ in the LiveKit room  │   testing; later a venue tab │
│                      │   or a Sponic intranet page) │
└──────────────────────┘                              │
                                                      │
                                         ┌────────────┴──────────────┐
                                         │ Python agent worker       │
                                         │ apps/ai-host/agent.py     │
                                         │                           │
                                         │ Gemini Live native audio  │  speech → speech
                                         │ gemini-3.1-flash-live-    │  RealtimeModel
                                         │ preview · voice "Puck"    │
                                         │   │                       │
                                         │   ├─ transcript capture   │  for Supabase
                                         │   │                       │
                                         │   └─ optional Simli       │  avatar track
                                         └───────────────────────────┘
The agent worker is a long-lived Python process that registers with LiveKit Cloud as a named worker and gets dispatched into a room when one is created. It attaches an AgentSession backed by Gemini Live native audio. The session handles speech input, reasoning, and spoken output in one realtime model call; the worker captures transcript metadata for Supabase and can attach a Simli avatar session that subscribes to the generated audio.
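The flow described above could be wired roughly as follows. This is a minimal sketch, assuming the livekit-agents 1.x API and the Google plugin are installed; it is not a copy of apps/ai-host/agent.py. The helper name realtime_settings, the placeholder instructions string, and passing GOOGLE_AI_API_KEY explicitly are illustrative assumptions. LiveKit imports are deferred into the functions so the settings helper can be exercised without the SDK.

```python
import os

DEFAULT_MODEL = "gemini-3.1-flash-live-preview"
DEFAULT_VOICE = "Puck"


def realtime_settings(env=None):
    """Collect Gemini Live settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "model": env.get("GEMINI_MODEL", DEFAULT_MODEL),
        "voice": env.get("GEMINI_VOICE", DEFAULT_VOICE),
        "api_key": env["GOOGLE_AI_API_KEY"],  # raises KeyError if the key is missing
    }


async def entrypoint(ctx):
    """Runs once per dispatched room; ctx is a livekit.agents.JobContext."""
    # Deferred imports: assumes livekit-agents 1.x and livekit-plugins-google.
    from livekit.agents import Agent, AgentSession
    from livekit.plugins.google.realtime import RealtimeModel

    await ctx.connect()
    # One realtime model handles speech input, reasoning, and spoken output.
    session = AgentSession(llm=RealtimeModel(**realtime_settings()))
    # Placeholder instructions; the real persona comes from the prompt library.
    await session.start(room=ctx.room, agent=Agent(instructions="You are a warm host."))


def main():
    """Register with LiveKit as a named worker and wait for room dispatch."""
    from livekit.agents import WorkerOptions, cli

    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="sponic-dinner-host"))
```

Calling main() from a script run with the dev or start subcommand would register the worker; transcript capture and the Simli session are left out of this sketch.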
Component choices & why
| Stage | Choice | Why |
|---|---|---|
| Transport | LiveKit Cloud (SFU) | Handles WebRTC plumbing, multi-participant rooms, and a hosted SFU we don't have to operate. Free tier covers Phase 1 (well under the 50 connection-min/mo dev limit). |
| Realtime model | Gemini Live via livekit.plugins.google.realtime | Native audio in/out, multilingual speech, interruption handling, and one billing line for the core host session. |
| Persona prompts | Supabase prompt library | Runtime prompts are loaded from public.prompts by stable prompt names/codes, so prompt tuning does not require redeploying the worker. |
| Avatar | Simli optional | Runs only when the event has a face/avatar configuration. Cost is per active avatar minute, independent of how many humans are in the room. |
| Compute | Oracle Phoenix VM | Self-hosted worker, already paid. This avoids LiveKit-hosted agent-session metering while preserving LiveKit Cloud rooms. |
File map
| Path | Purpose |
|---|---|
| apps/ai-host/agent.py | The worker entrypoint. Wires Gemini Live native audio into an AgentSession, loads prompt-library personas, records transcripts, starts optional Simli, and registers with LiveKit via cli.run_app(WorkerOptions(...)). |
| apps/ai-host/persona.py | Prompt/dossier helpers used by the database-managed persona prompts. Internal persona names still include dinner_host and tea_party for compatibility. |
| apps/ai-host/pyproject.toml | Declares Python 3.12 + livekit-agents 1.x + the Google realtime plugin. Use uv sync to install. |
| apps/ai-host/gemini_tts.py | Historical custom TTS wrapper. Kept for reference, but production now uses Gemini Live native audio through agent.py. |
| apps/ai-host/.env.example | Template for runtime credentials. Copy to .env and fill in. Never commit .env. |
| apps/ai-host/README.md | Quick-start guide (this doc is the longer-form reference; the README is the 5-minute version). |
First-time setup
Do this once per machine. After it's done, you never repeat any of these steps — the day-to-day workflow in the next section is just a single command.
Step 1 — Install runtime
You need Python 3.12 and uv (the package manager). On macOS:
# Python 3.12
pyenv install 3.12 # if you use pyenv (recommended)
# or: brew install python@3.12
# uv (manages venv + lockfile)
brew install uv # macOS
# Linux/Ubuntu: curl -LsSf https://astral.sh/uv/install.sh | sh
# Verify
python3 --version # → Python 3.12.x
uv --version # → uv 0.x.y
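The same verification can be scripted so it runs before any other setup step. A small sketch using only the standard library; the function names python_ok, uv_path, and preflight are hypothetical and not part of the repo.

```python
import shutil
import sys


def python_ok(version_info=None, minimum=(3, 12)):
    """True when the interpreter is at least the required major.minor version."""
    v = sys.version_info if version_info is None else version_info
    return tuple(v[:2]) >= minimum


def uv_path():
    """Absolute path to uv on PATH, or None when it is not installed."""
    return shutil.which("uv")


def preflight():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not python_ok():
        problems.append(f"Python 3.12+ required, found {sys.version.split()[0]}")
    if uv_path() is None:
        problems.append("uv not found on PATH")
    return problems
```

Printing preflight() gives a quick yes/no before running uv sync.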
Step 2 — Provision three service accounts
Sign up for the three services below (Simli is optional) and grab keys from each. Save every credential to the Bitwarden collection DevOps-sponicgarden (in the ALPU.CA org; it's a collection, not a personal folder, so bw create item requires organizationId + collectionIds). The project never reads keys from .env alone: BW is the source of truth.
BW items already provisioned as of 2026-05-06:
- LiveKit Cloud → BW item 64d75f73-474a-4312-89e4-b442015302d8
- Google AI (Gemini) → BW item c4b16931-335b-457a-9910-b416006d3b8c ("Google Gemini — SponicGardens"); the key is in login.password, not in custom fields
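Because the Gemini key lives in login.password rather than a custom field, a script that pulls it from Bitwarden has to read that exact path. A hedged sketch, assuming the Bitwarden CLI (bw) is installed and unlocked; the helper names password_from_item and fetch_secret are hypothetical.

```python
import json
import subprocess


def password_from_item(item_json):
    """Extract login.password from the JSON printed by `bw get item`."""
    item = json.loads(item_json)
    password = (item.get("login") or {}).get("password")
    if not password:
        raise ValueError("item has no login.password")
    return password


def fetch_secret(item_id):
    """Read one secret via the Bitwarden CLI (needs an unlocked session)."""
    result = subprocess.run(
        ["bw", "get", "item", item_id],
        capture_output=True,
        text=True,
        check=True,  # raise if bw exits non-zero (locked vault, unknown id)
    )
    return password_from_item(result.stdout)
```

For example, fetch_secret("c4b16931-335b-457a-9910-b416006d3b8c") would return the Gemini key from the item listed above.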
| Service | Sign-up | What to grab | BW item name |
|---|---|---|---|
| LiveKit Cloud (WebRTC transport) | cloud.livekit.io | Create project named sponic. From Settings → Keys, copy three values: project URL (wss://sponic-XXXX.livekit.cloud), API key (starts with API), API secret. | LiveKit Cloud — sponic |
| Google AI (Gemini Live) (native audio model) | aistudio.google.com/app/apikey | Reuse the existing Sponic Gemini key (already in BW). No separate provisioning needed: the same key powers live translation and Gemini Live. Default model: gemini-3.1-flash-live-preview; default voice: Puck. | Google Gemini — SponicGardens (USE THIS) |
| Simli (optional avatar) | simli.com | Only required for visual avatar sessions. The worker tracks avatar minutes separately with SIMLI_USD_PER_MINUTE. | existing Simli key on Phoenix |
Step 3 — Install dependencies
cd apps/ai-host
uv sync # creates .venv/, installs livekit-agents 1.x + all plugins
# takes ~30 s on a warm cache, ~3 min from scratch
Step 4 — Configure .env
cp .env.example .env
# Open .env in your editor and paste these from the BW items above:
# LIVEKIT_URL=wss://sponic-XXXX.livekit.cloud
# LIVEKIT_API_KEY=API...
# LIVEKIT_API_SECRET=...
# GOOGLE_AI_API_KEY=AIza...
# GEMINI_MODEL=gemini-3.1-flash-live-preview
# GEMINI_VOICE=Puck
Never commit .env. The .gitignore excludes it, but if you rename it or add a sibling file (.env.local, etc.) the rule may not catch it. Keep credentials in BW; .env is just a runtime cache.
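A quick way to confirm the file is complete before starting the worker is to parse it and list what is missing. This is an illustrative sketch, not how agent.py loads its configuration; parse_env and missing_keys are hypothetical helpers, and the required list assumes the Gemini model and voice fall back to their defaults.

```python
REQUIRED = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_AI_API_KEY",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED):
    """Names of required settings that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

Usage: missing_keys(parse_env(open(".env").read())) returns an empty list when all four credentials are present.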
Step 5 — First-run smoke test
Confirm the install worked without going near a microphone:
cd apps/ai-host
uv run python -c "from persona import DINNER_HOST_PROMPT, build_prompt; print(f'Persona loaded: {len(build_prompt())} chars')"
# → Persona loaded: 1234 chars
uv run python -c "import ast; ast.parse(open('agent.py').read()); print('agent.py syntax OK')"
# → agent.py syntax OK
If both lines succeed, you're set up. Move to Normal operation.
Normal operation (day-to-day)
Once setup is done, every working session collapses to this:
Start the agent
cd apps/ai-host
uv run python agent.py dev
You should see:
[INFO] connecting to wss://sponic-XXXX.livekit.cloud
[INFO] registered worker: dinner-host-1
[INFO] waiting for room dispatch...
The dev subcommand runs in foreground with hot-reload on file changes — edit persona.py and the worker restarts itself. Use this for everything except production.
The worker is now idle, waiting for someone to join a LiveKit room. It costs nothing while idle (no API calls, no tokens).
Test in the LiveKit Playground
The official Playground is the day-to-day way to talk to your agent — no client code, just a browser tab.
Open agents-playground.livekit.io in a fresh tab. Paste your project URL, API key, and API secret (same values as in .env). The Playground caches them in localStorage, so this is one-time per browser.
Click Connect. Your worker terminal logs agent joined room within a second; the browser asks for mic permission.
You should hear the smoke-test greeting: "Welcome to Sponic Gardens! I'm your Agentic Gathering host tonight." If you hear it, the whole pipeline is healthy.
Talk to it. Try English ("What's for the Agentic Gathering?"), then Polish ("Skąd jesteś?"). Replies should arrive in ~1–1.5 s and match the language you spoke.
Hit Disconnect when done. The worker logs room closed and goes idle — ready for the next session. Leave it running while you iterate on persona.py.
Stopping
Just Ctrl-C the uv run terminal. There's no state to clean up — the worker is stateless between rooms.
Iterating on the persona
Edit apps/ai-host/persona.py while the worker runs in dev mode. The worker auto-reloads. Reconnect from the Playground (or refresh the page) to test the new prompt — existing rooms keep the old persona until you reconnect.
Updating dependencies
cd apps/ai-host
uv sync --upgrade # pull latest livekit-agents, plugins, etc.
uv run python -c "from livekit.agents import Agent; print('still works')"
Testing checklist
When validating a build (yours or after a pull), check these in order. They're listed cheapest-to-most-expensive in time, so a failure at any step short-circuits the rest.
Static checks (no creds, no mic)
- uv run python -c "import ast; ast.parse(open('agent.py').read())" (syntax)
- uv run python -c "import ast; ast.parse(open('persona.py').read())" (syntax)
- uv run python -c "from persona import DINNER_HOST_PROMPT, build_prompt" (imports + exports)
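The two syntax checks can be folded into one helper that reports every failing file at once. A minimal sketch with the standard library only; syntax_errors is a hypothetical name, not an existing script in the repo.

```python
import ast
from pathlib import Path


def syntax_errors(paths):
    """Map each file that is unreadable or fails to parse to its error message."""
    failures = {}
    for path in paths:
        try:
            source = Path(path).read_text(encoding="utf-8")
            ast.parse(source, filename=str(path))
        except (OSError, SyntaxError) as exc:
            failures[str(path)] = str(exc)
    return failures
```

Run from apps/ai-host, syntax_errors(["agent.py", "persona.py"]) returns an empty dict when both files parse.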
Live checks (Playground)
- Connection: worker logs registered worker within ~3 s of starting
- Dispatch: worker logs agent joined room within ~1 s of clicking Connect in Playground
- Greeting plays: hearing the smoke-test sentence proves LiveKit + Gemini Live are wired
- Latency is acceptable (~1–1.5 s) on a short reply. Higher? Check Google AI Studio status and the Phoenix worker logs.
- Multilingual works: Polish in → Polish out. Spanish in → Spanish out. If everything comes back English, the persona prompt got mangled.
- Tone is right: 1–2 sentence replies, warm/curious. If it's lecturing, tighten persona.py.
- No crashes through 20+ consecutive turns. Memory should stay flat.
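The tone check can be spot-checked mechanically against captured transcripts. A rough sketch, assuming a hypothetical helper reply_within_limits and a 30-word ceiling borrowed from the troubleshooting suggestion; the sentence splitting is naive and will miscount abbreviations.

```python
import re


def reply_within_limits(reply, max_sentences=2, max_words=30):
    """True when a host reply stays within the 1-2 sentence register."""
    # Split after ., ! or ? followed by whitespace; drop empty fragments.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
    words = reply.split()
    return len(sentences) <= max_sentences and len(words) <= max_words
```

Running this over the host turns of a 20-turn session gives a quick count of how often the persona overran its length cap.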
Production operation (deploy targets, monitoring)
Everything above assumes you're running locally during dev. To leave the agent running for an Agentic Gathering (or for hands-off staging), you need a host that stays up.
Phase 1 only needs a way to keep one Python worker process running — no public ingress, no DNS, no scaling story. Three reasonable hosts depending on stage:
| Where | Use | Pros / cons |
|---|---|---|
| M4 (laptop) | Day-to-day dev & one-off demos | Free, instant feedback. Dies when the lid closes — not for "always on". |
| Oracle Phoenix | Always-on staging | Already paid (Always Free tier). 4 ARM cores, 24 GB RAM, plenty of headroom alongside the prompt-runner. Outbound-only (worker connects out to LiveKit) so no firewall changes. Recipe in infra/runbook.md → "SponicControl prompt-runner on Oracle Phoenix" — same systemd-service pattern. |
| ALPUCA (M4 Mac Mini, Poland) | Low-latency for actual Agentic Gatherings | ~150 ms closer to Polish guests than Phoenix. Already runs the live-translation backend. Trade-off: it depends on the home internet connection staying up, so it is a single point of failure. |
Production-ish run (Phoenix)
Once the agent is stable enough to leave running:
# On Phoenix:
sudo apt install -y python3.12 python3.12-venv # 22.04 needs the deadsnakes PPA
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone <repo> ~/sponic # or scp the apps/ai-host subdir
cd ~/sponic/apps/ai-host
uv sync
cp .env.example .env && nano .env # paste creds
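# systemd unit (mirror the pattern from prompt-runner.service):
sudo nano /etc/systemd/system/sponic-ai-host.service
# [Unit]
# Description=Sponic AI host worker
# After=network-online.target
# Wants=network-online.target
# [Service]
# WorkingDirectory=/home/ubuntu/sponic/apps/ai-host
# ExecStart=/home/ubuntu/.local/bin/uv run python agent.py start
# Restart=always
# EnvironmentFile=/home/ubuntu/sponic/apps/ai-host/.env
# [Install]
# WantedBy=multi-user.target
# (the [Install] section is what lets "systemctl enable" start the unit at boot)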
sudo systemctl daemon-reload
sudo systemctl enable --now sponic-ai-host
sudo journalctl -fu sponic-ai-host # tail logs
Use agent.py start in production, not dev. The dev mode auto-restarts on file changes, which is useful locally but dangerous on a host.
Monitoring
- LiveKit Cloud dashboard — rooms, participants, connection minutes used. cloud.livekit.io
- Google AI Studio dashboard — Gemini Live usage and quota. Same key as live translation, so usage is shared. aistudio.google.com
- Simli dashboard — avatar session minutes, if avatar display is enabled.
- journalctl if running under systemd. Worker logs to stdout.
Cost
Current production costs are session-based: Gemini Live for the host, LiveKit participant minutes for connected clients, and optional Simli avatar minutes.
| 10-minute scenario | Humans | Simli display | Estimated cost |
|---|---|---|---|
| Interview | 1 | No | $0.24 |
| Agentic Gathering | 4 | No | $0.25 |
| Agentic Gathering | 10 | No | $0.28 |
| Agentic Gathering | 4 | Yes | $3.26 |
| Agentic Gathering | 10 | Yes | $3.29 |
Assumptions: Gemini Live $0.005/min audio input + $0.018/min audio output, LiveKit WebRTC $0.0005/participant-min, Simli $0.30/min, and one extra display participant when Simli is enabled. The Phoenix worker is self-hosted, so LiveKit-hosted agent-session minutes are not included.
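The table can be reproduced from the stated rates. The sketch below assumes the host is billed for both audio input and output for the full session, that only humans plus the optional display count as LiveKit participants (an inference that matches the table figures), and that totals are rounded half-up to the cent. The function name estimate_cost is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

GEMINI_IN = Decimal("0.005")    # USD per minute, audio input
GEMINI_OUT = Decimal("0.018")   # USD per minute, audio output
LIVEKIT = Decimal("0.0005")     # USD per participant-minute
SIMLI = Decimal("0.30")         # USD per avatar minute


def estimate_cost(minutes, humans, simli=False):
    """Estimated session cost in USD, rounded half-up to the cent."""
    # One extra display participant joins when the Simli avatar is shown.
    participants = humans + (1 if simli else 0)
    total = minutes * (GEMINI_IN + GEMINI_OUT)
    total += minutes * participants * LIVEKIT
    if simli:
        total += minutes * SIMLI
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, estimate_cost(10, 4, simli=True) is 0.23 + 0.025 + 3.00 = 3.255, which rounds to 3.26 and matches the fourth row.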
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Worker connects but never gets dispatched | Project URL mismatch — worker is on a different LiveKit project than the Playground. | Confirm LIVEKIT_URL in .env matches the URL pasted in the Playground. |
| "401 Unauthorized" from LiveKit | API key/secret swapped or stale. | Re-copy from Settings → Keys; secrets are only shown once at creation, you may need a fresh key. |
| No greeting on connect, but worker shows "joined room" | Gemini Live model/voice is unknown, or GOOGLE_AI_API_KEY is missing/invalid. | Reset GEMINI_MODEL to gemini-3.1-flash-live-preview and GEMINI_VOICE to Puck. Verify the key with: curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_AI_API_KEY" | head — should return a JSON list of models, not a 401. |
| Latency > 3 s | Gemini Live response delay, overloaded worker, or network path issue. | Check Google AI Studio status/quota, Phoenix CPU/memory, and the worker logs (/var/log/sponic-ai-host.log, or journalctl -u sponic-ai-host when running under the systemd unit). If the delay grows with context, reduce prompt size or restart the session. |
| Replies are in English no matter what | Persona prompt was edited and the multilingual rule got cut. | Re-check persona.py for the "Reply in the language of the most recent speaker" line. |
| Replies are 5 paragraphs long | Gemini Live is ignoring a soft persona constraint. | The 1–2 sentence cap is in the prompt library/persona prompt — tighten the wording (try "Never exceed 30 words."). |
| uv sync fails on macOS arm64 | Apple Silicon needs Python 3.12+ for some plugins. | Confirm python3 --version is 3.12.x. pyenv install 3.12 if not. |
| Worker dies silently after a minute | Missing Gemini key, bad LiveKit credentials, or unhandled plugin error. | Check /var/log/sponic-ai-host.log (or journalctl -u sponic-ai-host when running under the systemd unit), then restart sponic-ai-host.service after fixing env. |
Phase 1 vs the long roadmap
This document tracks the active Gemini Live implementation. The older 8-phase plan remains useful as history, but its cascade model/cost estimates are superseded by this page and the Phoenix runbook.
| Phase | Adds | Status |
|---|---|---|
| 1 — Native-audio agent | Gemini Live, single participant, smoke-test greeting | Shipped 2026-05-13 |
| 2 — Multi-mic subscription | Subscribe to all participants in the room, label utterances by speaker | Pending |
| 3 — Intervention control | Prompt/tool tuning for when the host should speak versus stay quiet | In progress |
| 4 — Programmatic prompts | Data-channel API so the host can deliver scheduled toasts and table announcements on cue | Pending |
| 5 — Simli avatar | Live avatar on a venue display, driven by the Gemini Live audio output. | Shipped |
| 6 — Translation-app bridge | Subscribe to ALPUCA's existing wss://subs.sponicgardens.com/subtitles?lang=... for transcripts — eliminates duplicate STT cost | Pending |
| 7 — Persistence & memory | Per-guest memory across Agentic Gatherings (likes, dietary, prior conversations) | Pending |
| 8 — Phone interviewer | Phone-callable host for prospect screening | Pending — extend Vapi, don't rebuild. |
Existing Vapi code already covers the phone path: apps/control/supabase/functions/vapi-server/index.ts (caller-ID lookup, role-based prompts, smart-home permission routing) and an admin UI at apps/control/public/spaces/admin/voice.html. When Phase 8 lands, the right move is to extend Vapi (provision a Sponic-owned Vapi account, reuse the assistant-request handler, plug in the same persona) rather than rebuild the phone path on LiveKit. Estimated effort: ~5 days (extend) vs ~10 days (rebuild). Update the dev task doc before Phase 8 starts.