Live Translation Subtitles

Real-time speech-to-text and translation system that lets guests follow conversations and events in their own language on their phone. A venue microphone captures speech, a server transcribes and translates, and each guest's device displays rolling subtitles in their chosen language.

Prepared 2026-05-05 · Sponic Gardens · Ported from the Alpaca Playhouse spec, adapted for Sponic infrastructure

Contents

Use cases
Architecture
Components
Guest UI — subtitle view
Mobile app (Android)
Network & connectivity
Cost analysis
Implementation phases
Dependencies & API keys
Risk & mitigation

Use cases

Multilingual events — speaker in English, subtitles in Polish/Spanish/etc. for members and guests
Venue tours and house rules walkthrough — new members get real-time translation on their phones
Casual conversation — members who don't share a language can follow along
Workshops and classes — cooking, gardening, or wellness sessions with international participants

Target languages: English, Polish, Spanish, French, German, Portuguese, Italian, Hindi, Arabic

Architecture

┌─────────────────┐
│  Venue Mic       │  USB mic, Jabra Speak, or conference mic
└────────┬────────┘
         │ audio stream (WebSocket or chunked PCM)
         ▼
┌─────────────────────────────────────────────────┐
│   ALPUCA — primary (M4 Mac Mini, Poland venue)   │
│   M4 Pro 10-core · Neural Engine · on-prem       │
│                                                   │
│  ┌───────────┐   ┌──────────────────────┐         │
│  │  STT      │──▶│  Translation Fan-Out │         │
│  │ Deepgram  │   │  DeepL / Helsinki-NLP│         │
│  │ EU / local│   │  (per requested lang) │         │
│  └───────────┘   └──────────┬───────────┘         │
│                             │                     │
│  ┌──────────────────────────▼────────────┐        │
│  │  WebSocket Server                     │        │
│  │  /subtitles?lang=pl                   │        │
│  │  /subtitles?lang=en (transcript)      │        │
│  └───────────────────────────────────────┘        │
└─────────────────────────────────────────────────┘
         │ WebSocket push (~1 Kbps per client — text only)
         ▼
┌─────────────────────────┐
│  Guest Phones / Tablets  │
│  (Web app or native)     │
│                          │
│  ┌────────────────────┐  │
│  │ Language picker     │  │
│  │ Rolling subtitles   │  │
│  │ Font size control   │  │
│  └────────────────────┘  │
└─────────────────────────┘

Fallback: Oracle Phoenix (Arizona) when ALPUCA is unavailable

Venue location: Poland, so ALPUCA is the primary server. Realistic scale: up to 8 mics and at most 80 subtitle clients. Total bandwidth: ~0.6 Mbps worst-case (8 Opus audio streams in + 80 subtitle clients out), which any venue internet connection handles trivially. Running on ALPUCA avoids the ~300 ms of transatlantic overhead (two ~150 ms legs: audio in, subtitles out) that Oracle Phoenix (Arizona) would add.
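
The bandwidth figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from this spec (64 Kbps per Opus stream, ~300-byte subtitle messages every 2–3 s); the 2.5 s midpoint and the function names are assumptions made for illustration.

```python
MIC_STREAM_KBPS = 64         # one Opus stream per active mic (figure from this spec)
SUBTITLE_MSG_BYTES = 300     # approximate size of one subtitle JSON message
SUBTITLE_INTERVAL_S = 2.5    # assumed midpoint of the 2-3 s message cadence


def subtitle_kbps_per_client(msg_bytes=SUBTITLE_MSG_BYTES, interval_s=SUBTITLE_INTERVAL_S):
    """Average outbound bitrate for one subtitle client, in Kbps."""
    return msg_bytes * 8 / interval_s / 1000


def total_kbps(mics, clients):
    """Worst-case traffic: every mic streaming in, every client receiving."""
    return mics * MIC_STREAM_KBPS + clients * subtitle_kbps_per_client()


def headroom(uplink_mbps, mics, clients):
    """How many times the required bandwidth fits into the available uplink."""
    return uplink_mbps * 1000 / total_kbps(mics, clients)


if __name__ == "__main__":
    print(f"per client: {subtitle_kbps_per_client():.2f} Kbps")
    print(f"total at 8 mics / 80 clients: {total_kbps(8, 80):.1f} Kbps")
    print(f"headroom on a 10 Mbps uplink: {headroom(10, 8, 80):.1f}x")
```

At 8 mics and 80 clients this comes to just under 0.6 Mbps, which is where the 17–25× headroom against a 10–15 Mbps uplink comes from.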

Scenario 1 — LAN (venue WiFi only, no internet required)

Guests connect to ALPUCA directly on the venue's local network. Oracle is not in the data path.

Attribute	ALPUCA (Poland, on-prem)	Oracle Phoenix (Arizona)
Feasibility	Yes — on-prem	Not reachable on LAN
Guest network req.	Venue WiFi only	Requires internet (unavailable)
WebSocket latency	<5 ms	N/A
STT — local Whisper	~1–2 s / chunk (M4 Neural Engine)	N/A
STT — Deepgram EU	~30–50 ms (if internet is also available)	N/A
Translation — DeepL	~15–25 ms to Germany (if internet available)	N/A
Translation — local Helsinki-NLP	Free, instant (Neural Engine)	N/A
End-to-end (cloud STT)	~1.8 s	N/A
End-to-end (local Whisper)	~2–3 s	N/A
Cloud API cost / hr	$0.28 (cloud mode) or $0 (fully local)	N/A
Server cost	~$5–8/mo electricity	N/A
Privacy	Audio never leaves the building (local mode)	N/A
Verdict	ALPUCA wins by default — Oracle is unreachable without internet.

Scenario 2 — Poland, internet-connected guests

Guests have internet access and connect over the public web. ALPUCA sits at the venue in Poland. Oracle Phoenix is in Arizona (~150 ms RTT from Poland).

The transatlantic problem: Oracle's datacenter bandwidth advantage only pays off when the server is geographically close. From Poland, audio chunks going in and subtitle pushes coming out each cross the Atlantic at ~150 ms apiece, adding ~300 ms to every subtitle. ALPUCA in Poland eliminates that overhead entirely.

Attribute	ALPUCA (Poland)	Oracle Phoenix (Arizona)	Winner
Server location	Poland (co-located with venue)	Phoenix, AZ — ~9,000 km from Warsaw	ALPUCA
RTT: venue mic → server	~5–30 ms (same building or Polish ISP)	~150 ms (transatlantic)	ALPUCA
RTT: server → Deepgram EU	~30–50 ms (Poland → Frankfurt)	~120–160 ms (Arizona → Frankfurt)	ALPUCA
RTT: server → DeepL (Germany)	~15–25 ms	~110–140 ms	ALPUCA
RTT: server → guest phones (Poland)	~10–30 ms (Polish ISP)	~150 ms (back across Atlantic)	ALPUCA
Total transatlantic overhead	0 ms	~300 ms (150 ms × 2)	ALPUCA
End-to-end latency (cloud STT + translation)	~1.8–2.2 s	~2.1–2.5 s	ALPUCA
End-to-end latency (local Whisper)	~2–3 s (Neural Engine)	~5–6 s (CPU-only) + 300 ms overhead	ALPUCA
Cloud API cost / hr (Deepgram + DeepL)	$0.13 + $0.15 = $0.28	$0.13 + $0.15 = $0.28 — same, API pricing is per usage not location	Tie
Server cost	~$5–8/mo electricity (~7 W idle, ~30 W load)	$0 (Always Free)	Oracle (marginal)
Subtitle bandwidth per client	~300-byte JSON message every 2–3 s = ~1 Kbps per guest — text, not video		Tie
Realistic scale (next 3 months)	Max 8 mics (not all simultaneous) · max 80 subtitle clients		Irrelevant to server choice
Total bandwidth at full scale	~0.6 Mbps worst-case (8 × 64 Kbps Opus in + 80 × 1 Kbps subtitles out)	Same math	Not a factor — Starlink Mini provides 10–15 Mbps up (17–25× headroom)
Venue internet	Starlink Mini Roaming: portable, brought to every event; ~20–40 ms latency to EU ground stations; no dependency on venue WiFi	Requires venue internet or Tailscale tunnel	ALPUCA
Uptime / reliability	Venue power + Polish ISP (single point of failure)	Datacenter UPS, auto-restart, ~99.9%	Oracle
Hardware / ML compute	M4 Pro, Neural Engine — fast local inference	4× Ampere ARM, CPU-only, no GPU	ALPUCA
Privacy / GDPR	Audio stays in EU (Deepgram EU endpoint, DeepL Germany)	Audio routed through US infrastructure	ALPUCA
Deployment	launchd (macOS)	systemd (existing flow, prompt-runner already running)	Oracle (slightly easier)
Verdict	ALPUCA is the primary for all Poland events. Oracle is the fallback when ALPUCA is unavailable — its only genuine advantages (reliability, $0 cost) don't outweigh the 300 ms latency penalty and GDPR routing.

Decision guide

Situation	Use	STT mode	Reason
LAN-only event (no internet)	ALPUCA	Local Whisper + Helsinki-NLP	Only option; Neural Engine fast; audio never leaves building; $0 API cost
Poland venue, internet via Starlink Mini — any guest count up to 80	ALPUCA	Deepgram EU + DeepL	300 ms faster; Starlink provides 10–15 Mbps up (17–25× what's needed); guests on same Starlink WiFi hit ALPUCA at LAN speed
ALPUCA down or Starlink unavailable	Oracle Phoenix	Deepgram + DeepL	Always-on fallback; +300 ms latency is better than no service

Components

1. Mic input capture

Up to 8 mics, not all speaking simultaneously. Each active mic is one independent audio stream to Deepgram (~64 Kbps Opus each). Total inbound at 8 simultaneous: ~512 Kbps — well within Starlink Mini's upload.

Option	Price	Notes
Jabra Speak 510	~$100	Conference speakerphone, USB, good pickup radius — good for table discussions
Blue Yeti	~$100	Studio quality, USB, better for single-speaker or presenter
Rode Wireless GO II	~$250	Clip-on wireless, best for mobile speaker; up to 2 per receiver
Existing USB mic	$0	Test with whatever's available first

Multi-mic setup: Deepgram supports multichannel audio — each mic gets its own channel, enabling per-speaker transcription and optional speaker labels. Run one Deepgram WebSocket connection per active mic, or use a multichannel USB audio interface to send all channels in one stream.

Software: PyAudio (or arecord on a Linux capture host) capturing 16 kHz mono PCM per channel, streamed to ALPUCA via local pipe or WebSocket.

2. Speech-to-text (STT) engine

Option A — Self-hosted on ALPUCA (free, ~3s latency):

Whisper.cpp with medium or large-v3 model
M4 Mac Mini handles real-time transcription easily
~1.5 GB RAM for medium, ~3 GB for large
Chunked processing: 5-second audio windows with 1-second overlap
Output: timestamped text segments
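
The chunking scheme above (5-second windows with 1-second overlap) can be sketched as follows. This is an illustrative Python helper, not Whisper.cpp's own streaming code; the function name and the 16 kHz sample rate (taken from the mic capture section) are assumptions.

```python
SAMPLE_RATE = 16_000   # 16 kHz mono, as in the mic capture section
WINDOW_S = 5           # window length in seconds
OVERLAP_S = 1          # seconds shared between consecutive windows


def chunk_windows(samples, sample_rate=SAMPLE_RATE, window_s=WINDOW_S, overlap_s=OVERLAP_S):
    """Yield (start_index, window) pairs over a sequence of PCM samples.

    Each window starts (window_s - overlap_s) seconds after the previous one,
    so neighbouring windows share overlap_s seconds of audio. The last window
    may be shorter than window_s.
    """
    if len(samples) == 0:
        return
    window = window_s * sample_rate
    hop = (window_s - overlap_s) * sample_rate
    overlap = overlap_s * sample_rate
    # Stop before a window that would contain nothing but already-seen overlap.
    for start in range(0, max(len(samples) - overlap, 1), hop):
        yield start, samples[start:start + window]


if __name__ == "__main__":
    audio = [0] * (12 * SAMPLE_RATE)  # 12 s of silence as stand-in audio
    for start, window in chunk_windows(audio):
        print(f"window at {start / SAMPLE_RATE:.0f}s, {len(window) / SAMPLE_RATE:.0f}s long")
```

The overlap means a word cut at a window boundary appears whole in the next window; merging the duplicated text from the two transcripts is left out of this sketch.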

Option B — Cloud STT (cheap, ~1.5s latency):

Deepgram Nova-2 streaming API ($0.0043/min)
Native WebSocket streaming — no chunking needed
Better accuracy, especially for accented speech
Polish language support included

Recommendation: Start with Deepgram for quality/latency, fall back to local Whisper if cost matters at scale.

3. Translation engine

Option A — Cloud translation (best quality):

Service	Polish support	Cost per 1M chars	Notes
DeepL API	Excellent	$5.49 (Pro)	Best Polish quality
Google Translate	Good	$20	Widest language coverage
Azure Translator	Good	$10	Free 2M chars/month (covered by Founders Hub credit)

Option B — Self-hosted on ALPUCA (free):

Helsinki-NLP/opus-mt models (one per language pair)
~500 MB per model, fast inference on M4
Quality: good for common phrases, weaker on idioms
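Run via Hugging Face transformers (the opus-mt checkpoints are MarianMT models); Ollama is built around GGUF LLMs and is not expected to load them directly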

Translation caching: Cache translated segments by source hash + target language. If 3 guests want Polish, translate once, broadcast to all. Real-time speech is unique, so the cache mainly helps with repeated phrases and greetings.
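
A minimal sketch of such a cache, assuming a hypothetical `translate(text, target_lang)` callable that wraps whichever engine is in use (DeepL, Azure, or a local model). The class name, the whitespace normalization, and the unbounded dictionary are illustrative choices, not part of the existing server.

```python
import hashlib


class TranslationCache:
    """Cache translations keyed by (hash of source text, target language)."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical callable(text, target_lang) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text, target_lang):
        # Collapse whitespace so trivially different segments share one entry.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        return digest, target_lang

    def get(self, text, target_lang):
        """Return the cached translation, calling the engine only on a miss."""
        key = self._key(text, target_lang)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]


if __name__ == "__main__":
    cache = TranslationCache(lambda text, lang: f"[{lang}] {text}")
    cache.get("Welcome to Sponic Gardens", "pl")
    cache.get("Welcome to Sponic Gardens", "pl")  # served from cache
    print(cache.hits, cache.misses)
```

Broadcasting one translation to every guest on the same language is handled by the fan-out, so the cache only matters when the same sentence is spoken again. A long-running server would likely want an eviction policy on top of this.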

Recommendation: DeepL for Polish (their standout language), Azure free tier for others. Azure Translator falls under the Founders Hub $1k credit.

4. WebSocket server

Tech: Node.js with ws library (matches existing edge function patterns) or Python FastAPI with websockets.

Endpoints:

# Primary (ALPUCA, venue LAN)
ws://alpuca.local:8910/subtitles?lang=en    # Original transcript
ws://alpuca.local:8910/subtitles?lang=pl    # Polish translation

# Fallback (Oracle Phoenix, datacenter)
wss://subtitles.sponicgardens.com/subtitles?lang=en
wss://subtitles.sponicgardens.com/subtitles?lang=pl
wss://subtitles.sponicgardens.com/subtitles?lang=es    # Spanish translation

Message format:

{
  "id": "seg_001",
  "text": "Witamy w Sponic Gardens",
  "lang": "pl",
  "source_lang": "en",
  "source_text": "Welcome to Sponic Gardens",
  "timestamp": 1711800000,
  "is_partial": false
}

is_partial: true for interim results (updates in-place on client)
is_partial: false for finalized segments (appended to history)
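
The partial/final rule can be sketched as a small client-side buffer. This is written in Python for brevity rather than in the clients' own languages; the class and method names are hypothetical, and the 50-segment history limit is the one listed under the guest UI features.

```python
import json
from collections import deque


class SubtitleBuffer:
    """Holds finalized segments plus at most one in-progress partial line."""

    def __init__(self, max_history=50):
        self.history = deque(maxlen=max_history)  # oldest segments drop off automatically
        self.partial = None

    def apply(self, message):
        """Apply one decoded subtitle message (a dict in the format shown above)."""
        if message["is_partial"]:
            self.partial = message["text"]       # overwrite the interim line in place
        else:
            self.history.append(message["text"])  # finalized: append to history
            self.partial = None                   # and clear the interim line

    def apply_raw(self, raw):
        """Convenience wrapper for a raw WebSocket text frame."""
        self.apply(json.loads(raw))

    def lines(self):
        """Lines to render, oldest first; the partial line (if any) comes last."""
        out = list(self.history)
        if self.partial is not None:
            out.append(self.partial)
        return out


if __name__ == "__main__":
    buf = SubtitleBuffer()
    buf.apply_raw('{"id": "seg_001", "text": "Witamy w", "is_partial": true}')
    buf.apply_raw('{"id": "seg_001", "text": "Witamy w Sponic Gardens", "is_partial": false}')
    print(buf.lines())
```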

Connection management:

Track connected clients per language
Only translate to languages with active listeners
Heartbeat ping every 30s, auto-reconnect on client side
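
One way to track listeners per language and derive the set of languages that actually need translating is sketched below. The class and its methods are illustrative assumptions; the real server may keep this state differently.

```python
from collections import defaultdict


class ListenerRegistry:
    """Tracks which clients are subscribed to which subtitle language."""

    def __init__(self, source_lang="en"):
        self.source_lang = source_lang
        self._clients = defaultdict(set)  # language code -> set of client ids

    def add(self, client_id, lang):
        self._clients[lang].add(client_id)

    def remove(self, client_id, lang):
        self._clients[lang].discard(client_id)
        if not self._clients[lang]:
            del self._clients[lang]  # no listeners left, stop translating this language

    def languages_to_translate(self):
        """Languages with at least one listener, excluding the source transcript."""
        return sorted(lang for lang in self._clients if lang != self.source_lang)

    def listener_count(self):
        return sum(len(clients) for clients in self._clients.values())


if __name__ == "__main__":
    registry = ListenerRegistry()
    registry.add("guest-1", "pl")
    registry.add("guest-2", "pl")
    registry.add("guest-3", "en")
    print(registry.languages_to_translate())  # only Polish needs a translation call
```

With this shape, the translation fan-out loops over `languages_to_translate()` for each finalized segment, so a room full of English listeners triggers no translation calls at all.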

Guest UI — subtitle view

New screen accessible via the Sponic Gardens web app (progressive web app) and native apps (iOS + Android).

UI layout

┌──────────────────────────────┐
│  Live Subtitles    [EN ▼] A  │  ← language picker + font size
│──────────────────────────────│
│                              │
│  Welcome to Sponic Gardens   │
│                              │
│  The WiFi password is on the │
│  card in your room           │
│                              │
│  ░░░░░░░░░░░░░░░            │  ← partial/incoming text (dimmed)
│                              │
│                              │
└──────────────────────────────┘

Features

Language picker: Dropdown at top, defaults to phone locale
Font size: A button cycles Small → Medium → Large → Extra Large
Auto-scroll: New text appears at bottom, auto-scrolls (pause on manual scroll-up)
Partial results: Show interim STT in gray, replace with final in white
Dark mode: Dark background by default (easier to read in event settings)
History: Keep last 50 segments, scrollable
Connection status: live / reconnecting / disconnected

Native app integration

Platform	Files	WebSocket lib
iOS (SwiftUI)	`Views/SubtitleView.swift`, `Services/SubtitleService.swift`, `Models/SubtitleSegment.swift`	`URLSessionWebSocketTask` (native, no deps)
Android (Compose)	`ui/screens/SubtitleScreen.kt`, `network/SubtitleClient.kt`, `model/Models.kt`	OkHttp WebSocket (already in dependency tree)

Reconnect with exponential backoff (1s, 2s, 4s, max 30s). Entry point: new tab/button on the app home screen — only visible when the subtitle server is broadcasting (check via GET /subtitles/status).
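
The backoff schedule (1s, 2s, 4s, capped at 30s) is the same on every platform. A language-neutral sketch in Python, with a hypothetical generator name; the native clients would implement the equivalent in Swift and Kotlin:

```python
from itertools import islice


def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield reconnect delays in seconds: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


if __name__ == "__main__":
    # First seven delays a client would wait between reconnect attempts.
    print(list(islice(backoff_delays(), 7)))
```

A client would create a fresh generator after each successful connection so that the next outage starts again at 1 s. Adding a little random jitter to each delay is a common refinement when many phones reconnect at once, though the spec does not call for it.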

Mobile app (Android)

Native Kotlin + Jetpack Compose app (com.sponicgardens.sponic). Connects to the ALPUCA subtitle backend via OkHttp WebSocket. Built on ALPUCA, published to Cloudflare R2, registered in Supabase mobile_builds.

Current version

Attribute	Value
Version	0.4.4 (build 8)
Package	`com.sponicgardens.sponic`
Min SDK	26 (Android 8.0)
Target SDK	35
ABI	arm64-v8a only
Download (latest)	sponic-debug.apk
Build host	ALPUCA (M4 Mac Mini, Poland)

Features

Google Sign-In — Credential Manager API with GCP Sponica OAuth; falls back to manual name entry
View Subtitles — WebSocket connection to ALPUCA, real-time translated subtitles with original text shown below
Speak + View — dual-path capture: Android SpeechRecognizer for live text injection + AudioRecord/MediaCodec for local M4A recording simultaneously
Settings screen — language picker (9 languages, defaults from device locale), microphone device selector, input gain slider, sample rate (16/44.1/48 kHz), noise suppression, local recording toggle, cloud sync toggle
Recordings — browse, play, share, and delete local M4A recordings with JSON sidecar metadata
Audio device management — hot-swap detection via AudioDeviceCallback; auto-prefer USB > Wired > Bluetooth > Built-in; deduplicated built-in mics
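
The input selection rule (USB > Wired > Bluetooth > Built-in, with duplicate built-in mics collapsed) can be sketched as below. This is Python pseudologic for the rule only: the device dictionaries, the `kind` labels, and the "keep the first built-in entry" dedup policy are assumptions, not the contents of `AudioDeviceManager.kt`.

```python
# Lower rank wins; kinds not listed here sort last.
PRIORITY = {"usb": 0, "wired": 1, "bluetooth": 2, "builtin": 3}


def dedupe_builtin(devices):
    """Keep only the first built-in mic; phones often report several."""
    kept, seen_builtin = [], False
    for device in devices:
        if device["kind"] == "builtin":
            if seen_builtin:
                continue
            seen_builtin = True
        kept.append(device)
    return kept


def pick_input(devices):
    """Return the preferred input device, or None if the list is empty."""
    candidates = dedupe_builtin(devices)
    return min(
        candidates,
        key=lambda d: PRIORITY.get(d["kind"], len(PRIORITY)),
        default=None,
    )


if __name__ == "__main__":
    devices = [
        {"id": 1, "kind": "builtin", "name": "Bottom mic"},
        {"id": 2, "kind": "builtin", "name": "Back mic"},
        {"id": 3, "kind": "usb", "name": "Jabra Speak 510"},
    ]
    print(pick_input(devices)["name"])
```

On a hot-swap callback the app would rerun the same selection over the updated device list.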

Audio recording architecture

The app uses a dual-path approach to capture audio while maintaining SpeechRecognizer access:

┌──────────────────────────────────┐
│  AudioRecord (VOICE_RECOGNITION) │  ← coexists with SpeechRecognizer
│  PCM 16-bit → gain applied       │
│       │                          │
│       ├─> MediaCodec (AAC-LC)    │  → .m4a file on device
│       │   └─> MediaMuxer (MP4)   │
│       │                          │
│       └─> RMS level (VU meter)   │  → animated bar in UI
│                                  │
│  SpeechRecognizer (parallel)     │  → POST /subtitles/inject
└──────────────────────────────────┘

AudioSource: VOICE_RECOGNITION — chosen specifically because it coexists with SpeechRecognizer on the same device (unlike MIC or CAMCORDER)
Encoder flush: uses a @Volatile keepRecording flag + join() with 2s timeout instead of coroutine cancellation, ensuring EOS is sent and the encoder drains completely
Metadata sidecar: each recording writes a JSON file with gain, sample rate, channels, device info, noise suppression state, duration, file size
Minimum file size: recordings under 1 KB are auto-discarded (empty/corrupt encoder output)
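
The gain and RMS steps in the diagram operate on raw 16-bit PCM. The following is an illustrative Python version of that math, assuming little-endian signed samples and a 0.0–1.0 level scale for the VU meter; the function names are hypothetical and the Kotlin implementation may differ in detail.

```python
import math
import struct


def _unpack(pcm):
    """Decode little-endian signed 16-bit PCM bytes into a tuple of ints."""
    count = len(pcm) // 2
    return struct.unpack(f"<{count}h", pcm[:count * 2])


def apply_gain(pcm, gain):
    """Scale every sample by gain, clipping to the signed 16-bit range."""
    scaled = [max(-32768, min(32767, round(s * gain))) for s in _unpack(pcm)]
    return struct.pack(f"<{len(scaled)}h", *scaled)


def rms_level(pcm):
    """Root-mean-square level of a PCM buffer, normalized to 0.0-1.0."""
    samples = _unpack(pcm)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


if __name__ == "__main__":
    buffer = struct.pack("<4h", 16384, -16384, 16384, -16384)  # half-scale square wave
    print(rms_level(buffer))
    print(rms_level(apply_gain(buffer, 0.5)))
```

Computing the level after gain is applied means the VU meter shows what is actually being recorded, so a clipped signal reads near full scale.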

Key files

File	Purpose
`MainActivity.kt`	Single-activity entry, screen nav, SharedPreferences, AudioDeviceManager lifecycle
`ui/screens/LoginScreen.kt`	Google Sign-In (Credential Manager API) + manual name entry fallback
`ui/screens/HomeScreen.kt`	Mode selection (View Subtitles, Speak+View), server status, profile menu
`ui/screens/SubtitleScreen.kt`	Real-time subtitle display + mic controls with VU meter and recording timer
`ui/screens/SettingsScreen.kt`	Language, mic device, gain, sample rate, noise suppression, recording & sync toggles
`ui/screens/RecordingsScreen.kt`	Browse/play/share/delete local M4A recordings
`audio/AudioRecorder.kt`	Dual-path audio engine: AudioRecord → MediaCodec/MediaMuxer + RMS metering
`audio/AudioDeviceManager.kt`	Device enumeration, hot-swap callbacks, auto-selection priority, deduplication
`network/SubtitleClient.kt`	OkHttp WebSocket + HTTP client for subtitle backend
`model/Models.kt`	Data classes: Language, AppMode, AudioSettings, AudioDevice, Recording, ServerStatus

Build & publish

# One-step build + publish (the Gradle build runs on the M4 over SSH):
cd apps/mobile
./publish.sh "changelog description"

# Publishes to:
#   R2:  mobile/sponic-{version}-debug.apk  (versioned)
#   R2:  mobile/sponic-debug.apk            (latest)
#   Supabase: mobile_builds row with metadata

publish.sh handles everything: SSHs into ALPUCA for the Gradle build, copies the APK back, uploads versioned + latest to R2 (sponic-images bucket), and inserts a build record into Supabase. Credentials are fetched from Bitwarden at runtime.

Supported languages

Language	Code
English	`en`
Polish	`pl`
Spanish	`es`
French	`fr`
German	`de`
Portuguese	`pt`
Italian	`it`
Hindi	`hi`
Arabic	`ar`

Language defaults from the device's system locale at first launch. Users can change it anytime in Settings, and the preference persists across sessions.
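
Mapping a device locale to one of the nine supported codes can be sketched as follows. The function name is hypothetical, and falling back to English for unsupported locales is an assumption; the app's actual fallback is not stated here.

```python
SUPPORTED = ("en", "pl", "es", "fr", "de", "pt", "it", "hi", "ar")


def default_language(locale_tag, supported=SUPPORTED, fallback="en"):
    """Pick a subtitle language from a locale tag such as 'pl-PL' or 'pt_BR'."""
    if not locale_tag:
        return fallback
    # Keep only the primary language subtag and normalize its case.
    primary = locale_tag.replace("_", "-").split("-")[0].lower()
    return primary if primary in supported else fallback


if __name__ == "__main__":
    for tag in ("pl-PL", "pt_BR", "ja-JP"):
        print(tag, "->", default_language(tag))
```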

Planned (not yet implemented)

R2 recordings bucket — dedicated sponic-recordings bucket for audio uploads, with Supabase metadata table for recording catalog
Auto-sync on WiFi — background upload of recordings when cloud sync is enabled and device is on WiFi
Server-controlled auto-gain — server analyzes uploaded recordings and suggests optimal gain level per mic/venue
iOS app — Swift/SwiftUI native (OAuth credentials ready in GCP, app not yet built)

Network & connectivity

Primary: ALPUCA (Poland venue, Starlink Mini)

Venue internet: Starlink Mini Roaming — portable, brought to every event.

Starlink Mini spec	Value	Our need
Upload	~10–15 Mbps typical	~0.6 Mbps worst-case (8 mics + 80 clients) — 17–25× headroom
Download	~50–100 Mbps	Negligible (server receives only mic audio)
Latency (EU ground stations)	~20–40 ms	Adds ~20–40 ms to Deepgram EU and DeepL calls
Portability	Roaming across EU/Poland	Works at any venue — no dependency on venue WiFi

Guests on Starlink WiFi: connect to ALPUCA as a LAN device — WebSocket latency <5 ms, same as pure LAN mode
Deepgram EU (Frankfurt): ~25–40 ms via Starlink EU ground stations
DeepL (Germany): ~20–35 ms via Starlink
Full end-to-end subtitle latency (cloud STT + translation): ~1.8–2.0 s

Fallback: Oracle Phoenix (Arizona)

Guests connect to wss://subtitles.sponicgardens.com via Cloudflare Workers proxy or directly to 144.24.51.48:8910
Use when ALPUCA is down (power failure, ISP outage, hardware issue)
Adds ~300 ms transatlantic overhead but is always-on with datacenter reliability

SSE fallback: If WebSocket is blocked by a guest's network or mobile carrier, fall back to SSE (Server-Sent Events) over HTTPS. Implement on both ALPUCA and Oracle servers.

Cost analysis

Per-hour cost (50% speaking time = 30 min audio, ~27K chars, one target language)

Setup	STT	Translation	Total / hr
All cloud	Deepgram: $0.13	DeepL: $0.15	$0.28
Hybrid (cloud STT, local translate)	$0.13	$0	$0.13
All local (ALPUCA)	$0	$0	$0.00

Monthly estimates

Usage	All cloud	Hybrid	All local
2 events/week, 2 hrs each	$4.48	$2.08	$0
Daily use, 4 hrs/day	$33.60	$15.60	$0
Always-on ambient (12 hrs/day)	$100.80	$46.80	$0
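
The figures in both cost tables follow from the per-unit prices quoted earlier in this document. A small sketch that reproduces them, assuming (as the tables appear to) that each component is rounded to the cent before being summed, and that a month is 4 weeks or 30 days; the function names and the `target_langs` parameter are illustrative.

```python
DEEPGRAM_PER_MIN = 0.0043        # USD per audio minute (from the STT section)
DEEPL_PER_MILLION_CHARS = 5.49   # USD per 1M characters (from the translation table)
AUDIO_MIN_PER_HOUR = 30          # 50% speaking time
CHARS_PER_HOUR = 27_000          # approximate transcript size per hour


def hourly_cost(cloud_stt=True, cloud_translation=True, target_langs=1):
    """Cloud API cost per event hour, each component rounded to the cent."""
    stt = round(DEEPGRAM_PER_MIN * AUDIO_MIN_PER_HOUR, 2) if cloud_stt else 0.0
    translation = 0.0
    if cloud_translation:
        # Translation is billed per character, so cost scales with target languages.
        translation = round(DEEPL_PER_MILLION_CHARS * CHARS_PER_HOUR / 1e6 * target_langs, 2)
    return round(stt + translation, 2)


def monthly_cost(hours_per_month, **mode):
    """Monthly cost for a given number of event hours in the chosen mode."""
    return round(hours_per_month * hourly_cost(**mode), 2)


if __name__ == "__main__":
    usage = {"2 events/week x 2 h": 16, "daily x 4 h": 120, "12 h/day": 360}
    for label, hours in usage.items():
        print(label, monthly_cost(hours), monthly_cost(hours, cloud_translation=False))
```

Passing `target_langs=3`, for example, shows how the all-cloud figure grows when several translations run at once, which the single-language tables do not capture.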

One-time costs

Conference mic: $0–250 (may already have one)
No additional server hardware: ALPUCA (M4 Mac Mini) is the primary, and Oracle Phoenix (Always Free tier) covers the fallback

Implementation phases

Phase 1: Server MVP (1–2 days)

Python/Node server: mic capture → Deepgram streaming STT → WebSocket broadcast (English transcript only)
Health endpoint: GET /subtitles/status
Browser test client (simple HTML + WebSocket listener)
Deploy on ALPUCA as launchd service (primary); deploy on Oracle Phoenix as systemd service (fallback)

Phase 2: Translation (1 day)

DeepL integration for Polish + Spanish
Helsinki-NLP local models as fallback
Language-based fan-out: only translate to languages with active listeners
Cache layer for repeated phrases

Phase 3: App integration (2–3 days)

iOS SubtitleView + SubtitleService
Android SubtitleScreen + SubtitleService
Language picker with phone locale default
Font size control + auto-scroll
Connection status indicator
Conditional visibility (only show when server is broadcasting)

Phase 4: Refinement & production (1–2 days)

Partial result rendering (interim gray text)
Reconnect logic with exponential backoff
Dark mode optimized for event lighting
Cloudflare Workers proxy (wss://subtitles.sponicgardens.com) + Tailscale for admin access
Logging: session duration, languages requested, character counts → Supabase api_usage_log

Phase 5: Enhancements (future)

Speaker diarization (“Speaker 1”, “Speaker 2”)
TTS output: read translations aloud via ElevenLabs/Voxtral (accessibility)
Kiosk/TV display mode (large text, auto-scroll, no controls)
Saved transcripts: export event transcript to PDF/email
Whisper local fallback: auto-switch if Deepgram is down or budget exceeded

Operations — starting ALPUCA server components

Subtitle backend (required)

The subtitle backend runs on ALPUCA at port 8910. It handles WebSocket subtitle broadcasting and speech injection.

# SSH to ALPUCA
ssh [email protected]

# Start in mock mode (no mic, broadcasts test sentences):
cd /Users/alpuca/Documents/codingprojects/alpacapps/live-subtitles
node server.js --mock

# Start in production mode (requires GEMINI_API_KEY for translation):
GEMINI_API_KEY="your-key" node server.js

# Start with Polish source language mock:
node server.js --mock-pl

Verify the backend is running

# From any machine on the network:
curl http://Alpuca.local:8910/subtitles/status

# Expected response:
# {"active":true,"mock":true,"listeners":0,"languages":[],"supported_languages":["en","pl","es","fr","de","pt","it","hi","ar"]}

Endpoints

Endpoint	Method	Purpose
`ws://Alpuca.local:8910/subtitles?lang=XX`	WebSocket	Subscribe to subtitles in language XX
`/subtitles/status`	GET	Health check — active, mock, listener count
`/subtitles/inject`	POST	Inject text from speech recognition: `{ text, is_partial, source_lang, speaker }`
`/subtitles/transcribe`	POST	Send audio for server-side Gemini STT: `{ audio: base64, source_lang }`
`/eventspeaker`	GET	Browser-based speaker/test client
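
As an illustration of the inject endpoint, the sketch below builds the POST request with Python's standard library. The field names come from the table above; sending them as a JSON body with a `Content-Type: application/json` header is an assumption about the server, and `build_inject_request` is a hypothetical helper.

```python
import json
from urllib import request

BASE_URL = "http://Alpuca.local:8910"  # backend address used elsewhere in this section


def build_inject_request(text, speaker, source_lang="en", is_partial=False, base_url=BASE_URL):
    """Build (but do not send) a POST to /subtitles/inject."""
    payload = {
        "text": text,
        "is_partial": is_partial,
        "source_lang": source_lang,
        "speaker": speaker,
    }
    return request.Request(
        f"{base_url}/subtitles/inject",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed; not confirmed by the server docs
        method="POST",
    )


if __name__ == "__main__":
    req = build_inject_request("Welcome to Sponic Gardens", speaker="Host")
    print(req.get_method(), req.full_url)
    # To actually send it (requires the backend to be reachable):
    # with request.urlopen(req, timeout=5) as resp:
    #     print(resp.status)
```

Running the backend with `--mock` and checking `/subtitles/status` first is a quick way to confirm the server is reachable before injecting text.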

Running as a persistent service (launchd)

A launchd plist exists at ~/Library/LaunchAgents/com.alpacapps.live-subtitles.plist but requires valid TLS certs at /tmp/subtitle-cert.pem and /tmp/subtitle-key.pem (which don't persist across reboots). For now, start the server manually via SSH. To fix auto-start, either generate persistent self-signed certs outside /tmp or remove the TLS env vars from the plist to run HTTP-only.

Android app (Sponic)

The native Kotlin app connects to the ALPUCA backend. Build on ALPUCA:

ssh [email protected]
cd /Users/alpuca/Documents/codingprojects/sponic/apps/mobile
source ~/.zshrc
./gradlew assembleDebug
# APK: app/build/outputs/apk/debug/app-arm64-v8a-debug.apk

Google Sign-In (OAuth 2.0)

The app uses Android Credential Manager API with Google Sign-In. OAuth credentials are in GCP project Sponica (801803827261), owned by [email protected].

Credential	Type	Purpose
Sponic (Web)	Web application	Token audience for `setServerClientId()` — used by Credential Manager API
SponicAndroid	Android	Tied to package `com.sponicgardens.sponic` + debug SHA-1

Stored in Bitwarden: GCP Sponica OAuth Clients (2edd05a6) in DevOps-sponicgarden collection. Console: APIs & Services → Credentials.

The Android client uses the debug keystore SHA-1 (50:0D:86:…:D0:EE). For release builds, generate a new SHA-1 from the release keystore and add it as a separate Android OAuth client in GCP.

Dependencies & API keys

Service	Key location	Free tier
GCP Sponica (OAuth)	Bitwarden: `GCP Sponica OAuth Clients` (`2edd05a6`)	Free (OAuth is free)
Deepgram	Bitwarden (DevOps-sponicgarden, to be created)	$200 free credit on signup
DeepL	Bitwarden (DevOps-sponicgarden, to be created)	500K chars/month
Azure Translator	Bitwarden (DevOps-sponicgarden, to be created)	2M chars/month (Founders Hub credit)
Helsinki-NLP models	Local on ALPUCA	Unlimited (open source)
Whisper.cpp	Local on ALPUCA	Unlimited (open source)

All API keys follow the standard Sponic credential flow: store in Bitwarden under the DevOps-sponicgarden collection (ALPU.CA org), reference by BW item ID in config/project.config.ts, and document the unlock recipe in infra/runbook.md.

Risk & mitigation

Risk	Mitigation
High ambient noise reduces STT accuracy	Use directional mic; Deepgram has noise suppression; test mic placement per venue
Multiple simultaneous speakers	Deepgram supports multichannel; consider lapel mics for structured events
Polish translation quality	DeepL is best-in-class for Polish; test with native speakers at the venue
WiFi congestion during events	WebSocket is lightweight (~1 Kbps per client); not a concern
Oracle Phoenix CPU load	Currently <5% at idle with 7 workers; the subtitle relay adds negligible load. On ALPUCA, local Whisper medium uses ~30% of the M4, leaving plenty of headroom
Guest adoption	Show QR code on event screens linking to the subtitle page; no app install required for the web version