Strategy Memo

Babelio is the desktop audio layer immersion learners can't get anywhere else.

We win one tightly walled niche (serious immersion-method language learners) by translating audio from native desktop apps that browser extensions structurally cannot reach, while keeping the original audio for ear-training.

Bar for good: A new VP reads this in five minutes and can speak coherently about Babelio's strategy in their first meeting.

North Star

≥4 sess/wk

Wedge market

$13M SAM

ICP

Immersion learners

GTM motion

PLG · community

Situation: The latency floor opened the door, but for everyone.

In 2025 three latency curves crossed: streaming STT under 300ms, LLM machine translation under 200ms, and streaming TTS under 200ms — summing below the ~700ms threshold where dubbing reads as live interpretation, not a delayed echo. Real-time speech-to-speech translation is now a real category (~$481.6M in 2025, ~9.5% CAGR), riding the broader AI-translation market ($3.68B in 2026, 25.2% CAGR). But that floor dropped for everyone on the same Deepgram, ElevenLabs and Cartesia APIs. Zoom shipped a native AI Voice Translator in April 2026; DeepL Voice runs inside Teams; DubTab and Whisperr already dub any browser tab at ~0.5s. Model quality is not a moat — it is table stakes.
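The latency claim above is simple arithmetic, and a minimal sketch makes the budget explicit. It sums the three per-stage ceilings quoted in the memo and compares the total with the ~700ms threshold. The stage names, and the assumption that stages run strictly in sequence with no network or audio-capture overhead, are illustrative simplifications rather than measurements.

```python
# Per-stage latency ceilings quoted in the memo, in milliseconds.
# The dictionary keys are illustrative labels, not product or API names.
STAGE_CEILINGS_MS = {
    "streaming_stt": 300,
    "llm_translation": 200,
    "streaming_tts": 200,
}

# Approximate threshold the memo cites for dubbing to read as live interpretation.
LIVE_THRESHOLD_MS = 700


def pipeline_ceiling_ms(stages: dict[str, int]) -> int:
    """Worst-case end-to-end latency if the stages run strictly in sequence."""
    return sum(stages.values())


def fits_live_budget(stages: dict[str, int], threshold_ms: int = LIVE_THRESHOLD_MS) -> bool:
    """True when the sequential worst case does not exceed the threshold."""
    return pipeline_ceiling_ms(stages) <= threshold_ms


total = pipeline_ceiling_ms(STAGE_CEILINGS_MS)
print(f"worst-case pipeline latency: {total} ms (threshold ~{LIVE_THRESHOLD_MS} ms)")
print("fits live budget:", fits_live_budget(STAGE_CEILINGS_MS))
```

Because each stage is quoted as "under" its ceiling, the real sum sits below 700ms, but with no slack to spare: any extra hop (capture, network, buffering) would have to be paid for by a faster stage.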

Complication: A broad consumer pitch is a race to $0 against free.

"Translate any app" sounds like a $720M consumer market, but that ceiling is exactly the contested zone where Zoom's free feature and free browser extensions already live — and "anyone watching foreign video" has no trigger, no community, and zero willingness to pay next to free captions. Marketing to everyone is marketing to no one. The painful tension is timing: the OS-level audio-capture unlock (macOS 14 CoreAudio process taps, Windows 11 audio-session APIs) is open right now, but it is a head-start, not a moat — it uses public APIs any funded team can also call. The window to convert that head-start into a compounding data advantage is short.

Resolution: Lead with the one segment browser tools cannot reach.

We anchor on serious immersion-method language learners (Japanese/Korean/Chinese-first) who watch native foreign-language video and live streams on native desktop clients (MPV/VLC, native streaming apps, desktop games) for hours a day. Their core job is "understand live native speech while keeping the original audio for ear-training," so our hero mode is subtitle / dual-track (whisper-dub under the original), with auto-mute-and-dub as an optional toggle, not the default. They already pay $60–180/yr for Migaku, Language Reactor and Anki add-ons, so willingness to pay plausibly exists at $12/mo ($144/yr, inside that range); they cluster in dense, named, calendar-timed communities (Refold, TheMoeWay, Migaku, r/LearnJapanese ~700k). This is a deliberately narrow wedge, ~$13M SAM inside a ~$72M learner TAM, chosen because it is structurally walled off from browser-only rivals.

Why us only: A cross-vendor OS layer cannibalizes no incumbent's seat revenue.

Zoom, Teams and Google will ship in-product translation because it deepens their seat revenue — but an OS-wide, cross-vendor layer cannibalizes nobody's core, so no platform owner is incentivized to build it. The open lane survives precisely because it sits between the walled gardens, and a native-desktop capture client is the only thing that reaches inside all of them at once.

North Star & how we measure: Four-plus native-desktop sessions, per user, per week.

North Star = ≥4 native-desktop translation sessions per active user per week. We pick sessions-on-native-clients (not minutes, not signups) because it is the single number that proves both engagement and that the value lands where browser tools can't follow — every counted session is one a competitor structurally cannot serve. We instrument the per-app capture and eval telemetry from day one, so the same metric that tracks habit also feeds the only durable moat candidate.
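One way the North Star could be computed from session telemetry is sketched below. The `Session` record, its field names, and the ISO-week bucketing are assumptions made for illustration; the memo itself only fixes the threshold of ≥4 native-desktop sessions per active user per week.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

NORTH_STAR_SESSIONS_PER_WEEK = 4  # threshold stated in the memo


@dataclass(frozen=True)
class Session:
    # Hypothetical telemetry record; field names are illustrative assumptions.
    user_id: str
    day: date
    app: str                 # e.g. "mpv", "vlc", "browser"
    is_native_desktop: bool  # only native-desktop sessions count toward the metric


def weekly_native_sessions(sessions: list[Session]) -> Counter:
    """Count native-desktop sessions per (user, ISO year, ISO week)."""
    counts: Counter = Counter()
    for s in sessions:
        if s.is_native_desktop:
            iso = s.day.isocalendar()
            counts[(s.user_id, iso[0], iso[1])] += 1
    return counts


def users_at_north_star(sessions: list[Session], year: int, week: int) -> set[str]:
    """Users with at least the threshold number of native sessions in that ISO week."""
    counts = weekly_native_sessions(sessions)
    return {
        user
        for (user, y, w), n in counts.items()
        if (y, w) == (year, week) and n >= NORTH_STAR_SESSIONS_PER_WEEK
    }


# Made-up sample data: user "a" has four native sessions in ISO week 2 of 2026;
# user "b" has three native sessions plus two browser sessions that do not count.
sample = [Session("a", date.fromisocalendar(2026, 2, d), "mpv", True) for d in (1, 2, 3, 4)]
sample += [Session("b", date.fromisocalendar(2026, 2, d), "vlc", True) for d in (1, 2, 3)]
sample += [Session("b", date.fromisocalendar(2026, 2, d), "browser", False) for d in (4, 5)]

print(sorted(users_at_north_star(sample, 2026, 2)))  # ['a']
```

Filtering on the native-desktop flag before counting is the design choice that matters here: a user who is active only in the browser never moves the metric, which matches the memo's reason for choosing sessions on native clients.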

Open risk we are validating, not hiding: WTP ceiling below floor. The learner willingness-to-pay band ($6–12/mo) sits at or below the COGS-driven price floor for heavy daily users. The entire customer model is currently a hypothesis, with zero completed interviews. The Week-1 interview run (15–20 learners, Mom Test script verbatim, <30% kill-criterion) and a Week-2 metered concierge test must resolve this before we lock tiers.

Top 3 strategic bets · next 12 months: Three bets, three leading indicators.

Prove the subtitle/dual-track wedge with real learners.

Run the Week-1 Mom Test interviews, ship a thin native-desktop subtitle client to the immersion communities, and land paying users at the North Star. Resolve the WTP gap with a metered concierge test before locking pricing.

Leading indicator: ≥5 paying or 10 LOIs + 15 users at ≥4 sessions/wk

Turn the native-desktop head-start into a telemetry flywheel.

Instrument per-app capture + per-app translation-quality evals from day one. The audio tap is a head-start on public APIs; the only compounding asset is which apps work, where translation fails, and how to fix it faster than anyone copying the tap.

Leading indicator: 100% of sessions emit per-app quality telemetry

Own the immersion-community channel before paid CAC matters.

Seed Refold/TheMoeWay/Migaku Discords and r/LearnJapanese, sponsor polyglot YouTubers, and time launches to recurring immersion-challenge kickoffs. A legally-clean shareable artifact (the user's own bilingual clips) carries the loop.

Leading indicator: ≥1,400 community-sourced trials/mo at ≥6% paid conversion
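As a rough check on what this indicator implies, the sketch below multiplies the trial and conversion targets through to paying users and new monthly revenue. It assumes every converted user pays the $12/mo price point mentioned elsewhere in the memo and ignores churn, discounts and tiering, so the outputs are floor-of-target illustrations rather than forecasts.

```python
# Targets taken from the leading indicator and the memo's $12/mo price point.
TRIALS_PER_MONTH = 1_400
PAID_CONVERSION_PCT = 6   # the ">=6% paid conversion" target as a whole percent
PRICE_USD_PER_MONTH = 12  # assumes a single flat tier, which is not yet locked


def paid_users_per_month(trials: int, conversion_pct: int) -> int:
    """New paying users per month at the floor of the target (integer math)."""
    return trials * conversion_pct // 100


def new_mrr_usd(paid_users: int, price_usd: int) -> int:
    """Monthly recurring revenue added by one month's cohort, ignoring churn."""
    return paid_users * price_usd


paid = paid_users_per_month(TRIALS_PER_MONTH, PAID_CONVERSION_PCT)
print(paid)                                    # 84
print(new_mrr_usd(paid, PRICE_USD_PER_MONTH))  # 1008
```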

The $10B option — held, not abandoned

If the consumer wedge proves the capture primitive and telemetry compounds, the durable scale path is a B2B / SDK platform layer (accessibility-compliance, embedded real-time translation) — the explicit $10B headline thesis. We lead consumer to earn the moat, then expand; we do not open with it.

See also: 01 ICP brief for the full persona & target accounts · 02 value prop for the $12/mo payback math · 03 positioning for the crowded 2×2 · 04 TAM/SAM/SOM for the build of the $72M learner TAM, the $13M SAM and the $720M broad-consumer figure.
