B Babelio Playbook 12 / 24
Refresh: quarterly EN · RU
Operating System · Artifact 12

Q1 OKRs

The first operating quarter has exactly one job: find out if the wedge is real and ship a dual-track MVP a real immersion learner keeps running. Four objectives, twelve outcome-KRs, every one with a baseline, a target, a single owner and a checkpoint date. Pre-revenue, pre-PMF — so these are validation OKRs, not growth OKRs.

Objectives
4
Key results
12
North Star
≥4sess/wk
Quarter
Q1first

How this quarter is scored

Quarter clock: kickoff Jun 1, mid-quarter check Jul 15, close Aug 31 (~13 weeks; maps 1:1 to the six phases in 17-90-day-plan.html). Score each KR 0.0–1.0 at close; 0.7 is "hit" for a validation quarter, not 1.0 — these are learning bets, and a clean kill is a win, not a miss. The whole quarter rests on one dependency: the audio engineer (Objective 3). If that hire isn't committed by the Jun 20 checkpoint, Objectives 2 and 4 cannot complete and the quarter is re-planned around the bootstrap path (10-financial-model.html).

Objective 1 — Validate the wedge

1
Customer truth Owner: Founder/CEO
Prove that serious immersion learners feel the live / native-content pain badly enough to pay — or kill the wedge cleanly.
Key result (outcome) Baseline Target Owner Checkpoint
1.1 % of interviewed learners who recall a specific painful live / native-content incident in the last 30 days (Mom Test verbatim). 0% (0 interviews) ≥30% / 15–20 Founder Jun 20
1.2 Paying / LOI users from the concierge metered test who pay a rate that clears subtitle COGS (~$0.31/active-hr). 0 ≥5 Founder Jul 15
1.3 WTP ceiling vs dub-COGS gap resolved with real data: Van Westendorp + metered $5/hr test run on paying ICP buyers. unvalidated 30–50 buyers Founder Aug 15
Why this objective, over the alternatives

We picked "validate the wedge" over "start growth / chase installs" because REVIEW.md scores the whole research analytically fundable but real-world unfundable today on exactly one gap: zero user evidence (1 [PAST] datum, 0 interviews). Every CAC, loop, and retention number in the playbook is currently a [HYP]. Spending the quarter on acquisition would compound that bet on top of an unvalidated foundation. The single most important number this quarter is not MRR — it's whether ≥30% of real learners recall a specific painful incident; if they don't, the kill-criterion fires and we save a year. We deliberately framed KRs as outcomes (recall %, paying users, gap resolved) — not "do 20 interviews," which would be an activity that can be 100% complete and still teach us nothing.

Kill-criterion (this is a win, not a miss) If KR 1.1 lands <30% recall or nobody in KR 1.2 pays a rate clearing COGS — stop, do not ship the MVP. Re-segment (clipper/creator hedge) or pivot to the B2B/SDK trajectory before burning the audio-engineer runway. A clean kill at the Jul 15 checkpoint scores the objective 1.0.

Objective 2 — Ship the dual-track MVP

2
Product · the thing extensions can't do Owner: Audio/OS engineer
Get a real immersion learner watching native-desktop audio with live captions + whisper-under-original, passing the eval gate, on one OS.
Key result (outcome) Baseline Target Owner Checkpoint
2.1 Glass-to-glass p95 latency on real lecture audio (whisper-dub layer) — the launch "done" gate. prototype, unmeasured <700ms Audio eng Aug 15
2.2 Native per-process audio capture verified working on real native clients per target OS (e.g. desktop Webex, an LMS player, VLC). 0 clients ≥3 Audio eng Jul 15
2.3 babelio-evals gold set live and gating deploys: MT adequacy on the 20-example set, with capture/eval telemetry logging from session 1. 0/5 (no harness) ≥4/5 Audio eng Jul 15
Why this objective, over the alternatives

We picked "narrowest native-desktop capture on ONE OS" over "polish a broad multi-OS app" because product.md is explicit: the native per-process audio capture is the one thing browser extensions structurally cannot do — and it's also the hardest, riskiest part of the build (Wk 3–6 = most of the risk). Shipping breadth before that primitive is proven would burn the quarter on the easy 80% while the load-bearing 20% stays unvalidated. We made dual-track the default (captions + whisper-under-original), not auto-mute dub, because the resolved ICP — immersion learners — needs the original voice preserved; muting destroys their learning job (the P0 ICP/mode collision REVIEW.md flagged). And we gate KR 2.1/2.3 on the babelio-evals harness because product.md rule 2 is "evals before features" — without the gate, latency and hallucination drift silently and we'd ship a vitamin.

Hard dependency This entire objective cannot start without Objective 3's hire on payroll. The 10-week MVP timeline in product.md §3 is explicitly contingent on a committed Rust + CoreAudio/WASAPI engineer. No engineer → re-plan to bootstrap (subtitle-only, slower) and move 2.1–2.3 to Q2.

Objective 3 — Close the load-bearing hire

3
Team · the single dependency load-bearing Owner: Founder/CEO
Commit a Rust + CoreAudio/WASAPI engineer to payroll before any MVP work begins — or the whole quarter is fiction.
Key result (outcome) Baseline Target Owner Checkpoint
3.1 Audio/OS-internals engineer signed and on payroll (or committed co-founder), able to ship native capture across both OSes. 0/1 (unconfirmed) 1/1 signed Founder Jun 20
3.2 Founder-market-fit gap in BRIEF.md closed: technical background recorded and the OS-internals plan documented for investors. absent (Team 0/50) documented Founder Jun 20
3.3 Native capture spike de-risks the riskiest primitive: one real native client's per-process audio captured end-to-end within the trial period. 0 clients captured 1 client Audio eng Jul 1
Why this objective, over the alternatives

We made hiring a standalone objective rather than a line buried in Product because REVIEW.md scores Team 0/50 ("no founder-market fit = no meeting") and names this exact hire as the P0 dependency the entire moat and 10-week timeline rest on. 10-financial-model.html calls it the load-bearing cost — ~70% of R&D spend through Q1–Q3. The honest sequencing is brutal: this objective gates Objectives 2 and 4 entirely, so its checkpoint (Jun 20) is the earliest and the most consequential of the quarter. We chose KR 3.3 — an actual capture spike inside the trial — over a softer "interview N candidates," because the failure mode here isn't finding someone, it's discovering after onboarding that the OS-internals work is harder than scoped. A paid trial spike surfaces that in week 3, not month 3.

If 3.1 misses, the quarter changes shape — do not pretend otherwise

No committed engineer by Jun 20 → Objectives 2 and 4 are formally de-scoped to Q2, and the quarter re-plans around the bootstrap subtitle-only path in 10-financial-model.html. The worst outcome is leaving Objective 2's KRs on the board, scoring them red in August, and learning nothing — that's a hiring failure dressed as an execution failure.

Objective 4 — Reach the North Star with real users

4
Activation & habit north star Owner: Founder/CEO
Get a hand-onboarded cohort to ≥4 native-desktop sessions/user/week and take the first honest PMF reading.
Key result (outcome) Baseline Target Owner Checkpoint
4.1 Hand-onboarded wedge users sustaining the North Star: ≥4 native-desktop sessions/user/week for 2 consecutive weeks. 0 users ≥15 Founder Aug 31
4.2 Activation rate — % of installs reaching the activation event (first exported/shared sentence-mining card) by D7. unmeasured ≥45% Founder Aug 15
4.3 First Sean Ellis PMF reading administered at day-14 across the cohort — measured, not claimed. 0 hands, no reading ≥30 hands surveyed Founder Aug 31
Why this objective, over the alternatives

We picked "North Star sessions + a real PMF reading" over "loop K / viral growth" because the honest sequence is habit-then-distribution: product.md's North Star is ≥4 native-desktop sessions/user/week ("did they trust it enough to leave it running over a real lecture") — MAU is vanity. The loop factor K and channel-test CAC matter, but they're a Q2 bet once we know the product is sticky; chasing virality before a 15-user cohort sustains the North Star would be optimizing a funnel into a leaky bucket. We deliberately set KR 4.3 as "administer the survey to ≥30 hands," not "hit 40% very-disappointed" — because REVIEW.md flags PMF as PENDING and forbids claiming it until measured. The outcome we own this quarter is getting an honest reading on enough hands; the 40% threshold is a finding we report, not a target we set, so we don't pressure ourselves into faking PMF.

OKR anti-patterns · the check we ran before locking these

Run this checklist before adopting any OKR set — ours or next quarter's. If a KR trips one of these, it goes back to the drafting table.

KRs are activities, not outcomes.
"Ship the MVP" / "do 20 interviews" can be 100% complete and teach nothing. Our fix: every KR is a measured outcome — latency <700ms, ≥30% recall, ≥4 sessions/wk — not a task done.
Too many objectives.
A pre-PMF team chasing 8 objectives achieves none. Our fix: exactly 4, all serving one theme — validate the wedge and ship the thing that validates it. Growth/loop OKRs are explicitly deferred to Q2.
No single owner per KR.
"The team owns it" means nobody owns it. Our fix: every KR names one person (Founder or Audio eng) — and at this size that's the honest map of who actually does the work.
No baseline — you can't tell if you moved.
A target with no starting number is a wish. Our fix: every KR carries today's baseline (mostly 0 / unmeasured — honest for a prototype) next to its target.
No mid-quarter checkpoint.
Discovering a miss at quarter-close is too late to course-correct. Our fix: staggered checkpoint dates (Jun 20 → Aug 31), with the load-bearing hire checked first so a miss re-plans the quarter early.
Sandbagging or moon-shotting the targets.
Targets you'll definitely hit teach nothing; ones you can't hit demoralize. Our fix: targets are pulled from research benchmarks (45% activation, <700ms, ≥30% recall), and 0.7 is "hit" — leaving honest stretch.
Vanity metrics as KRs.
Installs, signups, MAU look like progress and hide a leaky bucket. Our fix: the North Star is sustained sessions and a measured PMF reading, not download counts — habit before headcount.
OKRs disconnected from the strategic bet.
A set of "nice things to improve" with no through-line. Our fix: all 4 objectives ladder to the one Q1 bet from the strategy memo — validate the wedge, ship the dual-track MVP — and each rationale defends the choice against a named alternative.
Cross-references
  • 00-strategy-memo.htmlThe Q1 strategic bet these four objectives ladder up to (validate wedge + ship dual-track MVP).
  • 17-90-day-plan.htmlWeek-by-week execution of these KRs across six 15-day phases; checkpoint dates align.
  • 14-kpi-dashboard.htmlWhere these KR metrics (North Star sessions, activation %, PMF, latency) are tracked weekly.
  • 11-hiring-plan.htmlThe audio-engineer scorecard behind Objective 3 — mission, outcomes, comp range.
  • 10-financial-model.htmlThe load-bearing cost and the bootstrap re-plan if Objective 3 misses.
  • 16-risk-register.htmlThe hire dependency, churn, and WTP-vs-COGS gap as tracked P0/P1 risks with tripwires.
Babelio · Q1 OKRs · Playbook 12/24 Validation quarter · 0.7 = hit · refresh quarterly