Babelio · Operating Playbook 09 · Experiments
refresh: weekly
Artifact 09 — Demand Engine

Experiment Backlog

Fifty growth and validation experiments, ICE-scored and ranked. The top twelve are fully designed — hypothesis, metric, MDE, duration, kill-criterion. The next four weeks are sequenced. Babelio is prototype-only: this backlog is a validation engine first, a growth engine second.

Backlog size: 50
Designed: 12
Concurrent: 1–2
Roadmap: 4 weeks
Source of truth (read first): Use this as the source of truth for growth/PM standups. Refresh weekly: re-score after each readout, promote the next experiment, retire the dead ones. ICE numbers and scores are identical across languages; only titles and hypotheses translate.
Bar for good: A PM or growth lead opens this page Monday morning and starts the top experiment without asking the founder a single question. Recruitment source, metric, MDE, duration and the number that kills it are all on the card.
Scoring. ICE = Impact × Confidence ÷ Ease, each rated 1–10. E is divided rather than multiplied, so it behaves as an effort rating: a low E marks a cheap, fast experiment and raises the score. That is the right bias for a pre-PMF prototype, where learning velocity beats polish. Higher score = do sooner. The four green rows are the resolved must-run-first set.
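A minimal sketch of the scoring rule, for anyone re-scoring after a readout. The I/C/E values are the ones listed for E01, E02, E05 and E50 in §1; the `Experiment` class and its field names are illustrative, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    exp_id: str
    impact: int      # 1-10
    confidence: int  # 1-10
    ease: int        # 1-10, divided into the score, so lower = cheaper/faster

    @property
    def score(self) -> float:
        # ICE as defined on this page: Impact x Confidence / Ease, one decimal.
        return round(self.impact * self.confidence / self.ease, 1)


# Four rows copied from the backlog table; the rest follow the same pattern.
backlog = [
    Experiment("E50", 4, 3, 5),
    Experiment("E05", 8, 8, 4),
    Experiment("E01", 10, 9, 3),
    Experiment("E02", 10, 8, 3),
]

# Higher score = do sooner.
ranked = sorted(backlog, key=lambda e: e.score, reverse=True)
for e in ranked:
    print(e.exp_id, e.score)
```

Python's `round` resolves exact ties to the nearest even digit, so a raw value such as 6.25 may display differently here than in a spreadsheet that rounds half up.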

1 · The 50 · ICE-scored

Listed by experiment ID, not by rank; read priority from the Score column. E01–E12 are fully designed in §2. Green rows (E01–E04: interviews, WTP, loop-K, channel test) are the resolved first-wave validation set and run before any growth spend.

I = Impact · C = Confidence · E = Ease · Score = I×C÷E. Status: NOW = running · NEXT = queued · blank = backlog.

| # | Experiment | Channel / Surface | I | C | E | Score | Owner | Status |
|---|---|---|---|---|---|---|---|---|
| E01 | 15–20 immersion-learner interviews (Mom Test, dated-incident probe) | Discord/Reddit | 10 | 9 | 3 | 30.0 | Founder | NOW |
| E02 | Van Westendorp + metered $5/hr concierge WTP test (dub vs subtitle ceiling) | Concierge | 10 | 8 | 3 | 26.7 | Founder | NEXT |
| E03 | Loop-factor K test: sentence-mining card share → install | In-product | 9 | 7 | 4 | 15.8 | Growth | |
| E04 | Calendar-timed immersion-challenge channel test (Refold/MoeWay/Migaku) | Community | 9 | 6 | 4 | 13.5 | Growth | |
| E05 | Install-completion funnel instrumentation (download → launch → permission) | In-product | 8 | 8 | 4 | 16.0 | Eng | |
| E06 | Reverse-trial vs open-freemium activation A/B (7-day card-on-file) | In-product | 8 | 7 | 4 | 14.0 | Growth | |
| E07 | Activation event A/B: "first shared card" vs "first translated minute" | In-product | 8 | 7 | 4 | 14.0 | Growth | |
| E08 | Day-14 Sean Ellis PMF survey on Week-2/3 cohort | Email/In-app | 9 | 7 | 5 | 12.6 | Founder | |
| E09 | Trusted-mod demo seeding vs cold post (install-trust transfer) | Community | 8 | 6 | 4 | 12.0 | Growth | |
| E10 | Long-tail SEO landing pages: "watch [lang] stream with subtitles desktop" | SEO/Content | 7 | 6 | 4 | 10.5 | Content | |
| E11 | Dub-overage meter nudge: suggest right tier before sticker shock | In-product | 7 | 7 | 5 | 9.8 | Growth | |
| E12 | Immersion-YouTuber sponsorship CAC test (loop-fail contingency) | Sponsorship | 8 | 5 | 5 | 8.0 | Growth | |
| E13 | 20-sec install Loom on landing + pinned in community threads | Content | 6 | 7 | 3 | 14.0 | Content | |
| E14 | "Mined with Babelio" footer on exported Anki cards (loop attribution) | In-product | 7 | 6 | 3 | 14.0 | Growth | |
| E15 | One-click "subtitle this now" auto-detect on D1 onboarding | In-product | 8 | 7 | 5 | 11.2 | Eng | |
| E16 | Saved language/app presets to cut D7 return friction | In-product | 7 | 7 | 5 | 9.8 | Eng | |
| E17 | D30 immersion-streak digest email ("you mined 42 sentences / 6 hrs") | Email | 7 | 6 | 4 | 10.5 | Growth | |
| E18 | "You understood X% on your own" reframe vs lazy-tool churn driver | In-product | 7 | 6 | 4 | 10.5 | Growth | |
| E19 | Notarized + code-signed build vs unsigned (Gatekeeper drop-off) | In-product | 8 | 8 | 6 | 10.7 | Eng | |
| E20 | "Why these permissions" first-run trust screen A/B | In-product | 6 | 7 | 4 | 10.5 | Eng | |
| E21 | r/LearnJapanese monthly-challenge thread launch post | Community | 7 | 5 | 3 | 11.7 | Growth | |
| E22 | Anki-export deep-link from card share back to install | In-product | 7 | 6 | 4 | 10.5 | Eng | |
| E23 | Heaviest-immersers sub-segment $15–20/mo tolerance probe (WTP Test B) | Concierge | 8 | 6 | 5 | 9.6 | Founder | |
| E24 | Async/batch high-quality mode landing for clipper hedge segment | SEO/Content | 6 | 5 | 4 | 7.5 | Content | |
| E25 | "Never store or upload your audio" trust headline on landing | Landing | 6 | 6 | 3 | 12.0 | Content | |
| E26 | Pricing-page anchor A/B: $12 vs $9 vs $15 Pro | Landing | 7 | 5 | 4 | 8.8 | Growth | |
| E27 | Refold Discord pinned demo by a power-user champion | Community | 7 | 5 | 4 | 8.8 | Growth | |
| E28 | Live-stream subtitle demo clip (own content) for organic shares | Content | 6 | 5 | 3 | 10.0 | Content | |
| E29 | Onboarding latency-perception test (sub-700ms feels live?) | In-product | 7 | 6 | 5 | 8.4 | Eng | |
| E30 | Korean + Chinese language expansion vs Japanese-only acquisition | Community | 6 | 5 | 5 | 6.0 | Growth | |
| E31 | Referral incentive: free dub-minutes for inviting a cohort peer | In-product | 7 | 5 | 5 | 7.0 | Growth | |
| E32 | Windows-installer EV-cert SmartScreen drop-off test | In-product | 7 | 6 | 7 | 6.0 | Eng | |
| E33 | Whisper-dub-under-original vs subtitle-only default preference test | In-product | 7 | 6 | 5 | 8.4 | Growth | |
| E34 | VTuber-clipper Discord async-mode beta drop (secondary segment) | Community | 5 | 5 | 4 | 6.3 | Growth | |
| E35 | Annual-prepay −17% conversion lift vs monthly default | Checkout | 6 | 6 | 5 | 7.2 | Growth | |
| E36 | "Today's immersion" recap auto-generate + one-tap share | In-product | 7 | 5 | 6 | 5.8 | Eng | |
| E37 | Polyglot-YouTuber affiliate-code attribution test | Sponsorship | 6 | 5 | 5 | 6.0 | Growth | |
| E38 | In-app NPS prompt after 5th session for testimonial sourcing | In-product | 5 | 6 | 4 | 7.5 | Growth | |
| E39 | SEO comparison page: "Babelio vs Language Reactor for live streams" | SEO/Content | 6 | 5 | 4 | 7.5 | Content | |
| E40 | Win-back email for D30 churned learners ("new app coverage") | Email | 5 | 5 | 4 | 6.3 | Growth | |
| E41 | Per-app capture/eval telemetry flywheel instrumentation (moat seed) | In-product | 8 | 6 | 7 | 6.9 | Eng | |
| E42 | TikTok/Shorts "watch live anime untranslated" demo (own clips) | Social | 6 | 4 | 4 | 6.0 | Content | |
| E43 | Onboarding checklist gamification (3 cards in 7 days → aha) | In-product | 6 | 5 | 5 | 6.0 | Growth | |
| E44 | Migaku/Anki add-on cross-promo partnership probe | Partnership | 7 | 4 | 6 | 4.7 | Founder | |
| E45 | Pricing meter UI: show live cost-per-session to build trust | In-product | 5 | 5 | 5 | 5.0 | Eng | |
| E46 | Cohort-streak leaderboard inside one immersion Discord | Community | 5 | 4 | 5 | 4.0 | Growth | |
| E47 | Education-org / language-school PQL discovery (B2B trajectory seed) | Outbound | 7 | 3 | 6 | 3.5 | Founder | |
| E48 | SDK/API waitlist landing for accessibility-compliance buyers | Landing | 6 | 3 | 4 | 4.5 | Founder | |
| E49 | Voice-clone same-speaker dub interest probe (lowest ODI, defer) | Survey | 3 | 4 | 5 | 2.4 | Growth | |
| E50 | Paid-social impulse ad test (expected to fail: install friction) | Paid social | 4 | 3 | 5 | 2.4 | Growth | |
Read the bottom of the list too: E49 (voice-clone) and E50 (paid social) sit last on purpose. Voice-clone is the lowest-ODI outcome for this ICP, and paid social fights the trust-heavy install decision. They are documented so the team doesn't re-pitch them as "new ideas" every quarter. E47–E48 are deliberately low-confidence B2B/SDK seeds: not for now, but on the board so the $10B trajectory isn't forgotten.

2 · The 12 designed

Each card is runnable as-is: hypothesis, primary metric, MDE (minimum detectable effect), duration, required infra, kill-criterion. The four green cards are the resolved validation gates — Babelio's unit economics live or die on them.

E01 · Validation gate

15–20 immersion-learner interviews

Hypothesis: ≥30% of serious immersion learners recall a specific, dated incident in the last 30 days where live or native-client content had no fan-sub and they gave up — proving the wedge pain is real, not desk-invented.

Metric: % recalling a dated <30d painful incident + named tool-spend line items
MDE: 30% threshold (the kill line); n=15–20 gives a clear pass/fail read, not a CI
Duration: Week 1 (5 working days)
Infra: Recruiting posts in r/LearnJapanese + Refold/MoeWay/Migaku Discords; the Mom Test 15Q script (audience.md §5) verbatim; call recorder

Kill: <30% recall a specific painful live/native-content incident in the last 30 days → the wedge is desk-fiction; stop and re-segment before any build.

E02 · Validation gate · #1 to validate

Van Westendorp + $5/hr concierge WTP test

Hypothesis: enough immersion learners metered-pay above the dub-COGS floor that the $12 plan clears margin — i.e. the validated WTP ceiling ($6–12) is not structurally below the cost floor once usage is measured, not stated. This is the single most important number in the company.

Metric: Actual metered $ paid per buyer vs the per-active-hour COGS floor ($0.31 subtitle / $0.50 dub); dub-minute attach rate
MDE: ≥30% of buyers (the kill line) sustain a rate that clears blended COGS within the $6–12 band; n=30–50
Duration: Week 2 (concierge fulfilled by hand)
Infra: 4-question Van Westendorp script (monetization.md §2); manual subtitle/mining fulfilment; a simple metered invoice (Stripe/Polar payment link)

Kill: <30% metered-pay above the COGS floor for the subtitle hero mode → consumer app is unfundable as priced; pivot to usage-metered or the heaviest sub-segment before locking tiers.
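One way the E02 readout could be tallied, sketched under assumptions: each buyer is reduced to (amount paid, subtitle hours, dub hours), "clears the floor" is read as paid > blended COGS, and the four buyers below are made-up placeholders, not results. Only the $0.31 / $0.50 per-hour costs and the 30% kill line come from the card.

```python
SUBTITLE_COGS_PER_HOUR = 0.31  # from the card
DUB_COGS_PER_HOUR = 0.50       # from the card
KILL_LINE = 0.30               # share of buyers that must clear the floor


def clears_floor(paid_usd, subtitle_hours, dub_hours):
    """True if what the buyer actually paid exceeds their blended COGS."""
    cogs = (subtitle_hours * SUBTITLE_COGS_PER_HOUR
            + dub_hours * DUB_COGS_PER_HOUR)
    return paid_usd > cogs


# Hypothetical concierge buyers: (paid_usd, subtitle_hours, dub_hours).
buyers = [
    (12.0, 20, 2),   # COGS 7.20  -> clears
    (6.0, 25, 0),    # COGS 7.75  -> does not clear
    (9.0, 10, 4),    # COGS 5.10  -> clears
    (12.0, 30, 10),  # COGS 14.30 -> does not clear
]

share = sum(clears_floor(*b) for b in buyers) / len(buyers)
verdict = "pass" if share >= KILL_LINE else "kill"
print(f"{share:.0%} of buyers clear the COGS floor -> {verdict}")
```

With a real n of 30–50 the same tally applies; the open design choice is whether the floor is checked per buyer, as here, or on the pooled blend.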

E03 · Validation gate · make-or-break CAC

Loop-factor K test

Hypothesis: the legally-clean sentence-mining card (the user's own gloss + "Mined with Babelio" footer) is a viral artifact: K = (shares per activated user) × (installs per shared artifact) ≥ 0.3. If true, community CAC stays $15–35 and LTV:CAC clears 3:1; if false, the product is single-player and growth is paid-only.

Metric: K = artifact_shared_external per activated user × installs per shared artifact
MDE: Prove: K ≥ 0.3. Demote: K < 0.15. Need ≥30 activated users to read it
Duration: Week 3–4 (needs thin product + instrumentation live)
Infra: Events: artifact_created, artifact_shared_external; deep-link from shared card → install; UTM-tagged "Mined with Babelio" footer

Kill: K < 0.15 → demote the loop, fall back to paid/community CAC and re-run LTV:CAC on the real number (E12). K<0.15 AND community CAC > $37 = unit economics below 3:1, no inversion path.
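The K readout can be sketched as below. The thresholds (0.3, 0.15, 30 activated users) are from the card; the function names, the "inconclusive" label for the band between the two thresholds, and the example counts are assumptions for illustration.

```python
PROVE_AT = 0.30
DEMOTE_BELOW = 0.15
MIN_ACTIVATED = 30


def loop_factor(activated_users, external_shares, installs_from_shares):
    """K = (shares per activated user) x (installs per shared artifact)."""
    if activated_users == 0 or external_shares == 0:
        return 0.0
    shares_per_user = external_shares / activated_users
    installs_per_share = installs_from_shares / external_shares
    return shares_per_user * installs_per_share


def read_k(activated_users, external_shares, installs_from_shares):
    if activated_users < MIN_ACTIVATED:
        return "too-few-users"  # card: need >= 30 activated users to read K
    k = loop_factor(activated_users, external_shares, installs_from_shares)
    if k >= PROVE_AT:
        return "prove"
    if k < DEMOTE_BELOW:
        return "demote"
    return "inconclusive"  # band between the two thresholds; label is ours


# Hypothetical counts: 40 activated users, 60 external shares, 14 installs.
print(round(loop_factor(40, 60, 14), 2), read_k(40, 60, 14))
```

Algebraically K collapses to installs from shares ÷ activated users; keeping the two factors separate shows which half of the loop is weak.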

E04 · Validation gate · the one channel test

Calendar-timed immersion-challenge channel test

Hypothesis: seeding a trusted-member demo into a recurring immersion-challenge kickoff (Refold/MoeWay/Migaku run these on a calendar; r/LearnJapanese monthly threads) drives installs at a blended CAC ≤ $37 with install-completion ≥ 50% — proving the community channel works at the recurring trigger event.

Metric: Blended community CAC; install-completion rate (download → launch → permission); time-to-activation
MDE: CAC ≤ $37 (kill line) and install-completion ≥ 50%; ≥150 installs to read CAC with confidence
Duration: Week 4, timed to a real challenge kickoff date
Infra: Trusted-member demo arrangement; 4–6 long-tail content pages live; install-completion funnel events (E05); UTM per community

Kill: install-completion < 50% → friction is the bottleneck, fix before any spend. CAC > $37 with K<0.15 → go/no-go = no-go on the consumer wedge as designed.
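The E04 gates combine into a short decision function, sketched here. The numeric lines ($37, 50%, 150 installs, K < 0.15) are from the cards; the verdict strings are hypothetical, and the card does not say what to do when CAC is over the line but K is not demoted, so that branch is only labelled, not decided.

```python
CAC_KILL_LINE_USD = 37.0
MIN_COMPLETION = 0.50
MIN_INSTALLS = 150
K_DEMOTE = 0.15


def e04_verdict(spend_usd, installs, completion_rate, k):
    # Friction gate comes first: the card says fix it before any spend.
    if completion_rate < MIN_COMPLETION:
        return "fix-install-friction"
    # Below 150 installs the CAC read is not trusted yet.
    if installs < MIN_INSTALLS:
        return "keep-collecting"
    cac = spend_usd / installs
    if cac > CAC_KILL_LINE_USD:
        # Only the CAC > $37 AND K < 0.15 combination is a stated no-go.
        return "no-go" if k < K_DEMOTE else "cac-over-line"
    return "pass"


# Hypothetical readouts.
print(e04_verdict(spend_usd=4500, installs=180, completion_rate=0.62, k=0.22))
print(e04_verdict(spend_usd=9000, installs=180, completion_rate=0.62, k=0.10))
```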

E05

Install-completion funnel instrumentation

Hypothesis: the trust-heavy native-desktop install (Gatekeeper/SmartScreen + permission grant) is the largest top-of-funnel drop; instrumenting download → launch → permission reveals where, so it can be fixed before channel spend.

Metric: Stage-by-stage completion rate
MDE: Detect any stage with >25% drop; baseline only, no comparison arm
Duration: Continuous from first install
Infra: 3 funnel events fired client-side; analytics pipeline

Kill: install-completion < 50% gates all channel scaling (feeds E04).
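A small sketch of the funnel read: per-stage completion, any stage losing more than 25%, and end-to-end install-completion against the 50% gate. The stage keys and the counts are placeholders; only the three stages and the two thresholds come from the card.

```python
MAX_STAGE_DROP = 0.25
MIN_COMPLETION = 0.50


def stage_rates(counts):
    """Completion rate between each pair of consecutive funnel stages.

    `counts` maps stage name -> users reaching it, in funnel order.
    """
    stages = list(counts.items())
    return [
        (f"{a} -> {b}", n_b / n_a)
        for (a, n_a), (b, n_b) in zip(stages, stages[1:])
    ]


# Hypothetical event counts for one week.
counts = {"download": 400, "launch": 280, "permission": 230}

rates = stage_rates(counts)
flagged = [name for name, rate in rates if 1 - rate > MAX_STAGE_DROP]
completion = counts["permission"] / counts["download"]

print(rates)
print("stages over 25% drop:", flagged)
print("install-completion gate met:", completion >= MIN_COMPLETION)
```

In this made-up week the overall gate passes while one stage still exceeds the 25% drop line, which is the case the per-stage view exists to catch.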

E06

Reverse-trial vs open-freemium activation A/B

Hypothesis: a 7-day full-Pro reverse trial with card-on-file converts free→paid 2–3× better than open freemium for the engaged learner, without crushing the free base that seeds the loop.

Metric: Free→paid conversion at day 30
MDE: +3pp (6%→9%); needs ~1,200/arm at 80% power and two-sided α = 0.05 (~600/arm gives only about 50% power)
Duration: 4–6 weeks (post-PMF scale)
Infra: Billing with trial states; A/B split; Stripe/Polar card-on-file

Kill: no lift, or reverse trial starves the free loop base → keep open freemium.
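The per-arm sample size for a 6% → 9% lift can be checked with the standard two-proportion normal approximation. The 80% power and two-sided α = 0.05 defaults are assumptions (the card does not state them); under those settings the requirement lands near 1,200 per arm, and a one-sided test or lower power would bring it down.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Per-arm n to detect p_control -> p_variant (two-sided z-test,
    unpooled-variance normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def power_at(n, p_control, p_variant, alpha=0.05):
    """Approximate power of the same test with n users per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = ((p_control * (1 - p_control)
           + p_variant * (1 - p_variant)) / n) ** 0.5
    return NormalDist().cdf(abs(p_variant - p_control) / se - z_alpha)


print(n_per_arm(0.06, 0.09))                # per-arm n at 80% power
print(round(power_at(600, 0.06, 0.09), 2))  # power if only 600/arm are available
```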

E07

Activation-event definition A/B

Hypothesis: "first exported/shared sentence-mining card" predicts W4 retention better than "first translated minute" — and because it's also the loop trigger, optimizing for it lifts both retention and K at once.

Metric: W4 retention by activation-event cohort
MDE: Detect ≥1.5× W4 retention gap between definitions
Duration: 4 weeks min (needs a W4 read)
Infra: Cohort analytics; both events instrumented

Kill: no retention gap → activation = first translated minute (lower friction); revisit loop assumption.

E08

Day-14 Sean Ellis PMF survey

Hypothesis: ≥40% of the Week-2/3 cohort would be "very disappointed" without Babelio at day 14 — the first defensible PMF reading. Until measured, PMF is explicitly not claimed.

Metric: % "very disappointed" (Sean Ellis)
MDE: 40% threshold; ≥30 responses for a credible read
Duration: Fires at each user's day-14 mark
Infra: Timed in-app/email survey; ≥30 onboarded users

Kill: <40% "very disappointed" → no PMF claim; iterate hero mode before scaling.

E09

Trusted-mod demo seeding vs cold post

Hypothesis: a demo from a trusted community member/mod clears AV-flag install hesitation better than a cold drop — install-completion is materially higher when social proof, not copy, carries the trust transfer.

Metric: Install-completion rate by seeding type
MDE: +15pp completion; 2 matched communities
Duration: 2 weeks
Infra: One mod relationship; UTM per arm; E05 funnel

Kill: no completion lift → trust transfer isn't the bottleneck; look at the build signing (E19).

E10

Long-tail SEO landing pages

Hypothesis: high-intent native-desktop + live + immersion long-tail ("watch Japanese stream with subtitles desktop", "sentence mining live VTuber stream") ranks where browser-extension competitors can't serve, feeding the long trust-window install at $10–30 CAC.

Metric: Organic visits → install; blended CAC
MDE: CAC ≤ $30; 20–40 pages, read at M3–6
Duration: 8–12 weeks (SEO lag)
Infra: CMS; 20–40 pages; demo clips; UTM

Kill: no ranking on the native-desktop long-tail → SERP too crowded; community-only.

E11

Dub-overage meter nudge

Hypothesis: an in-app meter that suggests the right tier before overage prevents sticker-shock churn on rare dub-heavy users while protecting margin via the $0.06/min meter.

Metric: Churn among dub-heavy users; overage-bill complaints
MDE: −5pp churn in the dub-heavy segment
Duration: 4 weeks
Infra: Live usage meter UI; tier-suggest logic

Kill: no churn reduction → sticker shock isn't the driver; look at the lazy-tool reframe (E18).

E12

Immersion-YouTuber sponsorship CAC test

Hypothesis (contingency): if the loop fails (E03 K<0.15), a polyglot/immersion-YouTuber integration acquires installs at $50–120 — and the team must re-run LTV:CAC on that real number, since at CAC $80 the ratio falls to ~1.4:1, below the 3:1 bar.

Metric: CAC per install via affiliate code; resulting LTV:CAC
MDE: CAC ≤ $37 to clear 3:1; 1–2 sponsorships
Duration: 3–4 weeks (only if loop demoted)
Infra: Affiliate codes (E37); creator outreach; landing

Kill: CAC > $37 with no working loop → consumer unit economics don't clear; escalate the pricing/segment pivot.
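The LTV:CAC arithmetic behind this card can be re-run on whatever CAC the sponsorship actually produces. The page never states LTV directly; the sketch infers it from "CAC ≤ $37 clears 3:1", which implies roughly $111, and that figure reproduces the ~1.4:1 quoted at CAC $80. Treat the inferred LTV as an assumption and swap in the measured value once it exists.

```python
TARGET_RATIO = 3.0
CAC_CEILING_USD = 37.0

# Inferred, not stated on the page: the LTV at which $37 CAC is exactly 3:1.
IMPLIED_LTV_USD = TARGET_RATIO * CAC_CEILING_USD  # 111.0


def ltv_to_cac(cac_usd, ltv_usd=IMPLIED_LTV_USD):
    return ltv_usd / cac_usd


def clears_bar(cac_usd, ltv_usd=IMPLIED_LTV_USD):
    return ltv_to_cac(cac_usd, ltv_usd) >= TARGET_RATIO


# Community ceiling, then the low, middle and high end of the sponsorship range.
for cac in (37, 50, 80, 120):
    print(f"CAC ${cac}: {ltv_to_cac(cac):.1f}:1, clears 3:1 = {clears_bar(cac)}")
```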

3 · Next 4 weeks

One or two experiments concurrent — a solo founder + one engineer cannot run more without losing signal. The sequence mirrors growth.md §7: you cannot test the loop (E03) or the channel (E04) until the thin product and instrumentation exist, so weeks 1–2 are validation-by-hand and weeks 3–4 are product-in-the-loop.

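| # | Experiment | Schedule (weeks W1–W4) |
|---|---|---|
| E01 | Learner interviews | W1: RUN |
| E02 | WTP: VW + concierge | W2: RUN |
| E05 | Install funnel instrumentation | BUILD, then LIVE |
| E07 | Activation-event instrumentation | BUILD, then RUN |
| E03 | Loop-factor K | W3: SEED · W4: READ |
| E08 | Sean Ellis PMF survey | DAY-14 (overlaps W4) |
| E04 | Immersion-challenge channel test | W4: LAUNCH |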
Why this order (the dependency chain): W1 and W2 are gates, not parallel-able: if E01 kills the wedge, nothing downstream runs. E04 is timed to a real challenge kickoff; if the calendar date lands in W3, pull it forward, because the trigger event drives the schedule, not the reverse. E03 and E04 share the W4 readout because the loop seeds through the channel launch. E08 fires automatically at each user's day-14 mark, so it overlaps W4 for the Week-2/3 cohort.
Linked artifacts
  • 07 · Growth Loop: E03 (loop-factor K) is the experiment the growth-loop model's first lever points to; K≥0.3 underwrites the whole loop.
  • 08 · Retention: E07/E08/E11/E16/E17 are the 5 retention experiments queued from the retention model.
  • 05 · Pricing: E02 + E23 are the Van Westendorp + concierge WTP gate that pricing is anchored against, not yet validated by.
  • 17 · 90-Day Plan: This 4-week roadmap is the first month of the 90-day validation plan (growth.md §7).
Babelio · Operating Playbook · 09 — Experiment Backlog Refresh weekly · source of truth: research/growth.md §3,§7 + research/audience.md §3 + research/monetization.md §2