Artifact 09 — Demand Engine

Experiment Backlog

Fifty growth and validation experiments, ICE-scored and ranked. The top twelve are fully designed — hypothesis, metric, MDE, duration, kill-criterion. The next four weeks are sequenced. Babelio is prototype-only: this backlog is a validation engine first, a growth engine second.

Backlog size: 50

Designed: 12

Concurrent: 1–2

Roadmap: 4wk

Source of truth (read first). Use this as the source of truth for growth/PM standups. Refresh weekly: re-score after each readout, promote the next experiment, retire the dead ones. ICE numbers and scores are identical across languages; only titles and hypotheses translate.

Bar for good. A PM or growth lead opens this page Monday morning and starts the top experiment without asking the founder a single question: recruitment source, metric, MDE, duration and the number that kills it are all on the card.

Scoring. ICE = Impact × Confidence ÷ Ease, each rated 1–10. Dividing by Ease (instead of multiplying) pushes cheap, fast experiments up — correct for a pre-PMF prototype where learning velocity beats polish. Higher score = do sooner. The four green rows are the resolved must-run-first set.
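The scoring rule can be sketched in a few lines of Python. The `Experiment` class and `ice_score` function are illustrative names, not part of any Babelio tooling; the four rows are copied from the table in §1. Half-up rounding is an assumption, chosen because it reproduces the table's one-decimal scores (for example 6.25 shown as 6.3).

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Experiment:
    id: str
    impact: int      # 1-10
    confidence: int  # 1-10
    ease: int        # 1-10; used as the divisor, so a lower value ranks higher


def ice_score(exp: Experiment) -> float:
    """Score = Impact x Confidence / Ease, rounded half-up to one decimal."""
    raw = Decimal(exp.impact * exp.confidence) / Decimal(exp.ease)
    return float(raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


backlog = [
    Experiment("E34", 5, 5, 4),
    Experiment("E12", 8, 5, 5),
    Experiment("E01", 10, 9, 3),
    Experiment("E02", 10, 8, 3),
]

# Highest score first: "higher score = do sooner".
ranked = sorted(backlog, key=ice_score, reverse=True)
for exp in ranked:
    print(exp.id, ice_score(exp))
```

Re-scoring after a readout then amounts to editing one row and re-sorting.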

1 · The 50 · ICE-scored

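Rows are listed by experiment ID, not strictly by score: the four validation gates (E01–E04) lead, then the remaining designed cards (E05–E12), then the rest of the backlog. E01–E12 are fully designed in §2. E01–E04 (green in the original layout) are the resolved first-wave validation set (interviews, WTP, loop-K, channel test); these run before any growth spend.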

I = Impact · C = Confidence · E = Ease · Score = I×C÷E
Status: NOW = running · NEXT = queued · blank = backlog

#	Experiment	Channel / Surface	I	C	E	Score	Owner	Status
E01	15–20 immersion-learner interviews (Mom Test, dated-incident probe)	Discord/Reddit	10	9	3	30.0	Founder	NOW
E02	Van Westendorp + metered $5/hr concierge WTP test (dub vs subtitle ceiling)	Concierge	10	8	3	26.7	Founder	NEXT
E03	Loop-factor K test — sentence-mining card share → install	In-product	9	7	4	15.8	Growth
E04	Calendar-timed immersion-challenge channel test (Refold/MoeWay/Migaku)	Community	9	6	4	13.5	Growth
E05	Install-completion funnel instrumentation (download → launch → permission)	In-product	8	8	4	16.0	Eng
E06	Reverse-trial vs open-freemium activation A/B (7-day card-on-file)	In-product	8	7	4	14.0	Growth
E07	Activation event A/B — "first shared card" vs "first translated minute"	In-product	8	7	4	14.0	Growth
E08	Day-14 Sean Ellis PMF survey on Week-2/3 cohort	Email/In-app	9	7	5	12.6	Founder
E09	Trusted-mod demo seeding vs cold post (install-trust transfer)	Community	8	6	4	12.0	Growth
E10	Long-tail SEO landing pages — "watch [lang] stream with subtitles desktop"	SEO/Content	7	6	4	10.5	Content
E11	Dub-overage meter nudge — suggest right tier before sticker shock	In-product	7	7	5	9.8	Growth
E12	Immersion-YouTuber sponsorship CAC test (loop-fail contingency)	Sponsorship	8	5	5	8.0	Growth
E13	20-sec install Loom on landing + pinned in community threads	Content	6	7	3	14.0	Content
E14	"Mined with Babelio" footer on exported Anki cards (loop attribution)	In-product	7	6	3	14.0	Growth
E15	One-click "subtitle this now" auto-detect on D1 onboarding	In-product	8	7	5	11.2	Eng
E16	Saved language/app presets to cut D7 return friction	In-product	7	7	5	9.8	Eng
E17	D30 immersion-streak digest email ("you mined 42 sentences / 6 hrs")	Email	7	6	4	10.5	Growth
E18	"You understood X% on your own" reframe vs lazy-tool churn driver	In-product	7	6	4	10.5	Growth
E19	Notarized + code-signed build vs unsigned (Gatekeeper drop-off)	In-product	8	8	6	10.7	Eng
E20	"Why these permissions" first-run trust screen A/B	In-product	6	7	4	10.5	Eng
E21	r/LearnJapanese monthly-challenge thread launch post	Community	7	5	3	11.7	Growth
E22	Anki-export deep-link from card share back to install	In-product	7	6	4	10.5	Eng
E23	Heaviest-immersers sub-segment $15–20/mo tolerance probe (WTP Test B)	Concierge	8	6	5	9.6	Founder
E24	Async/batch high-quality mode landing for clipper hedge segment	SEO/Content	6	5	4	7.5	Content
E25	"Never store or upload your audio" trust headline on landing	Landing	6	6	3	12.0	Content
E26	Pricing-page anchor A/B — $12 vs $9 vs $15 Pro	Landing	7	5	4	8.8	Growth
E27	Refold Discord pinned demo by a power-user champion	Community	7	5	4	8.8	Growth
E28	Live-stream subtitle demo clip (own content) for organic shares	Content	6	5	3	10.0	Content
E29	Onboarding latency-perception test (sub-700ms feels live?)	In-product	7	6	5	8.4	Eng
E30	Korean + Chinese language expansion vs Japanese-only acquisition	Community	6	5	5	6.0	Growth
E31	Referral incentive — free dub-minutes for inviting a cohort peer	In-product	7	5	5	7.0	Growth
E32	Windows-installer EV-cert SmartScreen drop-off test	In-product	7	6	7	6.0	Eng
E33	Whisper-dub-under-original vs subtitle-only default preference test	In-product	7	6	5	8.4	Growth
E34	VTuber-clipper Discord async-mode beta drop (secondary segment)	Community	5	5	4	6.3	Growth
E35	Annual-prepay −17% conversion lift vs monthly default	Checkout	6	6	5	7.2	Growth
E36	"Today's immersion" recap auto-generate + one-tap share	In-product	7	5	6	5.8	Eng
E37	Polyglot-YouTuber affiliate-code attribution test	Sponsorship	6	5	5	6.0	Growth
E38	In-app NPS prompt after 5th session for testimonial sourcing	In-product	5	6	4	7.5	Growth
E39	SEO comparison page — "Babelio vs Language Reactor for live streams"	SEO/Content	6	5	4	7.5	Content
E40	Win-back email for D30 churned learners ("new app coverage")	Email	5	5	4	6.3	Growth
E41	Per-app capture/eval telemetry flywheel instrumentation (moat seed)	In-product	8	6	7	6.9	Eng
E42	TikTok/Shorts "watch live anime untranslated" demo (own clips)	Social	6	4	4	6.0	Content
E43	Onboarding checklist gamification (3 cards in 7 days → aha)	In-product	6	5	5	6.0	Growth
E44	Migaku/Anki add-on cross-promo partnership probe	Partnership	7	4	6	4.7	Founder
E45	Pricing meter UI — show live cost-per-session to build trust	In-product	5	5	5	5.0	Eng
E46	Cohort-streak leaderboard inside one immersion Discord	Community	5	4	5	4.0	Growth
E47	Education-org / language-school PQL discovery (B2B trajectory seed)	Outbound	7	3	6	3.5	Founder
E48	SDK/API waitlist landing for accessibility-compliance buyers	Landing	6	3	4	4.5	Founder
E49	Voice-clone same-speaker dub interest probe (lowest ODI — defer)	Survey	3	4	5	2.4	Growth
E50	Paid-social impulse ad test (expected to fail — install friction)	Paid social	4	3	5	2.4	Growth

Read the bottom of the list too. E49 (voice-clone) and E50 (paid social) sit last on purpose: voice-clone is the lowest-ODI outcome for this ICP, and paid social fights the trust-heavy install decision. They are documented so the team doesn't re-pitch them as "new ideas" every quarter. E47–E48 are deliberately low-confidence B2B/SDK seeds: not for now, but on the board so the $10B trajectory isn't forgotten.

2 · The 12 designed

Each card is runnable as-is: hypothesis, primary metric, MDE (minimum detectable effect), duration, required infra, kill-criterion. The four green cards are the resolved validation gates — Babelio's unit economics live or die on them.

E01 · Validation gate

15–20 immersion-learner interviews

Hypothesis: ≥30% of serious immersion learners recall a specific, dated incident in the last 30 days where live or native-client content had no fan-sub and they gave up — proving the wedge pain is real, not desk-invented.

Metric: % recalling a dated <30d painful incident + named tool-spend line items
MDE: 30% threshold (the kill line); n=15–20 gives a clear pass/fail read, not a CI
Duration: Week 1 (5 working days)
Infra: Recruiting posts in r/LearnJapanese + Refold/MoeWay/Migaku Discords; the Mom Test 15Q script (audience.md §5) verbatim; call recorder

Kill — <30% recall a specific painful live/native-content incident in the last 30 days → the wedge is desk-fiction; stop and re-segment before any build.

E02 · Validation gate · #1 to validate

Van Westendorp + $5/hr concierge WTP test

Hypothesis: enough immersion learners metered-pay above the dub-COGS floor that the $12 plan clears margin — i.e. the validated WTP ceiling ($6–12) is not structurally below the cost floor once usage is measured, not stated. This is the single most important number in the company.

Metric: Actual metered $ paid per buyer vs the per-active-hour COGS floor ($0.31 subtitle / $0.50 dub); dub-minute attach rate
MDE: ≥30% of buyers (the kill line) sustain a rate that clears blended COGS within the $6–12 band; n=30–50
Duration: Week 2 (concierge fulfilled by hand)
Infra: 4-question Van Westendorp script (monetization.md §2); manual subtitle/mining fulfilment; a simple metered invoice (Stripe/Polar payment link)

Kill — <30% metered-pay above the COGS floor for the subtitle hero mode → consumer app is unfundable as priced; pivot to usage-metered or the heaviest sub-segment before locking tiers.
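One way to read the E02 kill line from a concierge ledger is sketched below. It assumes each buyer's metered payment is compared with a usage-weighted COGS floor built from the two per-hour figures on the card. The `Buyer` record, the function names and the sample rows are hypothetical, and the $6–12 band check is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass

SUBTITLE_COGS_PER_HOUR = 0.31  # from the card
DUB_COGS_PER_HOUR = 0.50       # from the card
KILL_LINE = 0.30               # minimum share of buyers that must clear the floor


@dataclass
class Buyer:
    paid_usd: float        # amount actually paid on the metered invoice
    subtitle_hours: float  # active subtitle hours fulfilled by hand
    dub_hours: float       # active dub hours fulfilled by hand


def cogs_floor(b: Buyer) -> float:
    """Usage-weighted cost of serving this buyer (an assumed reading of 'blended COGS')."""
    return SUBTITLE_COGS_PER_HOUR * b.subtitle_hours + DUB_COGS_PER_HOUR * b.dub_hours


def share_clearing_floor(buyers: list[Buyer]) -> float:
    """Fraction of buyers whose payment exceeds their own COGS floor."""
    if not buyers:
        return 0.0
    return sum(b.paid_usd > cogs_floor(b) for b in buyers) / len(buyers)


def wtp_verdict(buyers: list[Buyer]) -> str:
    return "kill" if share_clearing_floor(buyers) < KILL_LINE else "pass"


# Made-up ledger rows, for illustration only.
sample = [
    Buyer(paid_usd=9.0, subtitle_hours=20, dub_hours=2),
    Buyer(paid_usd=6.0, subtitle_hours=30, dub_hours=5),
    Buyer(paid_usd=12.0, subtitle_hours=25, dub_hours=4),
    Buyer(paid_usd=5.0, subtitle_hours=40, dub_hours=0),
]
print(share_clearing_floor(sample), wtp_verdict(sample))
```

With the card's n=30–50 target, a four-row ledger like this one is only a smoke test of the arithmetic, not a readout.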

E03 · Validation gate · make-or-break CAC

Loop-factor K test

Hypothesis: the legally-clean sentence-mining card (the user's own gloss + "Mined with Babelio" footer) is a viral artifact: K = (shares per activated user) × (installs per shared artifact) ≥ 0.3. If true, community CAC stays $15–35 and LTV:CAC clears 3:1; if false, the product is single-player and growth is paid-only.

Metric: K = artifact_shared_external per activated user × installs per shared artifact
MDE: Prove: K ≥ 0.3. Demote: K < 0.15. Need ≥30 activated users to read it
Duration: Week 3–4 (needs thin product + instrumentation live)
Infra: Events: artifact_created, artifact_shared_external; deep-link from shared card → install; UTM-tagged "Mined with Babelio" footer

Kill — K < 0.15 → demote the loop, fall back to paid/community CAC and re-run LTV:CAC on the real number (E12). K<0.15 AND community CAC > $37 = unit economics below 3:1, no inversion path.
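The K readout can be sketched from three counts. The function names and the sample numbers are hypothetical; the thresholds (0.3, 0.15, 30 activated users) come from the card. The card does not name the band between 0.15 and 0.3, so the "inconclusive" label is an assumption.

```python
PROVE_K = 0.30
DEMOTE_K = 0.15
MIN_ACTIVATED_USERS = 30


def loop_factor_k(activated_users: int, external_shares: int, attributed_installs: int) -> float:
    """K = (shares per activated user) x (installs per shared artifact)."""
    if activated_users == 0 or external_shares == 0:
        return 0.0
    shares_per_user = external_shares / activated_users
    installs_per_share = attributed_installs / external_shares
    return shares_per_user * installs_per_share


def read_loop(k: float, activated_users: int) -> str:
    """Map K onto the card's thresholds."""
    if activated_users < MIN_ACTIVATED_USERS:
        return "not enough users"
    if k >= PROVE_K:
        return "prove"
    if k < DEMOTE_K:
        return "demote"
    return "inconclusive"  # assumed label for 0.15 <= K < 0.3


# Hypothetical counts: 40 activated users, 60 external shares, 14 installs via deep-link.
k = loop_factor_k(activated_users=40, external_shares=60, attributed_installs=14)
print(round(k, 2), read_loop(k, activated_users=40))
```

Algebraically the product collapses to attributed installs per activated user. Keeping the two factors separate still shows whether a weak K comes from too few shares or from shares that do not convert.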

E04 · Validation gate · the one channel test

Calendar-timed immersion-challenge channel test

Hypothesis: seeding a trusted-member demo into a recurring immersion-challenge kickoff (Refold/MoeWay/Migaku run these on a calendar; r/LearnJapanese monthly threads) drives installs at a blended CAC ≤ $37 with install-completion ≥ 50% — proving the community channel works at the recurring trigger event.

Metric: Blended community CAC; install-completion rate (download → launch → permission); time-to-activation
MDE: CAC ≤ $37 (kill line) and install-completion ≥ 50%; ≥150 installs to read CAC with confidence
Duration: Week 4, timed to a real challenge kickoff date
Infra: Trusted-member demo arrangement; 4–6 long-tail content pages live; install-completion funnel events (E05); UTM per community

Kill — install-completion < 50% → friction is the bottleneck, fix before any spend. CAC > $37 with K<0.15 → go/no-go = no-go on the consumer wedge as designed.

E05

Install-completion funnel instrumentation

Hypothesis: the trust-heavy native-desktop install (Gatekeeper/SmartScreen + permission grant) is the largest top-of-funnel drop; instrumenting download → launch → permission reveals where, so it can be fixed before channel spend.

Metric: Stage-by-stage completion rate
MDE: Detect any stage with >25% drop; baseline only, no comparison arm
Duration: Continuous from first install
Infra: 3 funnel events fired client-side; analytics pipeline

Kill — install-completion < 50% gates all channel scaling (feeds E04).
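A minimal sketch of the funnel read follows. It assumes the ">25% drop" on the card means loss relative to the previous stage, and that install-completion is the share of downloads reaching the permission grant. The stage counts and function names are made up for illustration.

```python
from __future__ import annotations

MAX_STAGE_DROP = 0.25  # flag any stage that loses more than this vs the stage before
MIN_COMPLETION = 0.50  # overall download -> permission gate from the kill line


def stage_drops(funnel: list[tuple[str, int]]) -> dict[str, float]:
    """Drop-off entering each stage, relative to the stage before it."""
    drops: dict[str, float] = {}
    for (_, prev_count), (name, count) in zip(funnel, funnel[1:]):
        drops[name] = 1 - count / prev_count if prev_count else 0.0
    return drops


def install_completion(funnel: list[tuple[str, int]]) -> float:
    """Share of first-stage users who reach the last stage."""
    first, last = funnel[0][1], funnel[-1][1]
    return last / first if first else 0.0


# Hypothetical event counts for the three stages named on the card.
funnel = [("download", 400), ("launch", 320), ("permission", 180)]

flagged = [name for name, drop in stage_drops(funnel).items() if drop > MAX_STAGE_DROP]
blocked = install_completion(funnel) < MIN_COMPLETION
print(flagged, install_completion(funnel), blocked)
```

In this made-up case the permission step is the flagged stage and overall completion sits below the 50% gate, so channel scaling would stay blocked.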

E06

Reverse-trial vs open-freemium activation A/B

Hypothesis: a 7-day full-Pro reverse trial with card-on-file converts free→paid 2–3× better than open freemium for the engaged learner, without crushing the free base that seeds the loop.

Metric: Free→paid conversion at day 30
MDE: +3pp (6%→9%); ~600/arm gives only about 50% power at two-sided α = 0.05, so plan for ~1,200/arm to reach 80%
Duration: 4–6 weeks (post-PMF scale)
Infra: Billing with trial states; A/B split; Stripe/Polar card-on-file

Kill — no lift, or reverse trial starves the free loop base → keep open freemium.
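As a sanity check on the per-arm sample size for the 6% → 9% lift, here is a sketch using the standard normal-approximation formula for two proportions. The two-sided α = 0.05 and 80% power are assumptions, since the card does not state them, and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_arm(p_control: float, p_variant: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided two-proportion z-test (unpooled variance)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_variant - p_control) ** 2)


def power_at(n: int, p_control: float, p_variant: float, alpha: float = 0.05) -> float:
    """Approximate power with n users per arm (ignores the negligible opposite tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    std_err = sqrt((p_control * (1 - p_control) + p_variant * (1 - p_variant)) / n)
    return _STD_NORMAL.cdf(abs(p_variant - p_control) / std_err - z_alpha)


print(n_per_arm(0.06, 0.09))
print(round(power_at(600, 0.06, 0.09), 2))
```

Under those assumptions the requirement comes out near 1,200 per arm, and 600 per arm reads out at roughly 50% power. A one-sided test or a lower power target would shrink the number, so the convention is worth stating before the test starts.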

E07

Activation-event definition A/B

Hypothesis: "first exported/shared sentence-mining card" predicts W4 retention better than "first translated minute" — and because it's also the loop trigger, optimizing for it lifts both retention and K at once.

Metric: W4 retention by activation-event cohort
MDE: Detect ≥1.5× W4 retention gap between definitions
Duration: 4 weeks min (needs a W4 read)
Infra: Cohort analytics; both events instrumented

Kill — no retention gap → activation = first translated minute (lower friction); revisit loop assumption.

E08

Day-14 Sean Ellis PMF survey

Hypothesis: ≥40% of the Week-2/3 cohort would be "very disappointed" without Babelio at day 14 — the first defensible PMF reading. Until measured, PMF is explicitly not claimed.

Metric: % "very disappointed" (Sean Ellis)
MDE: 40% threshold; ≥30 responses for a credible read
Duration: Fires at each user's day-14 mark
Infra: Timed in-app/email survey; ≥30 onboarded users

Kill — <40% "very disappointed" → no PMF claim; iterate hero mode before scaling.

E09

Trusted-mod demo seeding vs cold post

Hypothesis: a demo from a trusted community member/mod clears AV-flag install hesitation better than a cold drop — install-completion is materially higher when social proof, not copy, carries the trust transfer.

Metric: Install-completion rate by seeding type
MDE: +15pp completion; 2 matched communities
Duration: 2 weeks
Infra: One mod relationship; UTM per arm; E05 funnel

Kill — no completion lift → trust transfer isn't the bottleneck; look at the build signing (E19).

E10

Long-tail SEO landing pages

Hypothesis: high-intent native-desktop + live + immersion long-tail ("watch Japanese stream with subtitles desktop", "sentence mining live VTuber stream") ranks where browser-extension competitors can't serve, feeding the long trust-window install at $10–30 CAC.

Metric: Organic visits → install; blended CAC
MDE: CAC ≤ $30; 20–40 pages, read at M3–6
Duration: 8–12 weeks (SEO lag)
Infra: CMS; 20–40 pages; demo clips; UTM

Kill — no ranking on the native-desktop long-tail → SERP too crowded; community-only.

E11

Dub-overage meter nudge

Hypothesis: an in-app meter that suggests the right tier before overage prevents sticker-shock churn on rare dub-heavy users while protecting margin via the $0.06/min meter.

Metric: Churn among dub-heavy users; overage-bill complaints
MDE: −5pp churn in the dub-heavy segment
Duration: 4 weeks
Infra: Live usage meter UI; tier-suggest logic

Kill — no churn reduction → sticker shock isn't the driver; look at the lazy-tool reframe (E18).

E12

Immersion-YouTuber sponsorship CAC test

Hypothesis (contingency): if the loop fails (E03 K<0.15), a polyglot/immersion-YouTuber integration acquires installs at $50–120 — and the team must re-run LTV:CAC on that real number, since at CAC $80 the ratio falls to ~1.4:1, below the 3:1 bar.

Metric: CAC per install via affiliate code; resulting LTV:CAC
MDE: CAC ≤ $37 to clear 3:1; 1–2 sponsorships
Duration: 3–4 weeks (only if loop demoted)
Infra: Affiliate codes (E37); creator outreach; landing

Kill — CAC > $37 with no working loop → consumer unit economics don't clear; escalate the pricing/segment pivot.
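The $37 line can be reproduced from the card's own numbers. The LTV below is not stated anywhere in this artifact; it is inferred from "at CAC $80 the ratio falls to ~1.4:1" (80 × 1.4 ≈ $112) and should be treated as an assumption. Function names are illustrative.

```python
IMPLIED_LTV_USD = 112.0  # inferred: CAC $80 at ~1.4:1 implies LTV of about 80 x 1.4
TARGET_RATIO = 3.0       # the 3:1 bar


def ltv_to_cac(ltv: float, cac: float) -> float:
    return ltv / cac


def max_cac(ltv: float, target_ratio: float = TARGET_RATIO) -> float:
    """Highest CAC that still clears the target LTV:CAC ratio."""
    return ltv / target_ratio


def clears_bar(ltv: float, cac: float, target_ratio: float = TARGET_RATIO) -> bool:
    return ltv_to_cac(ltv, cac) >= target_ratio


print(round(max_cac(IMPLIED_LTV_USD), 2))
for cac in (37, 50, 80, 120):
    print(cac, round(ltv_to_cac(IMPLIED_LTV_USD, cac), 2), clears_bar(IMPLIED_LTV_USD, cac))
```

On the implied LTV, the whole $50–120 sponsorship range falls below 3:1, which is why the card asks for the ratio to be re-run on the measured CAC instead of the planning figure.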

3 · Next 4 weeks

One or two experiments concurrent — a solo founder + one engineer cannot run more without losing signal. The sequence mirrors growth.md §7: you cannot test the loop (E03) or the channel (E04) until the thin product and instrumentation exist, so weeks 1–2 are validation-by-hand and weeks 3–4 are product-in-the-loop.

Experiment	W1	W2	W3	W4
E01 · Learner interviews	RUN
E02 · WTP: VW + concierge		RUN
E05 · Install funnel instrumentation			BUILD	LIVE
E07 · Activation-event instrumentation			BUILD	RUN
E03 · Loop-factor K			SEED	READ
E08 · Sean Ellis PMF survey				DAY-14
E04 · Immersion-challenge channel test				LAUNCH

Why this order (the dependency chain). W1 and W2 are sequential gates and cannot run in parallel: if E01 kills the wedge, nothing downstream runs. E04 is timed to a real challenge kickoff; if the calendar date lands in W3, pull it forward, because the trigger event drives the schedule, not the reverse. E03 and E04 share the W4 readout because the loop seeds through the channel launch. E08 fires automatically at each user's day-14 mark, so it overlaps W4 for the Week-2/3 cohort.

Linked artifacts

07 · Growth Loop: E03 (loop-factor K) is the experiment the growth-loop model's first lever points to; K≥0.3 underwrites the whole loop.
08 · Retention: E07/E08/E11/E16/E17 are the 5 retention experiments queued from the retention model.
05 · Pricing: E02 + E23 are the Van Westendorp + concierge WTP gate that pricing is anchored against, not yet validated by.
17 · 90-Day Plan: This 4-week roadmap is the first month of the 90-day validation plan (growth.md §7).
