Babelio Playbook · 19
How We Operate
Founder comms · Day-1 onboarding


The doc every new teammate reads on day one. We are a tiny pre-seed team with one belief at the center: measure before you claim, and tell the truth when the measurement is bad. Read this once and you should be able to make a decision, disagree well, and know where every conversation lives — without a single onboarding call.

Team: Pre-seed
Default: Async
Culture: Eval-driven
Refresh: Quarterly
Purpose & bar

Purpose: the values and norms a new hire reads on day one. Bar for good: it cuts onboarding-conversation overhead to days, not weeks, so a new teammate can act without asking "how do we do X here?" These are operational rules, not aspirations: each value carries an anti-example so it can actually be applied.

Our values

Four values, each with the behavior we reward and the one we don't. A value without an anti-example is decoration — so every one names something we explicitly refuse to do.

Value 01

Measure, then claim

Every quality claim about the product (latency, translation quality, retention, PMF) must be backed by a number from a real run. We don't ship a mode without an eval, and we don't say "users love it" without a Sean Ellis survey reading (the share of users who would be "very disappointed" to lose the product).

We do

"Latency p95 is 680ms on the eval set; here's the run." Numbers carry a source.

× We don't

We don't claim PMF, "it feels fast," or "40% retention" until it's measured in real hands.

Value 02

Lead with the lowlight

When something is broken, contradicted, or unproven, we say it first: to each other, to investors, to users. Our own review found the COGS number differed by 5–13× across docs; the right response was to surface it, not bury it. Bad news travels at the speed of trust.

We do

Open the standup with the thing that scares you most this week. Flag your own contradiction before someone else finds it.

× We don't

We don't round 0 users up to "early traction," and we don't save the bad number for the end of the deck.

Value 03

Serve the learner's job

Our one ICP is the immersion-method language learner. Every feature is judged against their job — comprehensible input — not against what's technically impressive. A subtitle that preserves the original audio beats a slick auto-mute dub if the learner's job needs the original voice.

We do

Ship subtitle/dual-track as the default because that's the learner's actual job; keep auto-mute dub optional.

× We don't

We don't build for the impressive demo (interpreters, full dub) when it fights the ICP's job-to-be-done.

Value 04

Small, fast, reversible

A tiny team wins on speed of learning, not size of plan. Prefer the small reversible bet you can run this week over the big plan you debate for a month. Two-way-door decisions get made by whoever's closest, today.

We do

Run the concierge test on 30 buyers this week to learn WTP, instead of modeling it for a month.

× We don't

We don't hold a meeting to approve a reversible decision someone could just make and undo.

How we decide

DRI: one Directly Responsible Individual per decision. Every decision has exactly one owner, the DRI, who gathers input, makes the call, and writes it down. The DRI is not the most senior person; it's the person closest to the work. On a 2–4 person team, "we'll decide together" is how nothing gets decided, so we name the owner out loud.

Two-way vs one-way doors. For reversible decisions (a copy change, an experiment, a prompt tweak), the DRI just ships and logs it. For irreversible or expensive ones (entity/funding path, the audio-engineer hire, picking a TTS vendor to build on, a pricing-model change), the DRI writes a one-paragraph memo and gets same-day async sign-off from the founder before committing.

Escalate. Escalate when a decision is one-way, crosses a tripwire in the risk register, would change a number in the financial model, or leaves two people blocked and unable to converge by end of day. Escalation goes to the founder, who decides within one business day or explicitly delegates back. Recurring decision types live in 15-decisions-raci, so we never re-litigate the same call.
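
The routing rules in this section can be sketched as a small function. This is an illustration, not a tool we run: the `Decision` fields and `Route` names are hypothetical, chosen to mirror the triggers named above. One assumption is labeled in the code: one-way decisions appear under both the memo rule and the escalation rule, and the sketch treats the memo plus founder sign-off as the way a one-way decision is escalated.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SHIP_AND_LOG = "DRI ships and logs it"
    MEMO_AND_SIGNOFF = "DRI writes a one-paragraph memo; founder signs off async, same day"
    ESCALATE = "escalate to the founder, who decides within one business day"


@dataclass
class Decision:
    # Hypothetical field names that mirror the triggers in the playbook text.
    title: str
    dri: str  # exactly one owner per decision
    one_way: bool = False
    crosses_tripwire: bool = False
    changes_financial_model: bool = False
    blocked_past_end_of_day: bool = False


def route(decision: Decision) -> Route:
    """Pick the path a decision takes under the rules above."""
    if not decision.dri:
        raise ValueError("every decision needs a named DRI")
    # Assumption: for one-way doors, the memo + sign-off is the escalation.
    if decision.one_way:
        return Route.MEMO_AND_SIGNOFF
    if (decision.crosses_tripwire
            or decision.changes_financial_model
            or decision.blocked_past_end_of_day):
        return Route.ESCALATE
    # Two-way door with no trigger: whoever is closest decides today.
    return Route.SHIP_AND_LOG
```

For example, `route(Decision("prompt tweak", dri="ana"))` takes the ship-and-log path, while the same decision with `one_way=True` takes the memo path.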

How we disagree

Disagree and commit. You owe the team your strongest objection before the decision, in writing, with your reasoning. Once the DRI calls it, you commit fully — even if you'd have chosen differently. Re-litigating a decided call in side-channels is the one thing that quietly kills a small team.

When you may block. Blocking is for the rare case where you believe the decision is ethically wrong, legally exposed, or one-way-fatal to the company — a voice-clone consent gap, a claim we can't back with data, betting the model on an unproven self-host assumption. Blocking is loud and rare: name the specific harm and the evidence. "I disagree" is not a block; "this ships a PMF claim we never measured" is.

Decision > consensus. We don't wait for everyone to agree. We wait for everyone to be heard, then the DRI decides and logs it.

Meeting hygiene

Async is the default; a meeting is a confession that async failed. We hold exactly two recurring syncs: a 15-min Monday kickoff and a 30-min Friday demo+retro. Everything else is a written thread unless it clears the bar below.

  • No status meetings. Status is posted in writing before the sync, not read aloud during it. If the meeting is one person updating others, it should have been a message.
  • No agenda, no meeting. A meeting without a written agenda + a named decision to make gets declined, no offense taken.
  • Every meeting ends with a logged decision + DRI. If nothing was decided, the meeting failed; write down why and what we'll do async instead.
  • Default to 25/50 min and to declining. Two hard-focus blocks a day are protected for everyone — no meetings inside them.

Comms norms

One channel per job. The rule: if the next person can't find the decision in six months, it was posted in the wrong place.

Channel | Use it for | Response norm
Slack | Discussion + quick sync. Threads, not channels-as-firehose. Decisions reached here get copied to the doc/ticket. | Async; reply within the day. Not for anything urgent off-hours.
Linear | Every unit of work: bugs, tasks, experiments. If it isn't a ticket, it isn't being tracked. | One owner + status per ticket, always current.
Notion | Durable docs: this playbook, decision log, research, specs. The source of truth a decision points to. | Comment in-doc; resolve threads when done.
GitHub | Code, prompt modules, and eval definitions. Prompt changes are PRs with regression evals; no Slack-pasted prompts. | PR review same-day; CI eval gate must be green.
Email | External only: investors, partners, hires, users. Internal email is a smell. | Reply within one business day.

On-call expectations

Honest about scale: we're 2–4 people, so on-call is shared and lightweight, not a heroic 24/7 pager. One person holds the week (the rotation is just "whoever isn't head-down on the MVP this week"); the founder is the permanent backstop. We make the rotation real now so it doesn't have to be invented during the first real outage.

Severity levels.

  • SEV-1 — pipeline down (audio capture, STT/MT/TTS, or billing broken for users). Ack within 15 min, all-hands until mitigated, post-mortem within 48h. This is the only level that justifies interrupting focus or off-hours.
  • SEV-2 — degraded (latency past p95 budget, one provider failing, eval gate red). Ack same business day; fix or roll back the provider (see 20-ai-addendum swap procedure).
  • SEV-3 — minor (cosmetic, single-user, non-blocking). File a ticket; handle in normal flow.

Every SEV-1/2 gets a written post-mortem (blameless: what failed, what we change), linked from the decision log.
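
The severity rules above fit in a small lookup table. The sketch below is illustrative only: `SeverityPolicy`, `classify`, and `postmortem_due` are hypothetical names, and two details are assumptions the text does not settle. The 48h post-mortem clock is counted from detection, and SEV-2 is given no fixed post-mortem deadline because none is stated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    ack: str                                  # acknowledgement norm, as worded above
    may_interrupt_focus: bool                 # only SEV-1 justifies interrupting focus or off-hours
    postmortem_required: bool                 # every SEV-1/2 gets a written post-mortem
    postmortem_deadline: Optional[timedelta]  # None: no fixed deadline stated


POLICIES = {
    "SEV-1": SeverityPolicy("within 15 min", True, True, timedelta(hours=48)),
    "SEV-2": SeverityPolicy("same business day", False, True, None),
    "SEV-3": SeverityPolicy("ticket, normal flow", False, False, None),
}


def classify(pipeline_down: bool, degraded: bool) -> str:
    """Map the two symptoms named above to a severity level."""
    if pipeline_down:
        return "SEV-1"
    if degraded:
        return "SEV-2"
    return "SEV-3"


def postmortem_due(severity: str, detected_at: datetime) -> Optional[datetime]:
    """Assumption: the deadline runs from detection, not from mitigation."""
    deadline = POLICIES[severity].postmortem_deadline
    return detected_at + deadline if deadline is not None else None
```

Keeping the policy as data means a change to the ack window is a one-line diff that shows up in review.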

Anti-goals — what we don't do

A strategy is defined as much by its refusals as by its goals. Five things we will not do, so we never have to debate them under pressure.

× We don't ship a mode without an eval gate

No translation mode, prompt, or model swap reaches users until it passes the labeled eval set in CI. Quality is the only thing we sell; we don't ship it on a hunch.

× We don't claim PMF, traction, or numbers we haven't measured

No "users love it," no rounded-up adoption, no LTV:CAC we can't derive from real data. If it isn't measured, it's a hypothesis and we label it one.

× We don't chase a second ICP before the first one pays

Immersion learners are the one beachhead. Interpreters, creators, and enterprise are explicitly off-roadmap until the learner wedge is paying — a tiny team that serves everyone serves no one.

× We don't compete on model quality or price

Everyone rents the same STT/MT/TTS APIs — model quality is table stakes, not a moat. We win on OS-level integration, speed, and UX. We don't start a price race to the bottom either.

× We don't hire generalists or grow headcount ahead of revenue

Pre-PMF, every hire is a specialist the moat needs (first: the Rust/CoreAudio audio engineer). We don't add roles to feel like a company; headcount follows revenue milestones, not the calendar.
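
To make the first anti-goal concrete, here is a minimal sketch of what an eval gate check could look like. It is not the gate defined in 20 · AI addendum: the function names, the per-example quality score, and both thresholds are hypothetical, and the p95 uses the nearest-rank method, which is one of several common definitions.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) computed in integers to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def eval_gate(latencies_ms, quality_scores, p95_budget_ms, min_mean_quality):
    """Return a list of failures; an empty list means the gate is green.

    Both thresholds are hypothetical parameters, not values from the playbook.
    """
    if not latencies_ms or not quality_scores:
        # Nothing measured means nothing to claim, so the gate stays red.
        return ["no eval results"]
    failures = []
    observed = p95(latencies_ms)
    if observed > p95_budget_ms:
        failures.append(f"latency p95 {observed}ms exceeds budget {p95_budget_ms}ms")
    mean_quality = sum(quality_scores) / len(quality_scores)
    if mean_quality < min_mean_quality:
        failures.append(f"mean quality {mean_quality:.2f} below {min_mean_quality:.2f}")
    return failures
```

In CI, a wrapper would run the labeled eval set, call `eval_gate`, print any failures, and exit non-zero so the PR cannot merge while the gate is red.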

See also: 15 · Decisions + RACI for the recurring-decision owners, 13 · Operating cadence for the meeting rhythm in full, and 20 · AI addendum for the eval gate + provider-swap procedure this doc references.
Refresh: quarterly
Babelio · Playbook artifact 19 of 24 · Founder comms · Measure, then claim · read on day one