pickuma.

Writing User Stories With AI Without Losing the Why

AI fills the 'so that' clause of a user story with plausible reasons that were never in your research. Here is how to ground the prompt and keep an auditable why-trace.

You can ask an LLM to “turn these notes into user stories” and get a dozen tidy lines in the As a [role], I want [feature], so that [benefit] format in seconds. They scan clean. They fit the template. And some of them quietly invent a reason the user never had.

The format survives. The why doesn’t. A user story carries exactly one thing into sprint planning: the reason a change matters to a specific person. When a model fills the “so that” clause with a benefit that sounds reasonable instead of the one your research actually surfaced, you end up with a backlog that looks finished and points the team at the wrong outcome.

This is fixable, but not by writing a cleverer one-line prompt. It takes structuring the input and keeping a trace of where each why came from.

Why AI-generated stories drift from the why

LLMs are pattern-completers. The As a / I want / so that shape is one of the most common structures in their training data, so a model can produce a grammatically perfect story with no grounding in your problem. The “I want” half is usually safe, because it restates a feature you described. The “so that” half is where invention creeps in, since a benefit is rarely stated outright in raw notes and the model infers it.

Three failure modes show up over and over:

  • Benefit inflation. “so that I can save time” gets upgraded to “so that I can dramatically improve my workflow efficiency.” Vaguer, grander, untestable.
  • Persona collapse. A new admin, a power user, and a billing manager get flattened into one generic “user” because the notes didn’t separate them and the model didn’t ask.
  • Invented motivation. The dangerous one. The model supplies a plausible reason that no interview ever produced, and it reads exactly like the real ones.

The common thread: the model is filling gaps you didn’t know were gaps. The fix is to stop handing it gaps.

Anchor the prompt to the problem, not the format

The instinct is to teach the model the template. It already knows the template. What it doesn’t have is your evidence. So put the evidence in front of it and forbid it from reaching past it.

A prompt that holds the why has four parts:

  1. The raw source. Paste the interview transcript, support ticket, or sales-call notes verbatim. Don’t summarize first, because summarizing is where the first layer of why gets lost.
  2. An explicit persona list. Name the roles you actually heard from: “Stories are only for these three personas. Do not invent others.”
  3. A grounding rule. “Every ‘so that’ clause must quote or paraphrase a specific line from the source. If no reason is stated for a story, write ‘so that — UNSTATED’ instead of guessing.”
  4. A confidence flag. Ask the model to mark each story grounded (reason in source) or inferred (reason assumed), so you can review the inferred ones by hand.

That UNSTATED instruction is the single highest-leverage line. It converts the model’s tendency to confabulate into a visible to-do: instead of a confident wrong reason, you get a flag that says go ask the user.
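
The four parts can be assembled mechanically, so nobody has to retype the rules for each batch of notes. Here is a minimal Python sketch of that assembly. The function name `build_grounded_prompt`, the section labels, the exact rule wording, and the sample ticket text are all illustrative assumptions, not a tested prompt, so adapt the phrasing to whatever model you use.

```python
UNSTATED_MARKER = "UNSTATED"  # hypothetical token; any unambiguous marker works


def build_grounded_prompt(source_text: str, personas: list[str]) -> str:
    """Assemble the four parts: raw source, persona list, grounding rule, confidence flag."""
    if not personas:
        raise ValueError("name at least one persona you actually heard from")
    persona_lines = "\n".join(f"- {name}" for name in personas)
    sections = [
        "Write user stories as: As a [role], I want [feature], so that [benefit].",
        "PERSONAS\n"
        f"Stories are only for these {len(personas)} personas. Do not invent others.\n"
        f"{persona_lines}",
        "GROUNDING RULE\n"
        "Every 'so that' clause must quote or paraphrase a specific line from the source. "
        f"If no reason is stated for a story, write 'so that {UNSTATED_MARKER}' "
        "instead of guessing.",
        "CONFIDENCE FLAG\n"
        "End each story with [grounded] if the reason is in the source, "
        "or [inferred] if the reason is assumed.",
        # The source goes in verbatim: summarizing first would drop stated reasons.
        f"SOURCE (verbatim)\n<<<\n{source_text}\n>>>",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Made-up ticket text, for illustration only.
    notes = "I was charged twice and need to show finance which charge is wrong."
    print(build_grounded_prompt(notes, ["new admin", "power user", "billing manager"]))
```

Putting the source last and fencing it with delimiters is a design choice, not a requirement. The point is that the rules and the evidence arrive in the same prompt, and the evidence arrives untouched.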

Keep a why-trace in your backlog

Grounding the prompt fixes generation. It does nothing for what happens three weeks later, when an engineer reads the story, doesn’t believe the “so that,” and has no way to check it. The reason has to travel with the story.

A why-trace is one extra field on each story: a link or quote back to the source the reason came from. “so that they can reconcile a disputed charge — from ticket #4821.” Now the why is auditable. Anyone can click through and confirm the model didn’t make it up, and when priorities get challenged in planning you argue from evidence instead of vibes.

This is mechanical to maintain if your backlog tool supports a source field and relations. In Notion, a stories database related to a research database gives you the trace for free: each story points at the interview or ticket it came from, and you can filter for every story whose reason is still inferred and needs a human pass before it enters a sprint.
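
If your backlog lives somewhere scriptable, the same idea fits in a few lines. The sketch below is one possible shape for a story record with a why-trace, plus the filter for stories that still need a human pass. The `Story` fields, the `needs_human_pass` rule, and the sample "I want" text are assumptions made for illustration; only the ticket #4821 reason comes from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    role: str
    want: str
    so_that: str                      # the reason, or "UNSTATED"
    confidence: str                   # "grounded" or "inferred"
    source_ref: Optional[str] = None  # the why-trace: link, ticket id, or quote

    def render(self) -> str:
        trace = f" (from {self.source_ref})" if self.source_ref else ""
        return f"As a {self.role}, I want {self.want}, so that {self.so_that}{trace}"


def needs_human_pass(story: Story) -> bool:
    """True when the reason is inferred, unstated, or has no trace to a source."""
    return (
        story.confidence != "grounded"
        or story.so_that.strip().upper() == "UNSTATED"
        or not story.source_ref
    )


def review_queue(backlog: list[Story]) -> list[Story]:
    """Stories that should not enter a sprint until someone checks the why."""
    return [story for story in backlog if needs_human_pass(story)]


if __name__ == "__main__":
    backlog = [
        Story("billing manager", "an itemized charge history",  # hypothetical want
              "they can reconcile a disputed charge", "grounded", "ticket #4821"),
        Story("new admin", "a guided setup checklist",          # hypothetical story
              "UNSTATED", "inferred"),
    ]
    for story in review_queue(backlog):
        print(story.render())
```

Only the second story lands in the queue: the first is grounded and carries a trace, so it passes all three checks.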

The workflow that holds together looks like this: capture raw research in one place, run a grounded prompt that flags inferred reasons, link each story back to its source, then review only the flagged ones by hand. The AI drafts, the trace keeps it honest, and you spend your attention on the handful of stories where the why is genuinely uncertain instead of rewriting all of them.

FAQ

Should I let AI write the 'so that' clause at all?
Yes, but only as a draft you verify. The model is good at paraphrasing a stated reason into clean story language. It is unreliable at supplying a reason that is not in the source. Use it for the first job, review it for the second, and flag anything it had to infer.
How is this different from writing stories myself?
Speed on the mechanical part. Drafting fifteen well-formatted stories from a transcript by hand takes the better part of an hour; a grounded prompt does it in a minute and flags the few that need your judgment. You spend the same care on the hard cases and skip the tedious formatting.
What is the fastest way to catch an invented reason?
The source-link test. If every 'so that' has a clickable trace back to a transcript line or ticket, invented reasons stand out immediately — they are the ones with no link. No trace, no trust.
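
The source-link test can be partly automated when the trace is stored as a verbatim quote. This is a rough Python sketch under that assumption: the function name `source_link_test` and its three status strings are hypothetical, and a paraphrased trace will fail the match and still needs a person to check it.

```python
import re
from typing import Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def source_link_test(trace_quote: Optional[str], source_text: str) -> str:
    """Classify a story's trace against the raw source it claims to come from."""
    if not trace_quote or not trace_quote.strip():
        return "no trace"             # nothing to click through: do not trust it
    if _normalize(trace_quote) in _normalize(source_text):
        return "traced"               # the quoted words really are in the source
    return "quote not in source"      # paraphrase or invention: review by hand


if __name__ == "__main__":
    # Made-up transcript text, for illustration only.
    source = "I was charged twice.\nI need to show finance which charge is wrong."
    print(source_link_test("show finance which charge is wrong", source))
    print(source_link_test(None, source))
```

A substring match is deliberately strict. It will send some honest paraphrases to review, which is the cheaper mistake compared with letting an invented reason through.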
