pickuma.

Drafting OKRs With AI Without Writing Meaningless Goals

How to use an LLM to draft OKRs that survive scrutiny: forcing measurable key results, killing activity-disguised-as-outcome, and the prompts that catch vague goals.

Ask a model to “write OKRs for my team” and you get back something that reads like a strategy deck and means nothing. “Objective: Become the market leader in developer tooling. Key result: Significantly improve user satisfaction.” Every word is grammatical. None of it is measurable. You cannot tell on December 31st whether you hit it.

The problem is not the model. The problem is that drafting OKRs is mostly an exercise in resisting the easy phrasing your brain reaches for, and a model with no stake in the outcome reaches for that phrasing faster than you do. Used carelessly, an LLM is a vagueness amplifier. Used as an adversarial editor, it is genuinely useful — but only if you structure the work so the model is forced to commit to numbers and dates.

We drafted three quarters' worth of OKRs with Claude and GPT-4-class models to find where they help and where they quietly make things worse. Here is what actually works.

Make the model commit to a number or reject the line

The single highest-leverage move is to forbid unmeasurable key results at the prompt level. A key result that cannot be expressed as a metric with a starting value, a target value, and a deadline is not a key result — it is an aspiration. Most AI-drafted OKRs fail this test on the first pass.

Give the model the rule and make it self-check. A prompt that works:

“Draft 3 key results for this objective. Each key result MUST contain: a metric, a baseline (current value), a target value, and a date. If you cannot supply a real baseline, write [BASELINE UNKNOWN] instead of inventing one. Reject any key result that describes an activity (‘launch X’, ‘ship Y’) rather than a measurable outcome.”

The [BASELINE UNKNOWN] instruction matters more than it looks. Left to its own devices, a model will fabricate a plausible-sounding baseline — “improve activation from 22% to 35%” — when it has no idea what your current activation rate is. That fabricated 22% then anchors the whole quarter. Forcing the model to flag missing data turns a confident hallucination into a visible to-do.
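The same rule can be checked mechanically once the model's draft is back. The following is a minimal sketch, assuming you have already parsed each key result into fields yourself; the `KeyResult` class, the `check_key_result` function, and the example date are hypothetical names and values chosen for illustration, not part of any tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

BASELINE_UNKNOWN = "[BASELINE UNKNOWN]"


@dataclass
class KeyResult:
    """One drafted key result (hypothetical structure for illustration)."""
    metric: str
    baseline: Optional[float]  # None means nobody has measured it yet
    target: Optional[float]
    deadline: Optional[date]


def check_key_result(kr: KeyResult) -> List[str]:
    """Return the problems found; an empty list means the line passes."""
    problems = []
    if not kr.metric.strip():
        problems.append("missing metric")
    if kr.baseline is None:
        # Flag the gap instead of letting an invented number through.
        problems.append(f"baseline: {BASELINE_UNKNOWN}")
    if kr.target is None:
        problems.append("missing target value")
    if kr.deadline is None:
        problems.append("missing deadline")
    return problems


# Example: a target with no measured baseline and no date.
draft = KeyResult(metric="activation rate (%)", baseline=None,
                  target=35.0, deadline=None)
print(check_key_result(draft))
# prints ['baseline: [BASELINE UNKNOWN]', 'missing deadline']
```

A check like this does not judge whether the number is a good one. It only guarantees that every line either commits to all four parts or shows its gap openly.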

Catch the activity-disguised-as-outcome trap

The most common failure in real OKRs — human or AI — is the key result that measures effort instead of impact. “Ship the new onboarding flow” feels like a result. It is a task. You can ship it and have onboarding get worse.

Models are good at spotting this when you ask them directly, and bad at avoiding it on their own. So run a second pass whose only job is the outcome/output distinction:

“For each key result below, classify it as OUTCOME (measures a change in user or business behavior) or OUTPUT (measures work completed). Rewrite every OUTPUT as an OUTCOME, or explain why no outcome metric exists yet.”

This second pass routinely flips half a draft. “Ship onboarding flow” becomes “raise day-7 retention for new signups from baseline X to target Y.” The shipping is now implied — you obviously have to build it — but the goal is the user behavior, not the commit.

The model is useful here precisely because it has no ego about the work. A human author wrote “ship the flow” because shipping the flow is the thing they control and the thing they will be busy doing. The model does not care, so it will happily reclassify it.
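Before spending a model call on the outcome/output pass, a crude keyword filter can flag the most obvious offenders. This is only a sketch: the verb list is an assumption (only "launch" and "ship" come from the prompt earlier in this piece), `looks_like_output` is a hypothetical helper, and a first-word check will miss anything phrased less bluntly, so it supplements the model pass rather than replacing it.

```python
import re

# Assumed list of activity verbs; extend it for your own team's phrasing.
ACTIVITY_VERBS = ("launch", "ship", "build", "release", "deliver", "implement")


def looks_like_output(key_result: str) -> bool:
    """True if the key result opens with an activity verb."""
    # Take the first word, lowercased, ignoring surrounding whitespace.
    first_word = re.split(r"\W+", key_result.strip().lower(), maxsplit=1)[0]
    return first_word in ACTIVITY_VERBS


drafts = [
    "Ship the new onboarding flow",
    "Raise day-7 retention for new signups from baseline X to target Y",
]
flagged = [d for d in drafts if looks_like_output(d)]
print(flagged)
# prints ['Ship the new onboarding flow']
```

Anything the filter flags goes straight into the rewrite prompt. Anything it passes still deserves the full classification pass.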

Use the model to generate the objections, not just the goals

Once you have a measurable draft, the highest-value use of the model is running a pre-mortem, not authoring. Feed it the finished OKR set and ask it to attack:

“You are a skeptical VP reviewing these OKRs. For each key result, name one way the team could technically hit the number while making the product worse, and one reason the target might be sandbagged or unrealistic.”

This surfaces the gaming risk that every metric carries. Target “reduce support tickets by 30%” and the model will point out you can hit that by making it harder to find the support button. That is exactly the conversation you want to have before the quarter, not in the retro.

This is where a structured workspace earns its keep. If your objectives, baselines, and weekly check-ins live in scattered docs and Slack threads, every AI session starts from zero and you re-explain context you already wrote down. Keeping them in one queryable place — a database with the objective, metric, baseline, target, owner, and confidence per row — means you can paste the whole picture into a prompt and get grounded edits instead of generic ones.
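As a rough illustration of "pasting the whole picture into a prompt", the sketch below renders one row per key result under the skeptical-VP instruction. The `KeyResultRow` class and `build_review_prompt` function are hypothetical names, and how the rows are exported from your workspace is left as an assumption; the code only formats text and does not call any model API.

```python
from dataclasses import dataclass
from typing import List, Optional

REVIEW_PROMPT = (
    "You are a skeptical VP reviewing these OKRs. For each key result, "
    "name one way the team could technically hit the number while making "
    "the product worse, and one reason the target might be sandbagged or "
    "unrealistic."
)


@dataclass
class KeyResultRow:
    """One database row per key result (hypothetical structure)."""
    objective: str
    metric: str
    baseline: Optional[str]  # None when nobody has measured it yet
    target: str
    owner: str
    confidence: str


def build_review_prompt(rows: List[KeyResultRow]) -> str:
    """Render the review instruction followed by one numbered line per row."""
    lines = [REVIEW_PROMPT, ""]
    for n, row in enumerate(rows, start=1):
        # Keep missing baselines visible rather than silently dropping them.
        baseline = row.baseline if row.baseline is not None else "[BASELINE UNKNOWN]"
        lines.append(
            f"{n}. Objective: {row.objective} | Metric: {row.metric} | "
            f"Baseline: {baseline} | Target: {row.target} | "
            f"Owner: {row.owner} | Confidence: {row.confidence}"
        )
    return "\n".join(lines)
```

Because every session starts from the same rendered table, the model's objections refer to your actual metrics and owners instead of to generic ones.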

What the model still cannot do for you

A model can enforce structure, kill vague phrasing, and stress-test targets. It cannot tell you whether the objective is the right one. “Should we be growing activation or revenue this quarter?” is a judgment call about strategy and resourcing that depends on context the model does not have and should not fake. If you ask it to pick your objectives, it will produce four confident, plausible, generic ones, and confident generic strategy is worse than no strategy, because it looks done.

Use the model downstream of the decision. Decide what matters with your team. Then use AI to force that decision into measurable, un-gameable, baseline-grounded key results — and to argue with you about whether the numbers are honest.

FAQ

Will AI-drafted OKRs sound generic?
They will if you accept the first pass. The first draft is almost always padded with unmeasurable aspirations. The value comes from the second and third passes — forcing baselines, reclassifying outputs as outcomes, and running an adversarial review. The final version reads specific because you stripped the generic out, not because the model wrote it that way.
Should I give the model my real metrics?
Yes, and it is the single biggest quality lever. Without your actual baselines the model invents numbers that anchor the whole quarter to a fiction. Paste in current values for every metric, or mark them unknown so they become explicit research tasks rather than hidden guesses.
Can a model decide which objectives my team should pursue?
No. It can format and pressure-test objectives, but choosing them is a strategy decision that depends on resourcing, market timing, and context the model lacks. Ask it to pick and you get plausible-sounding generic goals. Decide the objectives yourself, then use AI to turn them into measurable key results.
