How We Use AI Without Letting It Hallucinate Into Reviews
The exact guardrails we put between an LLM and a published review: where AI drafts, where it gets shut off, and how every factual claim gets checked against a primary source.
An LLM will tell you, in confident prose, that a tool has a free tier it does not have, a price that changed eight months ago, and an integration that was never shipped. None of those are typos. They are the model filling a gap in its training data with the most plausible-looking token, and plausible is exactly the problem: a hallucinated spec reads identically to a correct one. If you publish reviews, that failure mode is not a curiosity. It is the thing that gets a reader to sign up for the wrong plan.
We use AI to write here, and we say so on every article that an LLM touched. So the honest question is not whether we use it — it’s what we do to keep it from inventing facts. This is the workflow.
The one rule: AI never sources its own facts
The single decision that prevents most hallucinations is structural, not clever. We separate two jobs that LLMs are wrongly assumed to do together: generating prose and establishing facts. The model is allowed to do the first. It is never allowed to do the second.
Concretely, that means every load-bearing claim in a review — a price, a tier limit, a launch date, whether feature X exists — comes from a source we opened ourselves, not from the model’s memory. The pricing page. The changelog. The docs. The actual product, in a trial account. We paste those facts into a notes document first, with the URL and the date we checked it, and only then does the model get to write around them.
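One way to picture a row in that notes document is a small record with three fields. This is a minimal sketch, not our actual schema: the class name `VerifiedFact`, its field names, and the example values (tool, URL, price) are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VerifiedFact:
    """One load-bearing claim, copied by a human from a primary source."""
    claim: str         # the fact as it will be stated, e.g. a price or tier limit
    source_url: str    # the page a human opened and read
    checked_on: date   # the day that page was last read


# Hypothetical entry; the tool, URL, and values are placeholders.
fact = VerifiedFact(
    claim="Tool X's Team plan costs $12 per user per month, billed annually",
    source_url="https://example.com/pricing",
    checked_on=date(2026, 6, 1),
)
print(f"{fact.claim} ({fact.source_url}, checked {fact.checked_on.isoformat()})")
```

Freezing the record is a design choice for the sketch: a fact that changes should be a new row with a new date, not a silent edit to an old one.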
The prompt we hand the model is the inverse of how most people use these tools. Instead of “tell me about Tool X’s pricing,” it’s “here are the four pricing facts, verified today; write the comparison paragraph using only these and flag anything you’d normally add that isn’t here.” That last clause matters. It turns the model’s instinct to embellish into a list of things for a human to go verify, rather than a list of things that quietly ship.
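The shape of that prompt can be sketched as a small template function. The function name `build_prompt`, the exact wording, and the `NEEDS VERIFICATION` heading are assumptions made for illustration; only the structure (facts in, "use only these," "flag what you'd add") comes from the description above.

```python
def build_prompt(tool: str, facts: list[str]) -> str:
    """Build a facts-first prompt: the model writes around the facts it is given."""
    numbered = "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))
    return (
        f"Here are {len(facts)} pricing facts about {tool}, verified today:\n"
        f"{numbered}\n\n"
        "Write the comparison paragraph using only these facts. "
        "Then, under the heading NEEDS VERIFICATION, list anything you would "
        "normally add that is not in the list above."
    )


# Hypothetical facts; the tool and numbers are placeholders.
print(build_prompt("Tool X", [
    "Free tier: up to 3 projects",
    "Team plan: $12 per user per month, billed annually",
]))
```

Asking for the extras under a fixed heading makes them easy to lift out of the response and hand to whoever does the checking.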
A related discipline: we don’t let the model cite. If a draft comes back with “according to a 2024 study” or “users report,” that phrase gets cut unless we can produce the study or the actual thread. Models generate citations the same way they generate everything else — by pattern — and a confidently formatted fake reference is worse than no reference, because it borrows the authority of a real one.
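A rough filter can surface those phrases so a human sees them before they ship. The sketch below is an assumption-heavy illustration: the pattern list is hypothetical and far from complete, the sentence splitter is naive, and it only flags sentences for review rather than deciding anything.

```python
import re

# Hypothetical, incomplete list of phrases that imply an outside source.
ATTRIBUTION_PATTERNS = [
    r"\baccording to\b",
    r"\b(?:a|one|the)\s+(?:\d{4}\s+)?(?:study|survey|report)\b",
    r"\busers (?:report|say|complain)\b",
    r"\bresearch shows\b",
]
_ATTRIBUTION_RE = re.compile("|".join(ATTRIBUTION_PATTERNS), re.IGNORECASE)


def flag_attributions(draft: str) -> list[str]:
    """Return each sentence containing a phrase that implies an outside source."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", draft.strip())
    return [s for s in sentences if _ATTRIBUTION_RE.search(s)]


draft = (
    "According to a 2024 study, teams save time. "
    "The editor is fast. "
    "Users report fewer bugs."
)
for sentence in flag_attributions(draft):
    print("CHECK:", sentence)
```

Each flagged sentence then gets the same treatment as any other claim: produce the study or the thread, or cut the phrase.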
What the model is actually good for
Saying “we don’t trust it with facts” can read as “we don’t really use it,” which isn’t true. The model does a lot of work; it just does the kind of work where being wrong is visible and cheap to fix.
It restructures. Hand it a messy set of verified notes and it produces a clean section order faster than we would. It catches the second “however” in a paragraph. It rewrites a sentence we’ve stared at too long. It generates the three FAQ questions a reader probably has, which we then answer ourselves from sources. It drafts the comparison-table skeleton so we’re filling cells instead of building markup.
None of those tasks require the model to know a single true fact about the outside world. They’re transformations of text we already verified, or structural suggestions a human signs off on instantly. That’s the sweet spot: the model’s output is checkable at a glance, and a wrong answer costs us ten seconds, not a reader’s trust.
The place we keep the source of truth (the verified facts, the dated URLs, the “do not let the model touch this” list) needs to be a real document, not a chat scrollback. We run it in a structured workspace so each claim has a checkbox, a source link, and a last-checked date that an editor can sort by.
Notion
Where our verified-facts sheet lives: one row per claim, each with a source URL and a last-checked date, so an editor can sort by what's gone stale before anything republishes.
Free for personal use; paid plans from $10/user/mo
Affiliate link · We earn a commission at no cost to you.
The check before publish, and the check after
Before a review goes out, it gets a pass whose only job is to find unsourced claims. The reviewer isn’t reading for style; they’re reading every factual sentence and asking “where did this come from?” If the answer isn’t in the notes doc, the sentence doesn’t ship. This is deliberately a separate pass from the editing pass — bundling them is how a smooth, well-written, factually invented paragraph slips through, because good prose lulls you into trusting the content.
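Part of that question can be pre-screened mechanically for one narrow class of claim. The sketch below, under stated assumptions, pulls dollar amounts out of a draft and lists any that never appear in the notes. The function name `unsourced_prices` and the price pattern are hypothetical, it only understands `$`-prefixed figures, and it does not replace the human read: a price can match the notes and still be attached to the wrong plan.

```python
import re

# Matches figures like $12, $1,200, or $9.99 (dollar-prefixed amounts only).
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def unsourced_prices(draft: str, notes: str) -> list[str]:
    """Return dollar amounts that appear in the draft but nowhere in the notes."""
    verified = set(PRICE_RE.findall(notes))
    return sorted({p for p in PRICE_RE.findall(draft) if p not in verified})


# Hypothetical notes and draft; the plans and prices are placeholders.
notes = "Pro plan: $12/user/mo (checked 2026-06-01)"
draft = "Pro costs $12 per user, and Enterprise starts at $49."
print(unsourced_prices(draft, notes))
```

Anything the function returns is a sentence to trace back to a source or cut; an empty list means only that no unfamiliar dollar figure slipped in.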
The after-publish problem is different and sneakier. A review can be 100% accurate the day it ships and wrong three months later because the tool changed its pricing. No amount of pre-publish discipline catches that. So the dated source links aren’t just for the initial check — they’re a recheck schedule. When a fact’s last-checked date gets old, or when a tool announces a change, we re-open the primary source and update the article, and we log it in the changelog so readers can see what moved and when. An AI-assisted review that’s never revisited drifts into the same wrongness as a hallucinated one; it just takes longer to get there.
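Treating the dates as a schedule amounts to a filter and a sort. Here is a minimal sketch, assuming each fact is a dict with a `checked_on` date; the function name `stale_facts`, the dict layout, and the 90-day default are illustrative assumptions, not a stated policy.

```python
from datetime import date, timedelta


def stale_facts(facts: list[dict], today: date, max_age_days: int = 90) -> list[dict]:
    """Return facts last checked more than max_age_days ago, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = [f for f in facts if f["checked_on"] < cutoff]
    return sorted(overdue, key=lambda f: f["checked_on"])


# Hypothetical rows; claims and dates are placeholders.
rows = [
    {"claim": "Free tier allows 3 projects", "checked_on": date(2026, 1, 10)},
    {"claim": "Pro plan is $12/user/mo", "checked_on": date(2026, 6, 1)},
    {"claim": "Slack integration exists", "checked_on": date(2025, 11, 2)},
]
for row in stale_facts(rows, today=date(2026, 6, 22)):
    print(row["checked_on"], row["claim"])
```

Sorting oldest first puts the claims most likely to have drifted at the top of the recheck list. A tool announcing a change is a separate trigger that no date threshold will catch.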
That’s the whole system, and it’s intentionally unglamorous. The model writes; humans own the facts; every claim has a dated source; two reads before publish and a recheck after. None of it depends on the model getting better or being prompted more cleverly. It depends on never asking the model to be the thing it can’t reliably be.
FAQ
Do you disclose which articles used AI?
If a human verifies every fact anyway, what does the AI actually save?
How do you handle a price or feature that changed after publishing?