pickuma.

arXiv Bans Papers With Hallucinated LLM References for One Year

arXiv now imposes a one-year submission ban for papers with unchecked LLM errors like hallucinated citations. Here's the policy, why it exists, and the verification workflow that catches hallucinations before you submit.

arXiv changed the rules for how you can use a language model in a research paper. The preprint server now imposes a one-year submission ban when a paper contains incontrovertible evidence of unchecked LLM output — most commonly hallucinated citations or fabricated results that no one verified before the paper went live.

The policy doesn’t ban LLM-assisted writing. It punishes laziness. If you ran a draft through a model, accepted its made-up reference list, and submitted without checking, you’re now blocked from posting any preprint for twelve months. That’s a real cost, especially for grad students and early-career researchers who use arXiv as a timestamp for priority claims.

What the policy actually targets

The trigger isn’t AI use. It’s verifiable error left in the manuscript. Three patterns get papers flagged:

  • Hallucinated citations — references that don’t exist, or that exist but say something different from what the paper claims they say. The most common failure mode for ChatGPT, Claude, and Gemini when asked for sources.
  • Fabricated experimental results — numbers in tables that don’t appear in any code or dataset the authors can produce, figures generated to illustrate a story rather than describe data.
  • Phantom prior work — claims about what a competing paper does or doesn’t show, where the cited paper does no such thing.

You don’t get banned for clean LLM-assisted prose. You get banned when a moderator can open a citation, see it doesn’t exist, and conclude the author never opened it either.

Why hallucinated citations slipped past so many drafts

The reason this is a policy and not just a guideline is volume. Reviewers and moderators have been reporting flagged submissions where the bibliography has the right shape — plausible journal names, real-looking DOIs, author lists that include genuine researchers — but the specific paper doesn’t exist. The format is correct because the model learned what citations look like. The content is wrong because the model has no retrieval guarantee for the specific work cited.

When you paste a related-work section into a chat model and ask it to “add citations,” you get strings that look like references. They are not references. The model is producing a sequence of tokens that match the statistical pattern of a bibliography. Some of those will be real. Some will be combinations of real authors with real-sounding titles attached to real journals — and they won’t exist anywhere.

Three habits make this worse:

  1. Copy-pasting the model’s bibliography into your reference manager without resolving the DOIs. If your reference manager can’t find the DOI, the paper probably doesn’t exist.
  2. Trusting “I’ll check it later” for citation accuracy. Later is submission day. Submission day is when you ship the hallucination.
  3. Skipping the “open the PDF” step for every cited claim. If you can’t point to the paragraph in the cited paper that supports your claim, you can’t defend the citation in review.

A verification workflow that actually works

The fix isn’t a single tool. It’s a workflow that closes the loop between every claim in your draft and a verifiable source. Here’s what catches hallucinations before submission:

Step 1 — resolve every citation by DOI. Run your reference list through Crossref or a reference manager that resolves DOIs. Any citation that doesn’t resolve is suspect. If you can’t find it on Google Scholar, Semantic Scholar, and Crossref, treat it as hallucinated until proven otherwise.
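
Part of step 1 can be automated. The sketch below assumes your references live in a BibTeX file with doi fields, and it queries the public Crossref REST endpoint at https://api.crossref.org/works/<doi>, which answers 404 for DOIs it does not know. The function names and the fetch_status hook are illustrative choices, not part of any library.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Matches `doi = {10.xxxx/yyyy}` or `doi = "10.xxxx/yyyy"` inside a BibTeX entry.
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)


def extract_dois(bibtex: str) -> list[str]:
    """Return DOIs found in BibTeX text, lowercased, de-duplicated, in file order."""
    seen: dict[str, None] = {}
    for raw in DOI_FIELD.findall(bibtex):
        # Some entries store the full resolver URL; keep only the DOI itself.
        doi = re.sub(r'^https?://(dx\.)?doi\.org/', '', raw.strip(), flags=re.IGNORECASE)
        seen.setdefault(doi.lower(), None)
    return list(seen)


def crossref_url(doi: str) -> str:
    """Build the Crossref lookup URL for one DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code (this makes a network call)."""
    request = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def unresolved_dois(bibtex: str, fetch_status=http_status) -> list[str]:
    """Return DOIs for which the Crossref lookup did not answer 200."""
    return [d for d in extract_dois(bibtex) if fetch_status(crossref_url(d)) != 200]
```

Passing the contents of your .bib file to unresolved_dois makes one request per DOI. A miss is a prompt to check by hand, not proof of fabrication: Crossref only knows DOIs registered through it, so DOIs from other agencies such as DataCite (which arXiv uses for its own DOIs) come back as not found, and entries with no doi field are skipped entirely.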

Step 2 — for every cited claim, link to the supporting passage. Use a research workspace that lets you attach annotated PDFs to each claim. Notion, Obsidian, and Zotero with annotation plugins all work — the point is the discipline, not the tool. If a cited passage doesn’t exist in the source, that’s the citation to delete.
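
Whatever tool you pick, step 2 amounts to keeping one record per cited claim and refusing to submit while any record lacks a located passage. Here is a minimal sketch of that ledger. The CitedClaim type, its field names, and the two sample entries are invented for illustration and do not refer to real papers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitedClaim:
    """One claim in the draft and the passage that is supposed to support it."""
    claim: str                  # the sentence in your draft
    cite_key: str               # BibTeX key of the cited work
    passage: str = ""           # quote copied from the source PDF, empty until found
    page: Optional[int] = None  # page where the quote appears


def unsupported(claims: list[CitedClaim]) -> list[CitedClaim]:
    """Return claims that still lack a quoted passage or a page number."""
    return [c for c in claims if not c.passage.strip() or c.page is None]


# Placeholder entries, not real references.
ledger = [
    CitedClaim("Method A converges in fewer steps.", "key_a",
               passage="A required fewer iterations in all runs.", page=7),
    CitedClaim("Method B fails on sparse inputs.", "key_b"),
]

for c in unsupported(ledger):
    print(f"UNVERIFIED [{c.cite_key}]: {c.claim}")
```

Requiring both a quote and a page number is deliberate: a key alone proves the paper is in your library, not that you found the paragraph.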

Step 3 — run a separate model pass that questions citations rather than generates them. Feed your bibliography and your claims into a second model and ask: “for each citation, what evidence in the cited paper supports the claim?” If the model can’t answer, the citation is probably wrong, or your claim is overstated.
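
One way to set up this pass is to generate one question per claim and citation pair, so the second model is asked to locate evidence and not to write. The sketch below only builds the prompts. Sending them is left out because it depends on the provider you use, and the function name and prompt wording are assumptions.

```python
def audit_prompts(pairs: list[tuple[str, str]]) -> list[str]:
    """Build one evidence-seeking question per (claim, citation) pair."""
    template = (
        "Claim: {claim}\n"
        "Cited source: {citation}\n"
        "For this citation, what evidence in the cited paper supports the claim? "
        "Quote the passage and name its section. "
        "If you cannot locate supporting evidence, answer exactly: NOT FOUND."
    )
    return [template.format(claim=c, citation=s) for c, s in pairs]
```

Where possible, supply the source's full text alongside each prompt. A model answering from memory can invent a supporting passage as easily as it invents a citation, so treat any quote it returns as something to look up, not as confirmation.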

Step 4 — diff your bibliography against your own pre-LLM search. If you searched for related work yourself before the model helped you write, compare what you found to what’s in the final bibliography. Citations that appeared only after the LLM touched the section get extra scrutiny.
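
If you saved a copy of your .bib file before bringing in a model, the comparison in step 4 is a set difference over entry keys. This is a rough sketch that assumes both snapshots are BibTeX. The helper names are made up, and the regular expression handles ordinary entries only.

```python
import re

# Captures the key in entries such as `@article{some_key,`.
ENTRY_KEY = re.compile(r'@\w+\s*\{\s*([^,\s{}]+)\s*,')


def bib_keys(bibtex: str) -> set[str]:
    """Return the set of entry keys in BibTeX text."""
    return set(ENTRY_KEY.findall(bibtex))


def added_after_llm(pre_llm: str, final: str) -> list[str]:
    """Keys in the final bibliography that were absent from the pre-LLM snapshot."""
    return sorted(bib_keys(final) - bib_keys(pre_llm))
```

Keys are a convenient handle but not a reliable one: the same paper can appear under two keys, so comparing on DOI is more robust when the entries carry one.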

What this means for AI-assisted research writing

The policy moves accountability in a productive direction. You can still use models to draft, summarize, restructure, and edit. You cannot use them to produce references you haven’t read or numbers you haven’t computed. That distinction is straightforward to honor, and most researchers were already on the right side of it.

The hard cases are subtler: claims about what a cited paper “shows” that drift from what the paper actually argues, paraphrased findings that flip a sign, or summaries of method that omit the constraint that makes the comparison meaningful. Those errors don’t always trigger the ban — they’re invisible to automated checks. But they’re the failures the policy is gesturing at. Treat citation hygiene as a first-class part of your writing workflow, not an end-stage chore.

FAQ

Does using ChatGPT or Claude to draft a paper count as a violation?
No. arXiv's policy targets unchecked LLM output that produces verifiable errors in the manuscript — hallucinated citations, fabricated results, claims about prior work that the cited paper does not actually make. AI-assisted drafting that you verify and edit is not banned.
What happens if I'm a co-author on a flagged paper?
The ban applies to the authors of the flagged submission, not just the corresponding author. If your name is on the paper, the one-year suspension covers your other arXiv submissions during that window.
How do I check whether a citation my LLM produced is real?
Resolve the DOI via doi.org or Crossref. If the DOI doesn't redirect to a real paper, or the paper exists but doesn't say what your draft claims it says, treat the citation as hallucinated and remove it before you submit.
