What Our llms.txt Is, and Why We Publish It

If you fetch https://pickuma.com/llms.txt, you get a plain-text file: a one-line description of the site, a short About section, then every published article grouped by category with its title, URL, and a single-sentence summary. Fetch https://pickuma.com/llms-full.txt and you get something heavier — the entire article corpus in source form, newest first, with the MDX components stripped out so a model reads prose instead of markup.

These two files exist because language models read the web differently than people do. A person lands on a page, scans the headings, and bounces between links. A crawler feeding a model wants structure it can parse without rendering JavaScript or guessing which <div> holds the article. The llms.txt convention — proposed at llmstxt.org — gives it that structure in a format that’s trivial to fetch and cheap to tokenize.

What the two files actually contain

The split is deliberate. The index file is small — a few hundred lines — and it’s a map, not the territory. It opens with the site summary, links to our editorial standards, affiliate disclosure, privacy page, and tool directory, then lists each article as a bullet: [title](url): description. A model that wants to know what Pickuma covers can read the whole thing in one request and decide what’s worth fetching.
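As a rough illustration of how a consumer might read the index, here is a minimal sketch that parses bullets of the form [title](url): description. The leading "- " bullet marker and the names IndexEntry, parse_index_line, and parse_index are assumptions made for this example, not part of our build.

```python
import re
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    title: str
    url: str
    description: str


# Matches "- [title](url): description".
# Assumption: entries are Markdown bullets starting with "-" or "*".
_ENTRY = re.compile(
    r"^\s*[-*]\s*\[(?P<title>.+?)\]\((?P<url>[^)\s]+)\):\s*(?P<description>.*)$"
)


def parse_index_line(line: str) -> Optional[IndexEntry]:
    """Return the entry on this line, or None if the line is not an entry."""
    m = _ENTRY.match(line)
    if m is None:
        return None
    return IndexEntry(m["title"], m["url"], m["description"].strip())


def parse_index(text: str) -> List[IndexEntry]:
    """Collect every article entry in an llms.txt-style index, in file order."""
    entries = []
    for line in text.splitlines():
        entry = parse_index_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

Lines that are not entries (headings, the summary, blank lines) are skipped, so the whole file can be passed in as one string.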

The full corpus is the territory. At roughly 29,000 lines it concatenates every non-draft post, each prefixed with a small frontmatter block:

---
url: https://pickuma.com/for-dev/some-slug/
title: The article title
category: ai-dev-tools
published: 2026-05-30
---

Then the title, the description, and the body — with imports and components removed. A <CtaCard> or <ComparisonTable> in the source becomes nothing in the corpus, because a model doesn’t need our Astro components; it needs the sentences around them.

Because it’s generated from the same content tree, there’s no separate “AI version” of the site to maintain and no risk of the machine-readable copy saying something the human-readable pages don’t. Publish an article, run the build, and it’s in both files automatically.
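The corpus entry described above could be produced along these lines. This is a sketch under stated assumptions, not our actual build code: strip_mdx and render_entry are hypothetical names, the regexes only handle simple cases (capitalized component tags, no ">" inside attributes, no nesting of the same component), and the spacing between title, description, and body is a guess.

```python
import re


def strip_mdx(body: str) -> str:
    """Remove import/export lines and capitalized JSX components from MDX text."""
    # Drop ESM import/export lines. Caveat: this also drops any prose or
    # code-sample line that happens to start with "import " or "export ".
    body = re.sub(r"^(?:import|export)\s.*$\n?", "", body, flags=re.MULTILINE)
    # Drop self-closing components such as <CtaCard ... />.
    body = re.sub(r"<[A-Z][A-Za-z0-9]*\b[^>]*/>", "", body)
    # Drop paired components together with their children,
    # such as <ComparisonTable>...</ComparisonTable>.
    body = re.sub(r"<([A-Z][A-Za-z0-9]*)\b[^>]*>.*?</\1>", "", body, flags=re.DOTALL)
    # Collapse the blank-line runs left behind by the removals.
    body = re.sub(r"\n{3,}", "\n\n", body)
    return body.strip()


def render_entry(url: str, title: str, category: str, published: str,
                 description: str, body: str) -> str:
    """Render one corpus entry: frontmatter block, title, description, body."""
    header = "\n".join([
        "---",
        f"url: {url}",
        f"title: {title}",
        f"category: {category}",
        f"published: {published}",
        "---",
    ])
    return f"{header}\n\n{title}\n\n{description}\n\n{strip_mdx(body)}\n"
```

A real pipeline would more likely walk the parsed MDX syntax tree than use regexes, which is the safer choice once components nest or appear inside code samples.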

Why a review site hands its content to models

The obvious objection: if you give your full article text to AI crawlers, won’t models answer questions directly and skip your site? That’s a real tension, and it’s worth being honest about. We publish anyway, for three concrete reasons.

First, models are already reading the open web — the only question is whether they read a clean version or a guessed-at one. Without llms.txt, a crawler still ingests our pages; it just does so by scraping rendered HTML, stripping navigation and ad slots imperfectly, and sometimes attributing the wrong text to the wrong article. The structured file removes the guesswork. If our content is going to inform an answer, we’d rather it be the accurate version with the right URL attached.

Second, attribution travels with the text. Every entry in both files carries the canonical URL. When an assistant cites a source or a user asks “where did this come from,” the link back to Pickuma is right there in the data the model read. Clean source data is the closest thing to a citation guarantee you get in an AI-mediated web.

Third, it matches how we already work. Our editorial standard is that every reviewed tool is tested in real workflows and affiliate disclosures are inline. Publishing the corpus is the same posture applied to machines: here’s everything, here’s how it’s labeled, here’s where it lives. A site that hides its content from crawlers while claiming transparency to readers is telling two different stories.

Cursor

An AI code editor that can ingest documentation and llms.txt-style files as context. If you want to see what reading a site's machine-readable corpus feels like from the consuming side, point it at one.

Free tier; Pro from $20/mo

Try Cursor

Affiliate link · We earn a commission at no cost to you.

The llms.txt convention is young and not yet widely honored: plenty of crawlers ignore it, and there’s no enforcement layer. We treat it the way we treat a sitemap or an RSS feed, as a well-specified signal that costs us almost nothing to emit because it falls out of the build we already run. If the convention gains traction, our files are already in place. If it doesn’t, we’ve lost nothing more than some generated text.

What this means if you publish content

If you run a blog, docs, or any content site, the practical takeaway is that machine-readability is now a publishing concern, not a future one. You don’t need to rewrite anything. You need a build step that exposes your existing content in a format a model can fetch and parse — and a decision about whether you want that content read cleanly or scraped messily.
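For a sense of how small such a build step can be, here is a minimal sketch that renders an index from a list of articles. The Article fields and build_index are hypothetical names chosen for this example. The output shape (an H1 title, a blockquote summary, one H2 section per category) follows the llmstxt.org proposal as we understand it, and your own content model will differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Article:
    title: str
    url: str
    category: str
    description: str
    published: str  # ISO date string, e.g. "2026-05-30"
    draft: bool = False


def build_index(site_name: str, summary: str, articles: Iterable[Article]) -> str:
    """Render an llms.txt-style index: title, summary, then entries by category."""
    lines: List[str] = [f"# {site_name}", "", f"> {summary}", ""]

    # Group published articles by category; drafts never reach the file.
    by_category: Dict[str, List[Article]] = defaultdict(list)
    for article in articles:
        if not article.draft:
            by_category[article.category].append(article)

    for category in sorted(by_category):
        lines.append(f"## {category}")
        lines.append("")
        # Newest first within each category (ISO dates sort lexicographically).
        newest_first = sorted(
            by_category[category], key=lambda a: a.published, reverse=True
        )
        for article in newest_first:
            lines.append(f"- [{article.title}]({article.url}): {article.description}")
        lines.append("")

    return "\n".join(lines).rstrip() + "\n"
```

Writing the returned string to llms.txt in the site's public directory at build time is enough to publish it. Because the function reads the same article records the pages are built from, the two cannot drift apart.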

We came down on the side of clean. The bet is that accurate, attributed source data serves us better over time than withholding text that crawlers will collect anyway. Whether that bet pays off depends on how the convention evolves and how models handle attribution, neither of which we control. What we control is the quality of what we hand over, and that’s the part worth getting right.

FAQ

Is llms.txt the same as robots.txt?

No. robots.txt tells crawlers what they may and may not access. llms.txt does a different job: it offers a clean, structured version of your content for language models to read, and it neither grants nor revokes access. One restricts; the other invites.
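To make the "restricts" half concrete, here is a small sketch using Python's standard urllib.robotparser. The rules, the ExampleBot user agent, and the example.com URLs are invented for illustration and are not our actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named bot is kept out of /private/,
# everyone else may fetch anything.
RULES = """User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""

_parser = RobotFileParser()
_parser.parse(RULES.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow this agent to fetch the URL."""
    return _parser.can_fetch(user_agent, url)
```

Nothing in llms.txt plays this gatekeeping role. A crawler that respects robots.txt applies rules like these first, and the llms.txt file is simply one more URL it may or may not be allowed to fetch.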

Does publishing llms.txt help SEO?

Not directly. Traditional search engines use your sitemap and rendered HTML, not llms.txt. Its audience is AI crawlers and assistants. Think of it as a parallel channel for AI discovery, separate from your search ranking.

Won't this let models answer questions without sending traffic to your site?

Possibly, but models already ingest the open web. The realistic choice is whether they read an accurate version with your URL attached or a scraped approximation. We chose accuracy and attribution over withholding text that gets collected regardless.
