What Our llms.txt Is, and Why We Publish It
Pickuma ships two machine-readable files for AI crawlers: an index and a full corpus. Here's what's in them, how they're generated, and why a review site publishes its own content to language models.
If you fetch https://pickuma.com/llms.txt, you get a plain-text file: a one-line description of the site, a short About section, then every published article grouped by category with its title, URL, and a single-sentence summary. Fetch https://pickuma.com/llms-full.txt and you get something heavier — the entire article corpus in source form, newest first, with the MDX components stripped out so a model reads prose instead of markup.
These two files exist because language models read the web differently than people do. A person lands on a page, scans the headings, and bounces between links. A crawler feeding a model wants structure it can parse without rendering JavaScript or guessing which <div> holds the article. The llms.txt convention — proposed at llmstxt.org — gives it that structure in a format that’s trivial to fetch and cheap to tokenize.
What the two files actually contain
The split is deliberate. The index file is small — a few hundred lines — and it’s a map, not the territory. It opens with the site summary, links to our editorial standards, affiliate disclosure, privacy page, and tool directory, then lists each article as a bullet: [title](url): description. A model that wants to know what Pickuma covers can read the whole thing in one request and decide what’s worth fetching.
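From the consuming side, reading the index can be sketched in a few lines. This is a minimal illustration, not how any particular crawler works: the `IndexEntry` type, the `parse_index` function, and the sample text are all hypothetical, and the leading `- ` bullet marker and `##` category headings are assumed from the llmstxt.org proposal rather than copied from our file.

```python
import re
from typing import NamedTuple


class IndexEntry(NamedTuple):
    title: str
    url: str
    description: str


# Matches "- [title](url): description". The "- " or "* " bullet marker
# is an assumption about how the index renders its list items.
ENTRY_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>.+?)\]\((?P<url>\S+?)\):\s*(?P<description>.*)$"
)


def parse_index(text: str) -> list[IndexEntry]:
    """Collect every bullet that looks like an article entry; skip other lines."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(
                IndexEntry(
                    match["title"], match["url"], match["description"].strip()
                )
            )
    return entries


# Made-up sample in the general shape described above, not the real file.
sample = """# Example Site
> One-line description of the site.

## ai-dev-tools
- [The article title](https://pickuma.com/for-dev/some-slug/): A single-sentence summary.
"""

for entry in parse_index(sample):
    print(entry.url, "|", entry.title)
```

A reader that only wants to decide what to fetch next needs nothing beyond the URL and the one-sentence description, which is why the index stays small.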
The full corpus is the territory. At roughly 29,000 lines it concatenates every non-draft post, each prefixed with a small frontmatter block:
---
url: https://pickuma.com/for-dev/some-slug/
title: The article title
category: ai-dev-tools
published: 2026-05-30
---
Then the title, the description, and the body — with imports and components removed. A <CtaCard> or <ComparisonTable> in the source becomes nothing in the corpus, because a model doesn’t need our Astro components; it needs the sentences around them.
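The stripping step can be approximated with a few regular expressions. The sketch below is an illustration under stated assumptions, not our actual build code: `strip_mdx` and `corpus_entry` are hypothetical names, components are assumed to be capitalized JSX-style tags, and the blank-line separators between title, description, and body are a guess. A real pipeline would more likely walk the MDX syntax tree, since regexes miss multi-line imports, nested components of the same name, and prose lines that happen to start with "import".

```python
import re


def strip_mdx(body: str) -> str:
    """Remove import lines and capitalized component tags, keeping the prose."""
    # Drop single-line ES import statements.
    body = re.sub(r"^import\s.+$\n?", "", body, flags=re.MULTILINE)
    # Drop self-closing components such as <CtaCard ... />.
    body = re.sub(r"<[A-Z][A-Za-z0-9]*\b[^>]*/>", "", body)
    # Drop paired components together with everything between the tags.
    body = re.sub(
        r"<([A-Z][A-Za-z0-9]*)\b[^>]*>.*?</\1>", "", body, flags=re.DOTALL
    )
    # Collapse the runs of blank lines left behind.
    return re.sub(r"\n{3,}", "\n\n", body).strip()


def corpus_entry(url, title, category, published, description, body):
    """Assemble one entry: frontmatter block, then title, description, body."""
    header = "\n".join(
        [
            "---",
            f"url: {url}",
            f"title: {title}",
            f"category: {category}",
            f"published: {published}",
            "---",
        ]
    )
    # The blank lines between parts are an assumption, not the real layout.
    return f"{header}\n{title}\n\n{description}\n\n{strip_mdx(body)}\n"


# Hypothetical article source, for illustration only.
source = (
    'import CtaCard from "../components/CtaCard.astro";\n\n'
    "First paragraph.\n\n"
    '<CtaCard name="Example" />\n\n'
    "Second paragraph.\n"
)

print(
    corpus_entry(
        "https://pickuma.com/for-dev/some-slug/",
        "The article title",
        "ai-dev-tools",
        "2026-05-30",
        "A single-sentence summary.",
        source,
    )
)
```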
Because it’s generated from the same content tree, there’s no separate “AI version” of the site to maintain and no risk of the machine-readable copy saying something the human-readable pages don’t. Publish an article, run the build, and it’s in both files automatically.
Why a review site hands its content to models
The obvious objection: if you give your full article text to AI crawlers, won’t models answer questions directly and skip your site? That’s a real tension, and it’s worth being honest about. We publish anyway, for three concrete reasons.
First, models are already reading the open web — the only question is whether they read a clean version or a guessed-at one. Without llms.txt, a crawler still ingests our pages; it just does so by scraping rendered HTML, stripping navigation and ad slots imperfectly, and sometimes attributing the wrong text to the wrong article. The structured file removes the guesswork. If our content is going to inform an answer, we’d rather it be the accurate version with the right URL attached.
Second, attribution travels with the text. Every entry in both files carries the canonical URL. When an assistant cites a source or a user asks “where did this come from,” the link back to Pickuma is right there in the data the model read. Clean source data is the closest thing to a citation guarantee you get in an AI-mediated web.
Third, it matches how we already work. Our editorial standard is that every reviewed tool is tested in real workflows and affiliate disclosures are inline. Publishing the corpus is the same posture applied to machines: here’s everything, here’s how it’s labeled, here’s where it lives. A site that hides its content from crawlers while claiming transparency to readers is telling two different stories.
The llms.txt standard is young and not yet universally honored: plenty of crawlers ignore it, and there’s no enforcement layer. We treat it the way we treat a sitemap or an RSS feed, as a well-specified signal that costs us nothing extra to emit because it falls out of the build we already run. If the convention gains traction, we already publish in the expected format. If it doesn’t, we’ve lost a few kilobytes of generated text.
What this means if you publish content
If you run a blog, docs, or any content site, the practical takeaway is that machine-readability is now a publishing concern, not a future one. You don’t need to rewrite anything. You need a build step that exposes your existing content in a format a model can fetch and parse — and a decision about whether you want that content read cleanly or scraped messily.
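Such a build step does not have to be elaborate. Here is a minimal sketch, assuming your posts are already loaded into memory with a title, URL, category, description, and draft flag. The `Post` class and `build_index` function are hypothetical, and the `#`, `>`, and `##` layout follows the general shape of the llmstxt.org proposal rather than any specific site's output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    url: str
    category: str
    description: str
    draft: bool = False


def build_index(site_name: str, summary: str, posts: list[Post]) -> str:
    """Render an llms.txt-style index: site summary, then posts by category."""
    by_category: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        if not post.draft:  # drafts never reach the published file
            by_category[post.category].append(post)

    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        for post in by_category[category]:
            lines.append(f"- [{post.title}]({post.url}): {post.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Made-up posts for illustration.
posts = [
    Post("First post", "https://example.com/first/", "guides", "What it covers."),
    Post("Unfinished", "https://example.com/wip/", "guides", "Not ready.", draft=True),
]

print(build_index("Example Site", "A one-line description.", posts))
```

Writing the returned string to `llms.txt` in the site's output directory at the end of the normal build is enough to keep the file in step with what readers see.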
We came down on the side of clean. The bet is that accurate, attributed source data serves us better over time than withholding text that crawlers will collect anyway. Whether that bet pays off depends on how the standard evolves and how models handle attribution — neither of which we control. What we control is the quality of what we hand over, and that’s the part worth getting right.