llms.txt for AI Search: What It Is and How It Works

By Judy Zhou, Founder

Key Takeaways

llms.txt is a proposed standard placing a plain-text, Markdown-structured content summary at your root domain so AI models can understand your site without a full crawl.
SE Ranking's analysis of roughly 300,000 domains found no clear measurable relationship between llms.txt presence and citation frequency in LLM-generated answers — only 10.13% of sites have implemented it.
Anthropic, Google (via its A2A protocol), and Cursor are actively using llms.txt, but the value is primarily in AI-first content discovery and developer tooling, not a direct citation ranking boost.
Implement llms.txt as low-cost infrastructure — curate your top 20-30 pages, add descriptive annotations, and update it quarterly — but invest the bulk of your effort in the content quality signals that influence AI models regardless of whether they read the file.

llms.txt is the file that decides whether AI cites you or ignores you.

That claim deserves scrutiny, because the evidence is messier than most practitioners admit. llms.txt is a proposed web standard that lets site owners provide a structured, plain-text summary of their content specifically for large language models. Think of it as a sitemap for AI: instead of telling Google's crawler which URLs exist, it tells ChatGPT, Perplexity, Claude, and other AI engines what your site is about, which pages matter most, and what context to use when generating answers. A Wix AI Search Lab review of over 1,400 llms.txt files found demonstrable evidence that the file can surface on multiple AI and search platforms. But a SE Ranking analysis of roughly 300,000 domains found no clear measurable relationship between llms.txt presence and citation frequency in LLM-generated answers. Only 10.13% of sites have implemented llms.txt at all. That gap between adoption and proven impact is exactly where the real strategic question lives.

The Anatomy of an llms.txt File

The format is deliberately simple. You create a plain-text file at the root of your domain (yourdomain.com/llms.txt), and it follows a Markdown-adjacent structure proposed by Jeremy Howard in late 2024. A minimal llms.txt file includes a site name, a short description of what the site covers, and a list of URLs with brief annotations about what each page contains.

Here is what a basic llms.txt looks like in practice:

<h1>Meev</h1>
<blockquote style="border-left:4px solid #333;padding:12px 20px;margin:20px 0;background:#f8f9fa;border-radius:4px;font-style:italic;">AI search visibility tracking and quality-gated content publishing for brands.</blockquote>
<h2>Docs</h2>
- What is AEO: Explainer on Answer Engine Optimization and how it differs from SEO.
- AEO vs SEO: Side-by-side comparison of the two approaches.
<h2>Blog</h2>
- AI Visibility Tools: How to evaluate AI search visibility platforms.

The spec also supports an extended variant (llms-full.txt) that includes the full content of key pages rather than just links, giving AI models everything they need without requiring a crawl. Anthropic, Cursor, and Windsurf are among the companies that have reportedly requested llms.txt implementation from their documentation partners, citing token and time savings when ingesting structured content. Google has gone further, incorporating llms.txt into its Agent-to-Agent (A2A) protocol. That is a meaningful signal, even if it is not a guarantee of citation lift.

The structure of a valid llms.txt file explained

How Does llms.txt Differ from robots.txt?

Robots.txt tells crawlers what NOT to access. llms.txt tells AI models what IS worth reading. They operate on opposite logic, and conflating them leads to bad implementation decisions.

Robots.txt is a hard technical gate enforced at the HTTP level. If you block Googlebot, it cannot crawl your page. llms.txt is a soft signal. No LLM is technically obligated to read or obey it. The file lives in the same neighborhood as robots.txt and sitemaps, but its authority is entirely voluntary. A model that has already ingested your content during training will not un-learn it because your llms.txt says to ignore it.

This distinction matters for expectations. Robots.txt gives you control. llms.txt gives you influence, maybe. If you go in expecting the former and get the latter, you will either over-invest or dismiss the file entirely when you see no immediate citation lift. The right frame is: llms.txt is infrastructure for AI-first communication, not a ranking lever.

Why the Adoption Data Should Give You Pause

The SE Ranking 300,000-domain study is the most sobering data point in this conversation. Ten percent adoption, no measurable citation lift. That is not a ringing endorsement.

But I think the study is being misread by the "waste of time" camp. The absence of a measurable citation lift does not mean llms.txt has no value. It means the value is not primarily in citation frequency, which is exactly what Contentful concluded in their own assessment: "any observed gains are difficult to attribute to the file alone due to non-deterministic AI outputs and confounding variables." The AI output pipeline has too many layers between your llms.txt and a final citation for a clean attribution to exist. That is a measurement problem, not a value problem.

Mintlify's framing is more useful here. They cite Vercel reporting that 10% of signups were attributed to ChatGPT, and position llms.txt as part of the infrastructure that enables AI-driven discovery. The file is not a citation trigger. It is a signal that helps AI models understand your content faster and more accurately when they do encounter it. Whether that translates to citations depends on factors llms.txt cannot control: your domain authority, your content depth, your topical coverage, and whether the AI model's retrieval layer is even functioning correctly.

On that last point: a Tow Center study from March 2025 tested 1,600 queries across AI search engines and found that source material was either not retrieved or was inaccurately cited in over 60% of cases. I have had to deliver that finding to clients who spent real budget on outreach campaigns targeting AI overviews. The infrastructure that llms.txt is trying to improve is itself unreliable. That context belongs in any honest llms.txt conversation.

Want to see how your site currently appears across AI search engines before you optimize your llms.txt?

Check Your AI Visibility

Who Actually Reads llms.txt?

Three distinct audiences interact with llms.txt, and most practitioners only think about one of them.

The first is LLMs during inference. When a model like Perplexity or ChatGPT with browsing enabled retrieves content to answer a query, it may fetch your llms.txt to understand your site's structure before deciding which pages to read. OpenAI, Google, Amazon, Microsoft, and Anthropic have all been cited as actively using llms.txt for AI-first engagement, according to Wix's Crystal Carter. The operative word is "may" — this is not guaranteed behavior, and it varies by model and query type.

The second audience is developers and AI agents. This is where the practical value is clearest right now. If someone is building a tool that ingests your documentation, your API reference, or your knowledge base, a well-structured llms.txt dramatically reduces the time and tokens needed to understand your site. This is why Cursor and Windsurf are requesting it from their partners. It is less about search citations and more about machine-readable content architecture.

The third audience is human practitioners: content strategists, SEO specialists, and agency teams auditing a site's AI readiness. A missing or poorly structured llms.txt is a signal that a site has not thought through its AI-first content strategy. In my work overseeing AI-driven content research and publishing for brands at Meev, I treat llms.txt as one of several infrastructure checks, alongside schema markup, internal linking quality, and topical coverage depth. It is not the most important signal, but its absence is a gap worth closing.

Three audiences that interact with your llms.txt file

How to Create a llms.txt File That Actually Works

The spec is simple. The strategy behind it is not.

Start with the basics: create a plain-text file, place it at your root domain, follow the Markdown structure (H1 for site name, blockquote for description, H2 sections for content categories, bullet-point links with brief annotations). You can validate your implementation using a tool like the LLMs.txt Validator before pushing it live.

Beyond the syntax, here is where most implementations go wrong:

They list everything instead of curating. The point of llms.txt is not to replicate your sitemap. It is to surface your best, most authoritative content. If you have 800 blog posts, do not link to all 800. Identify the 20-30 pages that best represent your expertise, your most-cited content, and your highest-converting pages. AI models should be able to read your llms.txt and immediately understand what you are the authority on.

They ignore the annotation layer. A bare URL list is marginally better than nothing. But the annotations. The brief descriptions after each link. Are where you communicate context. "Best practices for X" or "Comparison of Y vs Z with data from 2026" tells a model what kind of content it is about to encounter and whether it is relevant to the query being answered.

They set it and forget it. llms.txt should be updated when your content strategy changes. If you publish a major research piece, add it. If a page becomes outdated, remove it. Treating it as a one-time setup task means it will gradually misrepresent your site.

For sites with technical documentation, consider the llms-full.txt variant. Including full page content in the extended file removes the retrieval step entirely for AI agents, which is the efficiency gain that companies like Anthropic are actually requesting.

For B2B SaaS sites, prioritize your comparison pages, use-case pages, and any content that answers "X vs Y" or "best tool for Z" queries. The Princeton GEO study on generative engine optimization found that navigational and comparative content tends to perform better in AI-generated answers. Your llms.txt should reflect that by surfacing those pages prominently.

For e-commerce and SMB sites, the value proposition is narrower. Google's John Mueller has reportedly stated that llms.txt is not necessary for website discovery and crawling by AI systems. For a local service business or a small online store, the time investment in a well-structured llms.txt may not return measurable value. Prioritize your AEO fundamentals first: structured data, clear topical focus, and content that directly answers the questions your customers are asking.

The Honest Measurement Problem

Here is the contrarian take I keep coming back to: nobody has a clean case study showing that llms.txt implementation caused a measurable increase in AI citations. Not one. I have looked.

The Wix AI Search Lab reviewed over 1,400 llms.txt files and found evidence of business value, but provided no outcome metrics tied to specific implementations. The SE Ranking study found no citation lift across 300,000 domains. Contentful explicitly notes that non-deterministic AI outputs make attribution impossible. The practitioner claiming on LinkedIn that llms.txt is "a waste of time" and challenging anyone to show proof of results is being uncharitable, but he is also not wrong that the proof does not exist in a form that would satisfy a skeptic.

This does not mean you should not implement llms.txt. It means you should implement it with the right expectations. The value is in infrastructure, not in a measurable citation bump you can put in a quarterly report. If you are tracking AI search visibility, tools like Meev's AI visibility tracking can show you where your brand appears across major AI search surfaces, but they cannot isolate llms.txt as the causal variable. That attribution gap is a feature of the current AI search ecosystem, not a flaw in the tool.

What I tell clients now: implement llms.txt because it is low-effort, low-risk, and aligns your site with where AI content consumption is heading. Do not implement it expecting a citation spike. Do not skip it because you cannot prove ROI. Treat it like meta descriptions in 2010. Directionally correct, not yet proven, and cheap enough that the downside of skipping it outweighs the effort of doing it.

Eight-point checklist for a production-ready llms.txt

llms.txt in a Broader AI Visibility Strategy

The mistake I see most often is treating llms.txt as a standalone tactic rather than one component of a broader generative engine optimization strategy.

llms.txt works best when the content it points to is already doing the right things: demonstrating expertise, citing authoritative sources, covering topics with genuine depth, and maintaining a consistent entity signal across pages. A well-structured llms.txt pointing to thin, generic content will not move the needle. A well-structured llms.txt pointing to genuinely authoritative content gives AI models a faster path to the material that deserves to be cited.

This is why the AEO vs SEO framing matters here. Traditional SEO optimizes for crawler discovery and keyword relevance. AEO optimizes for answer extraction and citation selection. llms.txt sits at the intersection: it improves discovery (like a sitemap) while also communicating authority signals (like schema markup). Neither function replaces the other, and neither replaces the underlying content quality that determines whether an AI model finds your answer worth citing.

For teams tracking AI search visibility across surfaces like ChatGPT, Perplexity, and Google AI Overviews, the practical workflow is: audit your current citation presence, identify the content gaps where competitors are being cited and you are not, build or improve that content, and then ensure your llms.txt surfaces it prominently. The Perplexity brand visibility checker is a useful starting point for understanding where you currently stand on one of the most citation-active AI search surfaces.

The deeper structural issue is that AI citation outreach and llms.txt optimization are both operating on top of a retrieval layer that is, by the Tow Center's measurement, failing more than half the time. That is not a reason to abandon either tactic. It is a reason to hold your expectations proportionate to the infrastructure's current reliability, and to invest the bulk of your effort in the content quality and topical authority signals that influence AI models regardless of whether they ever read your llms.txt.

Frequently Asked Questions

Does Google use llms.txt for AI Overviews?

Google has incorporated llms.txt into its A2A (Agent-to-Agent) protocol, suggesting some level of active use. However, Google's John Mueller has also stated that llms.txt is not necessary for website discovery and crawling by AI systems. The practical implication: implementing it aligns with where Google's AI infrastructure is heading, but you should not expect a direct Google AI Overviews ranking boost from the file alone.

How often should I update my llms.txt file?

Treat it like a living document tied to your content strategy. A quarterly review is a reasonable minimum. Update it when you publish major new content, retire outdated pages, or shift your topical focus. A stale llms.txt that points to outdated or removed pages actively misrepresents your site to AI models.

Does llms.txt replace schema markup or sitemaps?

No. These tools serve different functions. Sitemaps tell crawlers which URLs exist. Schema markup adds structured semantic context to individual pages. llms.txt provides a curated, annotated overview of your site's most important content specifically for LLMs. All three can coexist and reinforce each other in a well-structured AI visibility strategy.

Should every website implement llms.txt?

Not necessarily. For documentation-heavy sites, developer tools, B2B SaaS, and content publishers, the investment is clearly worthwhile. For small e-commerce sites, local businesses, or sites with fewer than 50 pages, the marginal value is lower. Prioritize proven AEO practices first: structured data, expert-led content, strong internal linking, and clear topical focus.

Can llms.txt hurt my AI search visibility?

A poorly implemented llms.txt is unlikely to actively harm your visibility, since AI models are not required to follow it. The main risk is pointing to low-quality or outdated content and inadvertently signaling that those pages represent your best work. Curate carefully, and validate your file before publishing using a dedicated llms.txt validator.

How does llms.txt interact with AI citation outreach campaigns?

Think of them as complementary. Outreach earns placements on authoritative third-party domains. llms.txt helps AI models understand your own site's authority and content structure. Neither guarantees citations. The Tow Center's 60% citation failure rate applies regardless. But together they reduce the friction between your content and an AI model's ability to find and use it accurately.

About the Author

Judy Zhou, Founder

Judy Zhou leads content strategy at Meev, where she oversees AI-driven content research and publishing for hundreds of brands. With a background in SEO and editorial operations, she focuses on building content systems that rank on Google, get cited by AI search engines, and drive measurable business results.

Track your brand's AI search visibility across every major LLM surface and find the citation gaps your competitors are filling without you.

Check Your AI Visibility

llms.txt for AI Search: What It Is, Who Reads It, and How to Get It Right

The Anatomy of an llms.txt File

How Does llms.txt Differ from robots.txt?

Why the Adoption Data Should Give You Pause

Who Actually Reads llms.txt?

How to Create a llms.txt File That Actually Works

The Honest Measurement Problem

llms.txt in a Broader AI Visibility Strategy

Frequently Asked Questions

About the Author

Why AI Citations Are the New Ads in 2026: A Generative Engine Optimization Guide

What Is Topical Authority for AI Search? A 2026 Guide

How to Rank in ChatGPT: What the Evidence Actually Shows

How Perplexity Chooses Its Sources, and How to Get Cited

What Is Generative Engine Optimization (GEO)? A 2026 Guide

How to Get Your Content Cited in Perplexity in 5 Repeatable Steps

llms.txt for AI Search: What It Is, Who Reads It, and How to Get It Right

The Anatomy of an llms.txt File

How Does llms.txt Differ from robots.txt?

Why the Adoption Data Should Give You Pause

Who Actually Reads llms.txt?

How to Create a llms.txt File That Actually Works

The Honest Measurement Problem

llms.txt in a Broader AI Visibility Strategy

Frequently Asked Questions

About the Author

Related articles

Why AI Citations Are the New Ads in 2026: A Generative Engine Optimization Guide

What Is Topical Authority for AI Search? A 2026 Guide

How to Rank in ChatGPT: What the Evidence Actually Shows

How Perplexity Chooses Its Sources, and How to Get Cited

What Is Generative Engine Optimization (GEO)? A 2026 Guide

How to Get Your Content Cited in Perplexity in 5 Repeatable Steps