Free tool · No signup

Free Robots.txt Validator

Enter your domain and we fetch and lint your robots.txt: syntax mistakes, site-blocking rules, missing sitemap — plus an access matrix showing exactly which AI and search crawlers can reach your content.


Free · Unlimited checks · No signup required

How it works

Step 1

Enter your domain

Just the domain — we fetch /robots.txt for you.

Step 2

We lint the file

Orphaned rules (placed before any User-agent line), unknown directives, site-wide blocks, oversized files, and a missing Sitemap line.

Step 3

AI crawler matrix

Allowed or blocked, per crawler: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Googlebot, Bingbot.

Step 4

Fix with confidence

Each verdict explains the real-world consequence, so you know what to change and why.

Why it matters

One robots.txt line can erase you from AI answers.

Robots.txt is the first file every crawler reads, and well-behaved AI crawlers (GPTBot, ClaudeBot, PerplexityBot) honor it strictly. A single Disallow: / line under User-agent: *, often copied from a template or left over from a staging setup, silently removes your entire site from ChatGPT, Claude, and Perplexity answers. Nothing errors; you just stop being cited.

Blocking Google-Extended is not the same as blocking Googlebot.

Google-Extended is a training opt-out: blocking it keeps your content out of Gemini model training without touching your Google Search rankings. Blocking Googlebot removes you from Google Search entirely. Sites regularly confuse the two — and the difference is your entire organic channel. The access matrix shows each verdict separately so the distinction is explicit.
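To see the distinction in practice, here is a minimal sketch using Python's standard-library parser, urllib.robotparser. The file and the example.com URL are hypothetical: the file opts out of Gemini training through Google-Extended and leaves every other token, Googlebot included, unrestricted. Python's parser is a simplified implementation (it applies the first matching rule rather than the longest), which is fine for a file this simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: opt out of Gemini training via Google-Extended,
# leave Googlebot (Search) and every other token unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, token: str, url: str) -> bool:
    """Ask the stdlib parser whether `token` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    for token in ("Google-Extended", "Googlebot"):
        verdict = "allowed" if can_fetch(ROBOTS_TXT, token, url) else "blocked"
        print(f"{token}: {verdict}")
```

Running it reports Google-Extended as blocked and Googlebot as allowed. Putting Googlebot in the first group instead would block Search crawling while leaving the training opt-out untouched.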

Robots.txt errors fail silently — validation is the only way to catch them.

Crawlers rarely report robots.txt problems back to you. Rules placed before any User-agent line are ignored, unknown directives are skipped, and anything past Google's 500 KiB read limit never gets parsed. The file can look reasonable to a human while behaving completely differently to a crawler. Linting against actual crawler behavior is the only reliable check.
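A quick way to watch one of these silent failures is to feed two versions of the same intent to Python's urllib.robotparser. In the hypothetical first file, the Disallow line sits above the first User-agent line, so the parser drops it without any warning and /private/ stays crawlable; the host is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file that looks like it hides /private/ from everyone,
# but the Disallow line comes before any User-agent line, so it is ignored.
ORPHANED = """\
Disallow: /private/

User-agent: *
Allow: /
"""

# The same intent written correctly: the rule sits inside a group.
FIXED = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, token: str, path: str) -> bool:
    """Return True if `token` may not fetch `path` on a placeholder host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(token, "https://example.com" + path)


print(is_blocked(ORPHANED, "GPTBot", "/private/report"))  # False: orphaned rule dropped
print(is_blocked(FIXED, "GPTBot", "/private/report"))     # True
```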

With Meev

Meev makes sure being crawlable turns into being cited.

An open robots.txt gets crawlers in the door — it doesn't earn citations. Meev auto-publishes articles structured for AI-engine extraction and tracks your brand across every major AI search surface, so you can see whether the access you've granted is actually converting into visibility.

  • Auto-published, citation-ready articles on your own domain
  • Visibility tracking across every major AI search surface
  • Spot when a technical change quietly drops you out of AI answers

Frequently asked

My site has no robots.txt — is that a problem?

Not by itself. No robots.txt means every crawler is allowed everywhere, which is a perfectly valid default. You do lose two things: the ability to declare your sitemap location, and any control over AI training crawlers. Most sites benefit from at least a minimal file with a Sitemap line.
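A minimal file of that kind might look like the sketch below; the sitemap URL is a placeholder. Parsing it with Python's urllib.robotparser, alongside an empty input standing in for a missing file, shows that both allow every crawler and that only the minimal file declares a sitemap (site_maps() needs Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Minimal file: an empty Disallow allows everything; the Sitemap URL is a placeholder.
MINIMAL = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


minimal = load(MINIMAL)
missing = load("")  # an empty file stands in for "no robots.txt at all"

page = "https://example.com/any/page"
print(minimal.can_fetch("ClaudeBot", page), minimal.site_maps())
# True ['https://example.com/sitemap.xml']
print(missing.can_fetch("ClaudeBot", page), missing.site_maps())
# True None
```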

Which AI crawlers does the access matrix check?

Seven crawler tokens: GPTBot (OpenAI model training), OAI-SearchBot (ChatGPT search citations), ClaudeBot (Claude), PerplexityBot (Perplexity), Google-Extended (Gemini training opt-out), Googlebot (Google Search), and Bingbot (Bing, whose index also feeds several AI surfaces). Each gets its own allowed/blocked verdict with the consequence spelled out.
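The core of an access matrix can be sketched in a few lines with Python's urllib.robotparser. The robots.txt below is hypothetical: a leftover allow-list that blocks everything under the wildcard group and re-opens the site only for Googlebot and Bingbot. This is an illustration, not the tool's implementation; the stdlib parser matches User-agent values by substring and applies the first matching rule, so it can disagree with stricter parsers on more complex files.

```python
from urllib.robotparser import RobotFileParser

# The seven tokens the matrix reports on.
TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
          "Google-Extended", "Googlebot", "Bingbot"]

# Hypothetical legacy allow-list: block everyone, then re-open the site
# for the two classic search crawlers only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
User-agent: Bingbot
Allow: /
"""


def access_matrix(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each token to True (allowed) or False (blocked) for one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    url = "https://example.com" + path  # placeholder host
    return {token: parser.can_fetch(token, url) for token in TOKENS}


for token, allowed in access_matrix(ROBOTS_TXT).items():
    print(f"{token:<16} {'allowed' if allowed else 'blocked'}")
```

Every AI token comes back blocked while Googlebot and Bingbot stay allowed, so Search traffic looks healthy even though the site has disappeared from AI answers.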

What's the difference between GPTBot and OAI-SearchBot?

GPTBot collects content that may be used to train OpenAI's models; OAI-SearchBot is the dedicated crawler behind ChatGPT's search results and citations. They're separate tokens with separate rules, so you can allow search citations while opting out of training, or vice versa. Blocking both keeps your content out of OpenAI's training data and out of ChatGPT's search citations.
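Because the two tokens are independent, each policy is just a separate group. The sketch below generates a file from a token-to-verdict mapping (build_robots is a hypothetical helper, not part of any library) and checks the result with Python's urllib.robotparser against a placeholder host.

```python
from urllib.robotparser import RobotFileParser


def build_robots(policy: dict[str, bool]) -> str:
    """Hypothetical helper: one group per token, allowing or blocking the whole site."""
    groups = [
        f"User-agent: {token}\n{'Allow' if allowed else 'Disallow'}: /\n"
        for token, allowed in policy.items()
    ]
    return "\n".join(groups)


def verdicts(robots_txt: str, tokens: list[str]) -> dict[str, bool]:
    """Check each token against the generated file with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {t: parser.can_fetch(t, "https://example.com/") for t in tokens}


# Cited in ChatGPT search, opted out of training.
search_only = build_robots({"GPTBot": False, "OAI-SearchBot": True})
print(search_only)
print(verdicts(search_only, ["GPTBot", "OAI-SearchBot"]))
```

Flipping the two booleans produces the opposite policy (training allowed, search citations blocked) without touching any other crawler's rules.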

Does blocking Google-Extended hurt my Google rankings?

No. Google-Extended only controls whether your content is used for Gemini training and grounding — it has no effect on Google Search crawling, indexing, or rankings. Googlebot is the token that matters for Search. This is one of the most common robots.txt misconceptions.

What counts as an error versus a warning?

Errors are rules that actively break crawling: directives placed before any User-agent line (crawlers ignore them), or a site-wide Disallow: / under the wildcard group. Warnings are things that cost you opportunity without breaking anything: unknown directives, a missing Sitemap line, or a file so large crawlers stop reading it.
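The error/warning split can be sketched as a small linter. This is a hypothetical illustration, not the validator's code: the set of known directives is an assumption, the size check uses Google's documented 500 KiB limit, and blank lines are simply skipped when tracking groups.

```python
from dataclasses import dataclass

# Assumptions for this sketch: which directives count as "known", and the
# size limit (Google documents a 500 KiB cap on what it parses).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
MAX_BYTES = 500 * 1024


@dataclass
class Finding:
    level: str    # "error" or "warning"
    line: int     # 1-based line number; 0 for file-level findings
    message: str


def lint(robots_txt: str) -> list[Finding]:
    findings: list[Finding] = []
    agents: list[str] = []   # User-agent values of the group being read
    in_rules = False         # becomes True once that group has a rule line
    has_sitemap = False

    for n, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()

        if key == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            if not agents:
                findings.append(Finding("error", n, "Rule before any User-agent line is ignored"))
                continue
            in_rules = True
            if key == "disallow" and value == "/" and "*" in agents:
                findings.append(Finding("error", n, "Disallow: / under User-agent: * blocks the whole site"))
        elif key == "sitemap":
            has_sitemap = True
        elif key not in KNOWN_DIRECTIVES:
            findings.append(Finding("warning", n, f"Unknown directive '{key}' is skipped"))

    if not has_sitemap:
        findings.append(Finding("warning", 0, "No Sitemap line"))
    if len(robots_txt.encode("utf-8")) > MAX_BYTES:
        findings.append(Finding("warning", 0, "File exceeds 500 KiB; the tail may never be read"))
    return findings


sample = """\
Disallow: /tmp/
User-agent: *
Disallow: /
Noindex: /drafts/
"""
for f in lint(sample):
    print(f.level, f.line, f.message)
```

On the sample, the orphaned rule on line 1 and the site-wide block on line 3 come back as errors, while the unknown directive and the missing Sitemap line are warnings.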

How do crawlers pick which rules apply to them?

Each crawler looks for the group whose User-agent value matches its own token most specifically — an exact GPTBot group beats the wildcard (*) group, and the wildcard only applies when no specific group exists. Within the matching group, the longest matching path rule wins, with Allow beating Disallow on ties.
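The selection logic can be sketched in Python. Python's own urllib.robotparser applies the first matching rule instead of the longest, so this hand-rolled version follows the longest-match behavior described above. It assumes exact, case-insensitive token matching, merges groups that name the same token, and handles plain path prefixes only (the * and $ path wildcards are not supported). The example file is hypothetical.

```python
Rules = list[tuple[str, str]]  # (directive, path) pairs


def parse_groups(robots_txt: str) -> dict[str, Rules]:
    """Map each lowercased User-agent token to its rules, merging repeated tokens."""
    groups: dict[str, Rules] = {}
    agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def select_rules(groups: dict[str, Rules], token: str) -> Rules:
    """Use the token's own group if one exists, otherwise the * group."""
    return groups.get(token.lower(), groups.get("*", []))


def is_allowed(rules: Rules, path: str) -> bool:
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue  # an empty path (e.g. "Disallow:") matches nothing
        is_allow = directive == "allow"
        if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
            best_len, allowed = len(rule_path), is_allow
    return allowed


ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /private/
Allow: /private/press/
"""

groups = parse_groups(ROBOTS_TXT)
gptbot = select_rules(groups, "GPTBot")
print(is_allowed(gptbot, "/blog/post"))          # True: the * group does not apply
print(is_allowed(gptbot, "/private/notes"))      # False
print(is_allowed(gptbot, "/private/press/kit"))  # True: the longer Allow wins
print(is_allowed(select_rules(groups, "ClaudeBot"), "/blog/post"))  # False: falls back to *
```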

Stop fixing pages one at a time.

Meev tracks your visibility across every major AI search surface and publishes quality-gated content that earns citations — automatically.

Card required, no charge until day 8. Cancel anytime.
