
What Is Robots.txt in SEO &
How to Write It Correctly (2026)

Robots.txt is a tiny text file with enormous power — a single wrong line can make your entire website invisible to Google. It's the first file Googlebot reads when it visits your site, and it tells crawlers exactly which pages they're allowed to access. Most site owners either ignore it entirely or make dangerous mistakes with it. This guide covers exactly how robots.txt works, how to write it correctly, and the critical mistakes to avoid.

What Is Robots.txt?

Robots.txt is a plain text file stored at the root of your website — always accessible at yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol (standardized as RFC 9309), a widely adopted convention that tells search engine crawlers (Googlebot, Bingbot, and others) which pages they have permission to crawl.

Every time Googlebot prepares to crawl your website, it first fetches your robots.txt file and reads the instructions inside. These instructions can allow or deny access to specific directories, files, or URL patterns. The crawler then respects those instructions (mostly — more on this below).

Key distinction: Robots.txt controls crawling — not indexing. Blocking a page in robots.txt prevents Googlebot from reading its content, but Google may still index the page's URL (showing it in search results as a bare link with no description) if it finds links to it elsewhere. To prevent indexing, you need a noindex meta tag on the page itself: <meta name="robots" content="noindex">.

Robots.txt Syntax — Every Directive Explained

  • User-agent — Specifies which crawler the following rules apply to; * means all crawlers. Examples: User-agent: * and User-agent: Googlebot
  • Disallow — Blocks the specified path from being crawled; an empty value allows everything. Examples: Disallow: /wp-admin/ and Disallow: /cart/
  • Allow — Explicitly permits a path, overriding a broader Disallow rule. Example: Allow: /wp-admin/admin-ajax.php
  • Sitemap — Tells all crawlers the location of your XML sitemap. Example: Sitemap: https://yourdomain.com/sitemap.xml
  • Crawl-delay — Requests that crawlers wait N seconds between requests. Google ignores this directive entirely; Bing and some other crawlers honor it. Example: Crawl-delay: 10
  • # (comment) — Lines starting with # are comments, ignored by crawlers and useful for documentation. Example: # Block admin area

Critical syntax rule: Disallow: (empty — nothing after the colon) means allow everything — it's the opposite of what it looks like. Disallow: / means block everything. Many site owners write an empty Disallow thinking it blocks all crawling — it does the exact opposite. Always verify your intent.
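You can sanity-check this behavior yourself with Python's built-in robots.txt parser, `urllib.robotparser`. The sketch below (domain and paths are placeholders) shows that an empty Disallow allows everything, while Disallow: / blocks everything:

```python
from urllib import robotparser

def can_fetch(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse a robots.txt string and test whether `agent` may crawl `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, url)

# Empty Disallow value: everything is allowed
print(can_fetch("User-agent: *\nDisallow:", "https://example.com/any-page/"))    # True

# Disallow: / blocks the entire site
print(can_fetch("User-agent: *\nDisallow: /", "https://example.com/any-page/"))  # False
```

The two strings differ by a single `/`, yet they have exactly opposite effects — which is why this is the most common catastrophic robots.txt mistake.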

Robots.txt Examples — Good and Bad

Standard WordPress robots.txt — Correct
# Standard WordPress robots.txt
# Allow all crawlers, block admin and low-value pages

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /?s=
Disallow: /search/
Disallow: /thank-you/
Allow: /wp-admin/admin-ajax.php

# Sitemap location
Sitemap: https://yourdomain.com/sitemap_index.xml
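Before deploying rules like these, you can test them locally with Python's standard-library parser — a quick sanity check, not a substitute for Google's own tooling. One caveat: Python's parser applies the first matching rule, while Google applies the most specific (longest) match, so Allow overrides such as admin-ajax.php can evaluate differently; the sketch below therefore tests only Disallow paths (the domain is a placeholder):

```python
from urllib import robotparser

wordpress_rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
"""

rp = robotparser.RobotFileParser()
rp.parse(wordpress_rules.splitlines())

# Blocked paths
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/cart/"))                 # False

# Normal content stays crawlable
print(rp.can_fetch("Googlebot", "https://example.com/blog/my-post/"))         # True
```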
Dangerous robots.txt — BLOCKS ENTIRE SITE
# DO NOT USE THIS — blocks everything

User-agent: *
Disallow: /

# This single line makes your entire website invisible to Google

E-commerce site robots.txt — Correct
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /wishlist/
Disallow: /search/
Disallow: /?sort=
Disallow: /?filter=
Disallow: /?ref=
Allow: /

Sitemap: https://store.com/sitemap.xml

Allow and Disallow together — Advanced
# Block most of /private/ but allow one public section inside it

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-report/

# Apply different rules to Bingbot
User-agent: Bingbot
Disallow: /private/

Sitemap: https://yourdomain.com/sitemap.xml
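A sketch of how per-agent rules resolve, again using Python's stdlib parser (domain and paths are placeholders). Note that the Allow line is listed before the Disallow it carves out: Python's parser applies the first matching rule, while Google uses the most specific (longest) match regardless of order, so this ordering yields the same answer under both interpretations:

```python
from urllib import robotparser

rules = """\
User-agent: Googlebot
Allow: /private/public-report/
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot: /private/ is blocked, but the carved-out section is allowed
print(rp.can_fetch("Googlebot", "https://example.com/private/secret/"))        # False
print(rp.can_fetch("Googlebot", "https://example.com/private/public-report/")) # True

# Bingbot has no Allow exception, so everything under /private/ is blocked
print(rp.can_fetch("Bingbot", "https://example.com/private/public-report/"))   # False
```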

What to Block — and What NOT to Block

❌ Block These Paths
  • /wp-admin/ (WordPress admin)
  • /wp-includes/ (WordPress core files)
  • /cart/ and /checkout/ (e-commerce)
  • /my-account/ (user login areas)
  • /?s= and /search/ (internal search)
  • /thank-you/ (post-conversion pages)
  • /?ref= and /?utm_ (tracking parameters)
  • /staging/ or /dev/ (test environments)
  • /login/ and /register/
✅ Never Block These
  • /wp-content/uploads/ (your images)
  • CSS and JavaScript files
  • Your homepage /
  • All content you want ranked
  • Your sitemap.xml
  • Blog posts and pages
  • Product and category pages
  • /wp-admin/admin-ajax.php (AJAX)

Never block CSS and JS: If Googlebot can't access your CSS and JavaScript files, it can't render your pages the way visitors see them. Google will see a broken layout and may rank your site lower. Blocking /wp-content/ wholesale was a common version of this mistake — that single rule cuts off your entire media library and theme assets.
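To see just how much one broad rule takes out, feed it to Python's stdlib parser (paths and domain are illustrative):

```python
from urllib import robotparser

# The overly broad rule this tip warns about
rules = "User-agent: *\nDisallow: /wp-content/"

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# One line blocks images, stylesheets, and theme scripts alike
for path in ("/wp-content/uploads/photo.jpg",
             "/wp-content/themes/mytheme/style.css",
             "/wp-content/plugins/slider/slider.js"):
    print(path, rp.can_fetch("Googlebot", "https://example.com" + path))  # all False
```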

7 Robots.txt Mistakes That Kill SEO

1
Blocking the entire site with Disallow: /
The most catastrophic robots.txt error. A single Disallow: / under User-agent: * blocks every crawler from every page on your site. Crawling stops as soon as Googlebot reads the file, and your pages can begin dropping out of search results within days.
✅ Fix: Remove the Disallow: / line entirely, or replace with specific paths you actually want blocked.
2
Blocking CSS, JS, or the uploads directory
Blocking /wp-content/ blocks your images, theme files, and plugins. Google can't render your pages properly, sees broken layouts, and may rank you lower or stop indexing your content correctly.
✅ Fix: Only block specific subdirectories you intend to block. Never block /wp-content/uploads/ or your theme assets.
3
Forgetting to update after moving pages
If you blocked a URL pattern and later create important content at that path, the block remains. Many sites have old robots.txt rules blocking pages that were added years after the original file was written.
✅ Fix: Review your robots.txt every time you restructure URLs or add new page types. Check specific URLs with the URL Inspection tool in Google Search Console.
4
Assuming robots.txt keeps pages out of Google's index
Robots.txt blocks crawling — not indexing. If Google finds links to a blocked page from other sites, it may still index that URL (showing it as a content-less result in search). This surprises many site owners who think robots.txt provides privacy.
✅ Fix: For pages that must not appear in search results at all, use a noindex meta tag on the page instead of (or in addition to) robots.txt blocking.
5
Incorrect spacing or formatting breaking all rules
Robots.txt is parsed line by line. A stray space before a directive, a missing colon, or Windows-style line endings in some editors can break parsing and make crawlers ignore all your rules entirely.
✅ Fix: Use a plain text editor (not Word). No spaces before directives. Always put a colon directly after each directive name. Verify the file parses cleanly in Google Search Console's robots.txt report.
6
No Sitemap directive in robots.txt
Including your sitemap URL in robots.txt is free, takes 5 seconds, and tells every crawler where to find your pages — including search engines you haven't submitted a sitemap to directly (Bing, DuckDuckGo, etc.).
✅ Fix: Add Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt file. Always use the full absolute URL.
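Crawler libraries read the Sitemap directive too. As a quick illustration, Python's stdlib parser (3.8+) exposes it via site_maps() — the rules and URL below are placeholders:

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The Sitemap directive is independent of any User-agent block
print(rp.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```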
7
Leaving WordPress's "Discourage search engines" setting live
During development, many owners tick WordPress's "Discourage search engines from indexing this site" checkbox. Older WordPress versions implemented it by adding Disallow: / to the virtual robots.txt; since WordPress 5.3 it outputs a noindex robots meta tag instead. Either way, launching without unchecking the box hides your entire live site from search from day one.
✅ Fix: Go to Settings → Reading → confirm "Discourage search engines from indexing this site" is unchecked on every live WordPress site. Check this first when troubleshooting any indexing issue.

How to Check Your Robots.txt Free

There are three ways to verify your robots.txt is working correctly:

1 — Seobility Free Robots.txt Checker

Go to seobility.org/sitemap-robots-checker/ and enter your domain. The tool fetches and analyzes your robots.txt file instantly — flagging any problematic directives, syntax errors, missing sitemap references, and pages being incorrectly blocked. No signup required.

2 — Google Search Console robots.txt Report

In Google Search Console, go to Settings → robots.txt (the standalone robots.txt Tester was retired in late 2023). The report shows the robots.txt files Google has fetched for your site, when each was last crawled, and any parsing errors or warnings. To test whether a specific URL is blocked, run it through the URL Inspection tool before and after making changes.

3 — View It Directly

Visit yourdomain.com/robots.txt in your browser. This shows exactly what Googlebot sees when it fetches your file. If the page returns a 404 error, you don't have a robots.txt file at all — most crawlers treat this as "allow everything," which is usually fine.

No robots.txt = allow all: If your site returns a 404 for robots.txt, search engines treat it as permission to crawl everything. This is usually fine for most sites. You only need a robots.txt file if you specifically want to block certain sections or add your sitemap URL for non-GSC-submitted crawlers.
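The allow-all default is easy to confirm with Python's stdlib parser — a file with no rules at all behaves just like a missing one (URL is a placeholder):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([])  # no rules at all, as when robots.txt is empty

# With no directives, every crawler is allowed everywhere
print(rp.can_fetch("Googlebot", "https://example.com/any/page"))  # True

# Against a live site you would point the parser at the real file:
#   rp.set_url("https://yourdomain.com/robots.txt")
#   rp.read()  # a 404 response likewise results in allow-all
```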

🔧 Check Your Robots.txt — Free Now

Seobility's free Sitemap & Robots.txt Checker analyzes your robots.txt for errors, problematic blocks, missing sitemap references, and more. No signup. Instant results.

Frequently Asked Questions

What is robots.txt?
Robots.txt is a plain text file at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages they can and cannot crawl. It uses directives like User-agent (which crawler), Disallow (block this path), Allow (permit this path), and Sitemap (your XML sitemap location). Googlebot reads this file before crawling any page on your site.
Does robots.txt affect Google rankings?
Yes — robots.txt directly affects rankings by controlling which pages Google can crawl and discover. Blocking important pages prevents them from being indexed and ranked. Using robots.txt to block low-value pages (admin, cart, search results) helps Google allocate crawl budget more efficiently to your content pages. The most dangerous mistake — Disallow: / — blocks your entire site from Google and eliminates all rankings.
Does robots.txt block pages from Google's index?
Robots.txt blocks crawling — not indexing. If a blocked page has external links pointing to it, Google may still index the URL (show it in search results as a bare link without content). To prevent indexing, use a noindex meta tag on the page itself. For maximum protection, use both: robots.txt to block crawling AND noindex to block indexing. Note: Google can't read the noindex tag on a page it's blocked from crawling — so for truly sensitive pages, use noindex without robots.txt blocking.
What should I block with robots.txt?
Block: /wp-admin/ (WordPress admin), /wp-includes/, /cart/ and /checkout/ (e-commerce), /my-account/ (user areas), /?s= and /search/ (internal search results), /thank-you/ pages, URL parameter variations (?sort=, ?filter=). Never block: CSS and JS files, /wp-content/uploads/ (your images), your sitemap, blog posts, product pages, or anything you want ranked.
How do I check my robots.txt for free?
Check your robots.txt free at seobility.org/sitemap-robots-checker/ — instant analysis, no signup. Also open Google Search Console → Settings → robots.txt to review the file Google has fetched and any parsing errors, and use the URL Inspection tool to test whether a specific URL is blocked. Visit yourdomain.com/robots.txt directly to see what crawlers currently read. A 404 response means you have no robots.txt file, which Google treats as permission to crawl everything.