Robots.txt is a plain text file placed at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages or sections of your site they are allowed or not allowed to crawl. It uses the Robots Exclusion Protocol — a set of directives like User-agent (which crawler), Disallow (block this path), Allow (permit this path), and Sitemap (location of your XML sitemap). Robots.txt is the first file Googlebot reads when it visits your site.

Does robots.txt block pages from Google's index?

Robots.txt blocks crawling, not indexing. This is a critical distinction. If a page is blocked by robots.txt but has external links pointing to it, Google may still index that URL and show it in search results without any page content. To keep a page out of the index, add a noindex meta tag and leave the page crawlable: Googlebot can only see the tag if it is allowed to fetch the page. Blocking the same page in robots.txt hides the noindex tag from Google, so the two should not be combined on one URL. Use robots.txt alone when the goal is only to stop crawling.

What is the correct robots.txt syntax?

Robots.txt syntax: User-agent specifies which crawler the rules apply to (* means all crawlers, Googlebot means only Google). Disallow specifies paths to block (Disallow: /wp-admin/ blocks that directory). Allow overrides a Disallow for a specific path within a blocked directory. Sitemap provides your XML sitemap URL. Each User-agent block applies until the next User-agent directive. Paths must start with / and are case-sensitive. Empty Disallow: (nothing after colon) means allow everything — this is a common syntax error when people mean to disallow all.

What Is Robots.txt in SEO? How to Write Robots.txt (2026 Guide)

Q: Does robots.txt affect SEO?

Yes — robots.txt directly affects SEO by controlling which pages search engines can crawl. Blocking important pages with Disallow prevents them from being indexed and ranked. Conversely, using robots.txt to block low-value pages (admin, cart, search results) helps Google allocate its crawl budget more efficiently to your important content. The most dangerous robots.txt mistake is accidentally blocking your entire site with Disallow: / — this makes your whole site invisible to Google.

Q: What should I block with robots.txt?

Block with robots.txt: WordPress admin (/wp-admin/), checkout and cart pages (/cart/, /checkout/), internal search results (/search/, /?s=), thank-you and confirmation pages (/thank-you/), staging or test directories, duplicate content parameter URLs. Do NOT block: CSS and JavaScript files (Google needs these to render your pages), your images directory (/wp-content/uploads/), your sitemap, or any pages you want indexed and ranked.

Robots.txt is a tiny text file with enormous power — a single wrong line can make your entire website invisible to Google. It's the first file Googlebot reads when it visits your site, and it tells crawlers exactly which pages they're allowed to access. Most site owners either ignore it entirely or make dangerous mistakes with it. This guide covers exactly how robots.txt works, how to write it correctly, and the critical mistakes to avoid.

What Is Robots.txt?

Robots.txt is a plain text file stored at the root of your website — always accessible at yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol, a widely-adopted standard that tells search engine crawlers (Googlebot, Bingbot, and others) which pages they have permission to crawl.

Before Googlebot crawls your website, it checks your robots.txt file and reads the instructions inside. Google generally caches the file for up to 24 hours, so changes can take about a day to be picked up. These instructions can allow or deny access to specific directories, files, or URL patterns. Reputable crawlers follow those instructions, but compliance is voluntary: robots.txt is a request, not an access control.

Key distinction: Robots.txt controls crawling — not indexing. Blocking a page in robots.txt prevents Googlebot from reading its content, but Google may still index the page's URL (show it in search results as an empty result) if it finds links to it elsewhere. To prevent indexing, you need a noindex meta tag on the page itself.

Robots.txt Syntax — Every Directive Explained

Directive	What It Does	Example
User-agent	Specifies which crawler the following rules apply to. * means all crawlers.	`User-agent: *` `User-agent: Googlebot`
Disallow	Blocks the specified path from being crawled. Empty value = allow all.	`Disallow: /wp-admin/` `Disallow: /cart/`
Allow	Explicitly permits a path, overriding a broader Disallow rule.	`Allow: /wp-admin/admin-ajax.php`
Sitemap	Tells all crawlers the location of your XML sitemap.	`Sitemap: https://yourdomain.com/sitemap.xml`
Crawl-delay	Requests crawlers wait N seconds between requests. Googlebot does not support this directive; some other crawlers, such as Bingbot, honor it.	`Crawl-delay: 10`
# (comment)	Lines starting with # are comments — ignored by crawlers, useful for documentation.	`# Block admin area`
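
To see how User-agent groups are selected, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample rules and the paths /private/ and /drafts/ are made up for illustration. A crawler follows the group that names it and falls back to the * group only when no named group matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for all crawlers, one for Googlebot only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Googlebot follows only its own group, so /private/ is open to it.
googlebot_private = can_crawl(SAMPLE_RULES, "Googlebot", "/private/page.html")
googlebot_drafts = can_crawl(SAMPLE_RULES, "Googlebot", "/drafts/page.html")
# Bingbot has no named group and falls back to the * group.
bingbot_private = can_crawl(SAMPLE_RULES, "Bingbot", "/private/page.html")
```

The practical consequence is that rules in the * group are not inherited by a named group. If Googlebot should also stay out of /private/, that line has to be repeated inside the Googlebot group.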

Critical syntax rule: Disallow: (empty — nothing after the colon) means allow everything — it's the opposite of what it looks like. Disallow: / means block everything. Many site owners write an empty Disallow thinking it blocks all crawling — it does the exact opposite. Always verify your intent.
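
The difference between an empty Disallow and Disallow: / is easy to confirm locally. This is a small sketch using Python's standard-library urllib.robotparser; the path /blog/post is a placeholder, and the result reflects Python's parser rather than any particular search engine, although both agree on this rule.

```python
from urllib.robotparser import RobotFileParser

EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"   # nothing after the colon
BLOCK_ALL = "User-agent: *\nDisallow: /\n"      # a single slash

def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

print(allowed(EMPTY_DISALLOW, "/blog/post"))  # True: everything is crawlable
print(allowed(BLOCK_ALL, "/blog/post"))       # False: everything is blocked
```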

Robots.txt Examples — Good and Bad

Standard WordPress robots.txt — Correct

# Standard WordPress robots.txt
# Allow all crawlers, block admin and low-value pages
# /wp-includes/ is left crawlable: it holds front-end JS and CSS that Google needs for rendering

User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /?s=
Disallow: /search/
Disallow: /thank-you/
Allow: /wp-admin/admin-ajax.php

# Sitemap location
Sitemap: https://yourdomain.com/sitemap_index.xml

Dangerous robots.txt — BLOCKS ENTIRE SITE

# DO NOT USE THIS — blocks everything

User-agent: *
Disallow: /

# This single line makes your entire website invisible to Google

E-commerce site robots.txt — Correct

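User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /wishlist/
Disallow: /search/
# The * wildcard matches these parameters on any path, not only the homepage
# (as written, they must come first in the query string)
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?ref=
Allow: /

Sitemap: https://store.com/sitemap.xml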
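
Parameter rules depend on how path patterns are matched. Google and Bing treat * as "any sequence of characters" and a trailing $ as "end of URL"; everything else is a plain prefix match from the start of the path. The sketch below is a simplified matcher written for illustration (the function names are hypothetical, and it is not any crawler's actual implementation). It shows why a rule such as /?sort= only matches the homepage, while /*?sort= matches the parameter on any path.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def matches(pattern: str, url_path: str) -> bool:
    """Return True if the rule pattern applies to the URL path."""
    return pattern_to_regex(pattern).match(url_path) is not None

print(matches("/?sort=", "/?sort=price"))        # True: homepage only
print(matches("/?sort=", "/shoes?sort=price"))   # False: prefix does not match
print(matches("/*?sort=", "/shoes?sort=price"))  # True: wildcard covers /shoes
print(matches("/*.pdf$", "/files/guide.pdf"))    # True: URL ends in .pdf
```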

Allow and Disallow together — Advanced

# Block most of /private/ but allow one public section inside it

User-agent: Googlebot
Disallow: /private/
Allow: /private/public-report/

# Apply different rules to Bingbot
User-agent: Bingbot
Disallow: /private/

Sitemap: https://yourdomain.com/sitemap.xml
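
When an Allow and a Disallow rule both match a URL, Google documents that the most specific rule (the longest path) wins, and that Allow wins a tie. The sketch below implements that precedence for plain prefix rules only; the function name and the sample paths are illustrative, and wildcards are deliberately left out. Python's built-in urllib.robotparser is not used here because it applies the first matching rule rather than the longest one.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/private/")

def is_allowed(rules: List[Rule], path: str) -> bool:
    """Apply longest-match precedence to simple prefix rules."""
    best_length = -1
    best_allow = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_won_by_allow = len(prefix) == best_length and allow
        if longer or tie_won_by_allow:
            best_length = len(prefix)
            best_allow = allow
    return best_allow

GOOGLEBOT_RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public-report/"),
]

print(is_allowed(GOOGLEBOT_RULES, "/private/public-report/q1.html"))  # True
print(is_allowed(GOOGLEBOT_RULES, "/private/notes.html"))             # False
```

Because precedence depends on path length and not on line order, the Allow line can sit above or below the Disallow line with the same result for Google.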

What to Block — and What NOT to Block

❌ Block These Paths

/wp-admin/ (WordPress admin)
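/wp-includes/ (WordPress core files; block only if none of your pages load scripts or styles from it, since it also holds front-end JS and CSS)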
/cart/ and /checkout/ (e-commerce)
/my-account/ (user login areas)
/?s= and /search/ (internal search)
/thank-you/ (post-conversion pages)
/*?ref= and /*?utm_ (tracking parameters; the * wildcard matches them on any path)
/staging/ or /dev/ (test environments)
/login/ and /register/

✅ Never Block These

/wp-content/uploads/ (your images)
CSS and JavaScript files
Your homepage /
All content you want ranked
Your sitemap.xml
Blog posts and pages
Product and category pages
/wp-admin/admin-ajax.php (AJAX)

Never block CSS and JS: If Googlebot can't access your CSS and JavaScript files, it can't render your pages visually. Google will see a broken layout and may rank your site lower. This was a common mistake when sites blocked /wp-content/ — it blocks your entire media and theme assets.

7 Robots.txt Mistakes That Kill SEO

Blocking the entire site with Disallow: /

The most catastrophic robots.txt error. A single Disallow: / under User-agent: * blocks every crawler from every page on your site. Once Googlebot reads it, crawling stops and your pages start dropping out of search results or appear without any description.

✅ Fix: Remove the Disallow: / line entirely, or replace with specific paths you actually want blocked.

Blocking CSS, JS, or the uploads directory

Blocking /wp-content/ blocks your images, theme files, and plugins. Google can't render your pages properly, sees broken layouts, and may rank you lower or stop indexing your content correctly.

✅ Fix: Only block specific subdirectories you intend to block. Never block /wp-content/uploads/ or your theme assets.

Forgetting to update after moving pages

If you blocked a URL pattern and later create important content at that path, the block remains. Many sites have old robots.txt rules blocking pages that were added years after the original file was written.

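✅ Fix: Review your robots.txt every time you restructure URLs or add new page types. Test specific URLs with the URL Inspection tool in Search Console, which reports whether a page is blocked by robots.txt (Google has retired the older standalone robots.txt Tester).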
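
A review like this can be partly automated. The sketch below keeps a list of paths that must stay crawlable and reports any that the current rules block. The rules and paths are hypothetical, and Python's urllib.robotparser does not implement wildcards or Google's longest-match precedence, so treat it as a quick check for simple prefix rules rather than a replacement for Search Console.

```python
from typing import List
from urllib.robotparser import RobotFileParser

def blocked_paths(robots_txt: str, important_paths: List[str],
                  agent: str = "Googlebot") -> List[str]:
    """Return the important paths that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in important_paths
            if not parser.can_fetch(agent, path)]

# An old rule written before /resources/ held any real content.
OLD_RULES = "User-agent: *\nDisallow: /resources/\n"
MUST_BE_CRAWLABLE = ["/blog/seo-guide/", "/resources/whitepaper/"]

print(blocked_paths(OLD_RULES, MUST_BE_CRAWLABLE))  # ['/resources/whitepaper/']
```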

Assuming robots.txt keeps pages out of Google's index

Robots.txt blocks crawling — not indexing. If Google finds links to a blocked page from other sites, it may still index that URL (showing it as a content-less result in search). This surprises many site owners who think robots.txt provides privacy.

✅ Fix: For pages that must not appear in search results at all, use a noindex meta tag on the page instead of robots.txt blocking. Leave the page crawlable, because Google can only obey a noindex tag it is able to fetch.
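
To confirm that a page really carries a noindex signal, you can inspect its HTML and response headers. The sketch below is a simplified check using only Python's standard library. It looks for a robots meta tag and for the X-Robots-Tag HTTP header, which is an alternative way to send the same directive and is not covered elsewhere in this guide. The class and function names are hypothetical, and crawler-specific header prefixes are ignored for brevity.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    normalized = {key.lower(): value for key, value in (headers or {}).items()}
    header_value = normalized.get("x-robots-tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```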

Incorrect spacing or formatting breaking all rules

Robots.txt is parsed line by line. A misspelled directive name or a missing colon makes that line invalid, and crawlers silently skip invalid lines, so the rule you thought you had never applies. Word processors can also add hidden formatting or save the file in a format crawlers cannot read.

✅ Fix: Use a plain text editor (not Word) and save the file as UTF-8 plain text. Put one directive per line, with a colon after the directive name. Then confirm how Google parsed the file in the robots.txt report in Search Console.

No Sitemap directive in robots.txt

Including your sitemap URL in robots.txt is free, takes 5 seconds, and tells every crawler where to find your pages, including search engines you haven't submitted your sitemap to directly (Bing, DuckDuckGo, etc.).

✅ Fix: Add Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt file. Always use the full absolute URL.
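
To check that the Sitemap line is being picked up, you can parse the file and list the sitemap URLs it declares. This sketch uses site_maps() from Python's standard-library urllib.robotparser (available in Python 3.8 and later); the sample file is illustrative.

```python
from typing import List
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

def sitemap_urls(robots_txt: str) -> List[str]:
    """Return every sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []

print(sitemap_urls(ROBOTS_TXT))  # ['https://yourdomain.com/sitemap.xml']
```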

Leaving WordPress development mode robots.txt live

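During WordPress development, many sites enable the "Discourage search engines" checkbox. Depending on the WordPress version, it adds a noindex robots meta tag to every page (current versions) or Disallow: / to the virtual robots.txt (older versions). Many site owners launch their site without unchecking this box, so their entire live site is kept out of Google from day one.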

✅ Fix: Go to Settings → Reading → confirm "Discourage search engines from indexing this site" is unchecked on every live WordPress site. Check this first when troubleshooting any indexing issue.
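
Several of the mistakes above can be caught with a small script before the file goes live. The following is a rough lint sketch, not a full validator: the function name, the warning messages, and the list of risky paths are assumptions chosen for this example, and it checks only a few of the issues described in this section.

```python
from typing import List

KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
RISKY_PATHS = {"/wp-content", "/wp-content/uploads"}

def lint_robots(text: str) -> List[str]:
    """Return human-readable warnings for common robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing colon")
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unknown directive '{name}'")
        elif name == "sitemap":
            has_sitemap = True
            if not value.startswith(("http://", "https://")):
                warnings.append(f"line {number}: sitemap URL is not absolute")
        elif name == "disallow":
            if value == "/":
                warnings.append(f"line {number}: blocks the entire site")
            elif value.rstrip("/") in RISKY_PATHS:
                warnings.append(f"line {number}: blocks theme assets or uploads")
    if not has_sitemap:
        warnings.append("no Sitemap directive found")
    return warnings

for warning in lint_robots("User-agent: *\nDisallow: /\n"):
    print(warning)
```

Running it on the two-line file above reports the site-wide block on line 2 and the missing Sitemap directive.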

How to Check Your Robots.txt Free

There are three ways to verify your robots.txt is working correctly:

1 — Seobility Free Robots.txt Checker

Go to seobility.org/sitemap-robots-checker/ and enter your domain. The tool fetches and analyzes your robots.txt file instantly — flagging any problematic directives, syntax errors, missing sitemap references, and pages being incorrectly blocked. No signup required.

2 — Google Search Console Robots.txt Tester

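Google has retired the standalone robots.txt Tester. Its replacement is the robots.txt report: in Google Search Console, go to Settings → robots.txt to see which robots.txt files Google found for your site, when each was last fetched, and any warnings or errors. To check a specific URL, use the URL Inspection tool, which reports whether Googlebot is blocked from crawling it by your live robots.txt. Together these are the most reliable way to test before and after making changes.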

3 — View It Directly

Visit yourdomain.com/robots.txt in your browser. This shows exactly what Googlebot sees when it fetches your file. If the page returns a 404 error, you don't have a robots.txt file at all — most crawlers treat this as "allow everything," which is usually fine.

No robots.txt = allow all: If your site returns a 404 for robots.txt, search engines treat it as permission to crawl everything. This is fine for most sites. You only need a robots.txt file if you specifically want to block certain sections or point crawlers other than Google to your sitemap.

Frequently Asked Questions

What is robots.txt?

Robots.txt is a plain text file at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages they can and cannot crawl. It uses directives like User-agent (which crawler), Disallow (block this path), Allow (permit this path), and Sitemap (your XML sitemap location). Googlebot reads this file before crawling any page on your site.

Does robots.txt affect Google rankings?

Yes — robots.txt directly affects rankings by controlling which pages Google can crawl and discover. Blocking important pages prevents them from being indexed and ranked. Using robots.txt to block low-value pages (admin, cart, search results) helps Google allocate crawl budget more efficiently to your content pages. The most dangerous mistake — Disallow: / — blocks your entire site from Google and eliminates all rankings.

Does robots.txt block pages from Google's index?

Robots.txt blocks crawling, not indexing. If a blocked page has external links pointing to it, Google may still index the URL and show it in search results as a bare link without content. To prevent indexing, use a noindex meta tag on the page itself and leave the page crawlable, because Google can't read the noindex tag on a page it's blocked from crawling. For that reason, don't combine the two on the same URL: for truly sensitive pages, use noindex without robots.txt blocking.

What should I block with robots.txt?

Block: /wp-admin/ (WordPress admin), /cart/ and /checkout/ (e-commerce), /my-account/ (user areas), /?s= and /search/ (internal search results), /thank-you/ pages, URL parameter variations (/*?sort=, /*?filter=). Never block: CSS and JS files (including those under /wp-includes/), /wp-content/uploads/ (your images), your sitemap, blog posts, product pages, or anything you want ranked.

How do I check my robots.txt for free?

Check your robots.txt free at seobility.org/sitemap-robots-checker/ for instant analysis with no signup. In Google Search Console, the robots.txt report under Settings shows how Google fetched and parsed your file, and the URL Inspection tool tells you whether a specific URL is blocked. Visit yourdomain.com/robots.txt directly to see what crawlers currently read. A 404 response means you have no robots.txt file, which Google treats as permission to crawl everything.
