Robots.txt is a tiny text file with enormous power — a single wrong line can make your entire website invisible to Google. It's the first file Googlebot reads when it visits your site, and it tells crawlers exactly which pages they're allowed to access. Most site owners either ignore it entirely or make dangerous mistakes with it. This guide covers exactly how robots.txt works, how to write it correctly, and the critical mistakes to avoid.
What Is Robots.txt?
Robots.txt is a plain text file stored at the root of your website — always accessible at yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol, a widely-adopted standard that tells search engine crawlers (Googlebot, Bingbot, and others) which pages they have permission to crawl.
Every time Googlebot prepares to crawl your website, it first fetches your robots.txt file and reads the instructions inside. These instructions can allow or deny access to specific directories, files, or URL patterns. The crawler then respects those instructions (mostly — more on this below).
Key distinction: Robots.txt controls crawling — not indexing. Blocking a page in robots.txt prevents Googlebot from reading its content, but Google may still index the page's URL (show it in search results as an empty result) if it finds links to it elsewhere. To prevent indexing, you need a noindex meta tag on the page itself.
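As a rough illustration of the check a well-behaved crawler performs before requesting a URL, the sketch below uses Python's standard-library `urllib.robotparser`. The rules, the `example.com` URLs, and the `may_crawl` helper are made up for illustration, and Python's parser only approximates what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, parsed from a string so no network access is needed.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def may_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://example.com/blog/hello-world/"))
print(may_crawl("https://example.com/cart/"))
```

The result only answers the crawling question. A URL that fails this check can still end up indexed if other pages link to it.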
Robots.txt Syntax — Every Directive Explained
| Directive | What It Does | Example |
|---|---|---|
| User-agent | Specifies which crawler the following rules apply to. * means all crawlers. | User-agent: *<br>User-agent: Googlebot |
| Disallow | Blocks the specified path from being crawled. Empty value = allow all. | Disallow: /wp-admin/<br>Disallow: /cart/ |
| Allow | Explicitly permits a path, overriding a broader Disallow rule. | Allow: /wp-admin/admin-ajax.php |
| Sitemap | Tells all crawlers the location of your XML sitemap. | Sitemap: https://yourdomain.com/sitemap.xml |
| Crawl-delay | Requests crawlers wait N seconds between requests. Googlebot ignores this directive; some other crawlers, such as Bingbot, honor it. | Crawl-delay: 10 |
| # (comment) | Lines starting with # are comments — ignored by crawlers, useful for documentation. | # Block admin area |
Critical syntax rule: Disallow: (empty — nothing after the colon) means allow everything — it's the opposite of what it looks like. Disallow: / means block everything. Many site owners write an empty Disallow thinking it blocks all crawling — it does the exact opposite. Always verify your intent.
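To make the difference concrete, here is a minimal sketch that feeds both variants to Python's standard-library `urllib.robotparser` and asks whether an arbitrary path may be fetched. The `allowed` helper and the sample path are hypothetical, and Python's parser stands in for a real crawler here.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Parse robots.txt text and report whether any crawler may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


# Empty Disallow: nothing is blocked.
allow_everything = "User-agent: *\nDisallow:\n"
# Disallow with a single slash: every path is blocked.
block_everything = "User-agent: *\nDisallow: /\n"

print(allowed(allow_everything, "/any/page/"))
print(allowed(block_everything, "/any/page/"))
```

One character separates the two files, which is why it is worth testing a robots.txt change before deploying it.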
Robots.txt Examples — Good and Bad
    # Standard WordPress robots.txt
    # Allow all crawlers, block admin and low-value pages
    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /my-account/
    Disallow: /?s=
    Disallow: /search/
    Disallow: /thank-you/
    Allow: /wp-admin/admin-ajax.php

    # Sitemap location
    Sitemap: https://yourdomain.com/sitemap_index.xml
Dangerous example (do not use):

    # DO NOT USE THIS: blocks everything
    User-agent: *
    Disallow: /
    # This single line makes your entire website invisible to Google
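E-commerce store robots.txt:

    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/
    Disallow: /cart/
    Disallow: /account/
    Disallow: /wishlist/
    Disallow: /search/
    # Wildcard form: matches any path whose query string starts with the
    # parameter. Plain /?sort= would only match URLs on the homepage.
    Disallow: /*?sort=
    Disallow: /*?filter=
    Disallow: /*?ref=
    Allow: /
    Sitemap: https://store.com/sitemap.xml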
Different rules for different crawlers:

    # Block most of /private/ but allow one public section inside it
    User-agent: Googlebot
    Disallow: /private/
    Allow: /private/public-report/

    # Apply different rules to Bingbot
    User-agent: Bingbot
    Disallow: /private/

    Sitemap: https://yourdomain.com/sitemap.xml
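The reason the Allow line in the Googlebot group wins is rule precedence: under the Robots Exclusion Protocol (RFC 9309) and Google's documented behavior, the most specific (longest) matching path takes priority, and Allow wins a tie. The sketch below is a simplified illustration of that rule, not a full parser: it handles plain path prefixes only, with no `*` or `$` wildcards, and the names `is_allowed`, `googlebot_rules`, and `bingbot_rules` are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/private/")


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching prefix wins; on a tie, Allow beats Disallow."""
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty path matches nothing
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The two groups from the example above.
googlebot_rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public-report/"),
]
bingbot_rules = [("disallow", "/private/")]

print(is_allowed(googlebot_rules, "/private/public-report/q1.html"))
print(is_allowed(bingbot_rules, "/private/public-report/q1.html"))
```

Not every parser follows this precedence. Python's own `urllib.robotparser`, for instance, applies the first matching rule in file order, so the order of Allow and Disallow lines can matter for some tools.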
What to Block — and What NOT to Block
Block these:
- /wp-admin/ (WordPress admin)
- /wp-includes/ (WordPress core files)
- /cart/ and /checkout/ (e-commerce)
- /my-account/ (user login areas)
- /?s= and /search/ (internal search)
- /thank-you/ (post-conversion pages)
- /?ref= and /?utm_ (tracking parameters)
- /staging/ or /dev/ (test environments)
- /login/ and /register/

Do NOT block these:
- /wp-content/uploads/ (your images)
- CSS and JavaScript files
- Your homepage /
- All content you want ranked
- Your sitemap.xml
- Blog posts and pages
- Product and category pages
- /wp-admin/admin-ajax.php (AJAX)
Never block CSS and JS: If Googlebot can't access your CSS and JavaScript files, it can't render your pages visually. Google will see a broken layout and may rank your site lower. This was a common mistake when sites blocked /wp-content/, which cuts off all of your media and theme assets.
7 Robots.txt Mistakes That Kill SEO
- Disallow: / under User-agent: * blocks every crawler from every page on your site. Your entire site disappears from Google within days of Googlebot reading this. Fix: remove the Disallow: / line entirely, or replace it with the specific paths you actually want blocked.
- Blocking /wp-content/ blocks your images, theme files, and plugins. Google can't render your pages properly, sees broken layouts, and may rank you lower or stop indexing your content correctly. Fix: do not block /wp-content/uploads/ or your theme assets.
- Robots.txt blocks crawling, not indexing. To keep a page out of search results, use a noindex meta tag on the page instead of robots.txt blocking, because Google can only read the tag on a page it is allowed to crawl.
- Add Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt file. Always use the full absolute URL.
- A "discourage search engines" checkbox, such as the one in WordPress's Reading settings, can add Disallow: / to your robots.txt. Many site owners launch their site without unchecking this box, so their entire live site is blocked from day one.

How to Check Your Robots.txt Free
There are three ways to verify your robots.txt is working correctly:
1 — Seobility Free Robots.txt Checker
Go to seobility.org/sitemap-robots-checker/ and enter your domain. The tool fetches and analyzes your robots.txt file instantly — flagging any problematic directives, syntax errors, missing sitemap references, and pages being incorrectly blocked. No signup required.
2 — Google Search Console Robots.txt Tester
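Google has retired the standalone robots.txt Tester. Its replacement is the robots.txt report: in Google Search Console, go to Settings → robots.txt. The report shows the robots.txt files Google found for your site, when each was last fetched, and any warnings or errors. To check whether a specific URL is blocked, use the URL Inspection tool, which reports whether crawling is allowed under your live robots.txt. This is the most reliable way to test specific URLs before and after making changes.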
3 — View It Directly
Visit yourdomain.com/robots.txt in your browser. This shows exactly what Googlebot sees when it fetches your file. If the page returns a 404 error, you don't have a robots.txt file at all — most crawlers treat this as "allow everything," which is usually fine.
No robots.txt = allow all: If your site returns a 404 for robots.txt, search engines treat it as permission to crawl everything. This is fine for most sites. You only need a robots.txt file if you want to block certain sections, or to advertise your sitemap URL to crawlers that have not received it through a webmaster tool such as Google Search Console.
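For a quick local check, the sketch below scans robots.txt text for three of the mistakes this guide describes: a site-wide Disallow: / under User-agent: *, a blocked /wp-content/ directory, and a missing or relative Sitemap line. It is a rough illustration rather than a validator, and the `lint_robots_txt` function and its warning messages are hypothetical.

```python
from typing import List


def lint_robots_txt(text: str) -> List[str]:
    """Return warnings for a few mistakes described in this guide."""
    warnings: List[str] = []
    agents: List[str] = []  # user agents of the group being read
    in_rules = False        # True once the current group has a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a rule was already seen, so a new group starts
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == "sitemap":
            has_sitemap = True
            if not value.lower().startswith(("http://", "https://")):
                warnings.append(f"Sitemap URL is not absolute: {value}")
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append(
                        "Disallow: / under User-agent: * blocks the whole site"
                    )
                if value.rstrip("/") == "/wp-content":
                    warnings.append(
                        "Disallow: /wp-content/ blocks images, CSS and JS"
                    )

    if not has_sitemap:
        warnings.append("No Sitemap directive found")
    return warnings


sample = "User-agent: *\nDisallow: /\nDisallow: /wp-content/\n"
for warning in lint_robots_txt(sample):
    print(warning)
```

To run it against a live site, download the file first (for example with `urllib.request.urlopen`) and pass the decoded text to the function.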
Frequently Asked Questions
- Disallow: / blocks your entire site from Google and eliminates all rankings.
- To keep a page out of search results, use a noindex meta tag on the page itself. Do not combine it with a robots.txt block: Google can't read the noindex tag on a page it's blocked from crawling, so for truly sensitive pages, use noindex without robots.txt blocking.