Why Google Can't Index My Site: Robots.txt Troubleshooting
TL;DR: If Google can't index your site, check your robots.txt file first. A single misplaced Disallow: / directive can block your entire site from search engines. Use Google Search Console's URL Inspection tool to confirm whether Googlebot is blocked from a given URL, fix any overly broad blocking rules, and set up monitoring so you catch accidental changes before they hurt your traffic.
You published great content, waited patiently, and... nothing. Your pages aren't appearing in Google search results. After checking the usual suspects, you discover Google has been blocked from crawling your site by your own robots.txt file. It happens more often than you might think, and the fix is usually straightforward once you identify the problem.
What Does "Blocked by robots.txt" Mean?
When Google reports a page as "blocked by robots.txt," it means your robots.txt file contains a directive that tells Googlebot not to crawl that URL. Google respects this instruction and doesn't fetch the page content, so the page usually stays out of search results (a blocked URL can still be indexed without its content if other sites link to it, but it won't show a useful snippet). The block might be intentional (for admin pages, staging sites, or duplicate content) or accidental (from overly broad rules or copy-paste errors).
How to Check If Robots.txt Is Blocking Google
Use Google Search Console
Google Search Console is the most authoritative source because it shows exactly what Google sees.
Go to the URL Inspection tool and enter the URL you're concerned about. If the page is blocked by robots.txt, the inspection will show "Blocked by robots.txt" under indexing status. From there, compare the blocked path against your robots.txt rules to find the directive responsible.
The Page indexing report (formerly the Coverage report) in Search Console also lists all pages blocked by robots.txt under "Why pages aren't indexed." This gives you a site-wide view of what's being blocked.
Test Your robots.txt File Directly
Google retired its standalone robots.txt Tester, but Search Console now includes a robots.txt report (under Settings) that shows which robots.txt files Google has fetched, when they were last crawled, and any parsing errors or warnings.
You can also manually check your robots.txt by visiting https://yourdomain.com/robots.txt in your browser. Look for Disallow rules that might match the URLs you want indexed.
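If you prefer to script that manual check, a few lines of Python can fetch the file and surface every Disallow rule. This is a rough sketch; yourdomain.com is a placeholder for your own domain.

# Rough sketch: fetch a robots.txt file and print its Disallow rules.
# "yourdomain.com" is a placeholder; substitute your own domain.
from urllib.request import urlopen

with urlopen("https://yourdomain.com/robots.txt") as response:
    body = response.read().decode("utf-8", errors="replace")

for line in body.splitlines():
    stripped = line.strip()
    # Print every Disallow directive so you can scan for rules
    # that match URLs you want indexed.
    if stripped.lower().startswith("disallow:"):
        print(stripped)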
Check with Other Tools
Third-party SEO tools like Screaming Frog, Ahrefs, and Semrush can test robots.txt blocking as part of their site audits. These give you a broader view across many pages at once.
Common Robots.txt Mistakes That Block Google
Blocking Everything
The most devastating mistake is blocking all crawlers from your entire site:
User-agent: *
Disallow: /
This single rule tells every search engine crawler to stay away from every page. It's sometimes used temporarily during development and then forgotten, or it's copied from a template without understanding what it does.
The fix is simple. Remove or modify the rule:
User-agent: *
Allow: /
Blocking the Root Path by Accident
A typo or formatting error can block more than intended:
User-agent: *
Disallow: / admin/
That space between / and admin/ makes the rule unpredictable: some parsers read it as Disallow: / and block everything, while others treat the line as invalid and ignore it. Either way, it doesn't do what you intended. The correct syntax is:
User-agent: *
Disallow: /admin/
Overly Broad Disallow Rules
Rules that block common URL patterns can have unexpected consequences:
User-agent: *
Disallow: /*?
This blocks all URLs with query parameters. But if your site uses URLs like /blog?page=2 for pagination, all those pages become inaccessible to Google.
Be specific about what you want to block rather than using broad wildcards.
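If a single troublesome parameter is the real problem, a narrower pattern limits the damage. In this sketch, sessionid is a stand-in for whatever parameter actually creates duplicate content on your site; the rule blocks only URLs whose query string begins with sessionid:

User-agent: *
Disallow: /*?sessionid=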
Forgetting to Update After Migration
If you migrated from a staging environment, the robots.txt might still contain staging rules:
# Staging site - block all crawlers
User-agent: *
Disallow: /
Always check robots.txt as part of your launch checklist after any migration or environment change.
Blocking CSS and JavaScript
Some outdated advice suggested blocking CSS and JavaScript files from crawlers. This is harmful today because Google needs to render your pages to understand them properly.
# Don't do this
User-agent: *
Disallow: /css/
Disallow: /js/
Remove these rules. Google explicitly recommends allowing access to CSS, JavaScript, and images.
Trailing Slash Confusion
In robots.txt, Disallow: /admin blocks every URL whose path starts with /admin, including /administrator and /admin-tools. Disallow: /admin/ blocks only URLs that start with /admin/ (with the trailing slash).
If you want to block only the admin directory and its contents, use:
User-agent: *
Disallow: /admin/
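You can see the difference with Python's standard-library robots.txt parser. This is purely illustrative; example.com and the paths are placeholders.

# Sketch: compare how "Disallow: /admin" and "Disallow: /admin/" match a path.
from urllib.robotparser import RobotFileParser

without_slash = RobotFileParser()
without_slash.parse(["User-agent: *", "Disallow: /admin"])

with_slash = RobotFileParser()
with_slash.parse(["User-agent: *", "Disallow: /admin/"])

url = "https://example.com/administrator"
print(without_slash.can_fetch("*", url))  # False: /admin also catches /administrator
print(with_slash.can_fetch("*", url))     # True: /admin/ only matches the directory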
Case Sensitivity
Robots.txt rules are case-sensitive. If your rule blocks /Admin/ but your actual URLs use /admin/, the block won't work. Similarly, if you're trying to block /IMAGES/ but your server uses /images/, verify the cases match.
Diagnosing Why Google Isn't Indexing
It Might Not Be Robots.txt
Before diving deep into robots.txt troubleshooting, verify it's actually the problem. Other causes of indexing issues include noindex meta tags or X-Robots-Tag headers, canonical tags pointing elsewhere, pages behind login requirements, new sites that Google hasn't discovered yet, manual actions or penalties in Search Console, and low-quality content that Google chooses not to index.
The URL Inspection tool in Search Console identifies which specific issue is affecting each URL.
Check for Conflicting Signals
Sometimes robots.txt allows a page, but a noindex tag on the page itself prevents indexing. Or the page is allowed but returns a redirect. Multiple signals can create confusing situations.
For each problem URL, check the robots.txt status (allowed/blocked), the page's meta robots tag, any X-Robots-Tag in response headers, the HTTP status code, and canonical tag settings.
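A short script can gather most of those signals in one pass. The URL below is a placeholder, and the meta tag and canonical checks are crude substring matches, so treat this as a starting point rather than a full audit.

# Sketch: check one URL for indexing blockers beyond robots.txt.
from urllib.request import urlopen

url = "https://yourdomain.com/some-page/"  # placeholder URL
with urlopen(url) as response:
    status = response.status                        # final status after any redirects
    x_robots = response.headers.get("X-Robots-Tag")
    html = response.read().decode("utf-8", errors="replace")

print("HTTP status:", status)
print("X-Robots-Tag header:", x_robots)             # 'noindex' here blocks indexing even if crawling is allowed
print("noindex in HTML:", "noindex" in html.lower())
print("canonical tag present:", 'rel="canonical"' in html.lower())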
Consider Crawl Budget
If your site is large, Google might not crawl everything even when it's allowed. Crawl budget issues typically affect sites with tens of thousands of pages. If you have a smaller site, crawl budget probably isn't your problem.
How to Fix Robots.txt Blocking Issues
Step 1: Identify All Blocking Rules
Review your entire robots.txt file. Look for any Disallow directives that might match the URLs you want indexed.
Remember that rules for User-agent: * apply to all crawlers, including Googlebot, unless there's a more specific User-agent: Googlebot section.
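For example, in the file below Googlebot obeys only its own group and ignores the User-agent: * rules, so /private/ remains crawlable by Google even though other crawlers are blocked from it (both paths are illustrative):

User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/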
Step 2: Test Before Deploying
Before changing your production robots.txt, test the proposed rules first, whether with a third-party robots.txt tester, Google's open-source robots.txt parser library, or a staged copy of the file, and verify that the URLs you need indexed are still allowed.
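One lightweight way to do that is to run the proposed rules through Python's standard-library parser against a list of URLs that must stay crawlable. The rules and URLs below are examples, and the standard-library parser does simple prefix matching, so double-check any wildcard rules with a dedicated tester.

# Sketch: confirm a proposed robots.txt still allows your must-index URLs.
from urllib.robotparser import RobotFileParser

proposed_rules = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""

must_be_crawlable = [
    "https://yourdomain.com/",
    "https://yourdomain.com/blog/",
    "https://yourdomain.com/pricing/",
]

parser = RobotFileParser()
parser.parse(proposed_rules.splitlines())

for url in must_be_crawlable:
    if not parser.can_fetch("Googlebot", url):
        print("WARNING: proposed rules would block", url)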
Step 3: Make Targeted Changes
Rather than completely rewriting robots.txt, make minimal changes that fix the specific problem. This reduces the risk of introducing new issues.
Step 4: Deploy and Verify
Update your production robots.txt file. Then use Search Console's URL Inspection tool to verify the changes work as expected. Note that Google caches robots.txt, so changes might not take effect immediately.
Step 5: Request Reindexing
After fixing robots.txt, use the URL Inspection tool to request indexing for important pages. Google will recrawl and, if the content is suitable, add the pages to its index.
Preventing Future Robots.txt Problems
Use Version Control
Keep your robots.txt in version control with your other code. This creates a history of changes and makes it easier to identify when and why blocking rules were added.
Review Changes Before Deployment
Treat robots.txt changes as seriously as code changes. Review them before deployment and test the implications.
Set Up Monitoring
Robots.txt monitoring alerts you when your robots.txt file changes. This catches accidental modifications, whether from bad deployments, compromised servers, or well-meaning team members who didn't understand the implications.
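If you want a homegrown version before reaching for a dedicated tool, a small script run on a schedule can snapshot the file and report any difference. This sketch uses a placeholder domain and prints the diff; in practice you would send it to email or chat.

# Sketch: detect and show changes to a robots.txt file between runs.
import difflib
import pathlib
from urllib.request import urlopen

ROBOTS_URL = "https://yourdomain.com/robots.txt"  # placeholder domain
SNAPSHOT = pathlib.Path("robots_snapshot.txt")    # last reviewed copy

with urlopen(ROBOTS_URL) as response:
    current = response.read().decode("utf-8", errors="replace")

previous = SNAPSHOT.read_text() if SNAPSHOT.exists() else ""

if current != previous:
    diff = difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    )
    print("\n".join(diff))        # review the change, then alert or roll back
    SNAPSHOT.write_text(current)  # accept the new version as the baseline
else:
    print("robots.txt unchanged")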
Document Your Rules
Add comments to robots.txt explaining why each rule exists:
# Block admin area from all crawlers - security best practice
User-agent: *
Disallow: /admin/
# Block staging content that shouldn't be indexed
Disallow: /staging/
When someone (including future you) reviews the file, they'll understand the intent behind each rule.
Include robots.txt in Launch Checklists
Whenever launching a new site or migrating an existing one, verify robots.txt is correct. This should be a mandatory checklist item along with verifying SSL, redirects, and analytics.
How SecurityBot Helps
SecurityBot's robots.txt monitoring watches your robots.txt file and alerts you immediately when it changes. You get a diff showing exactly what changed, so you can quickly tell whether the change was intentional or accidental.
For sites that depend on search traffic, catching a blocking rule before Google's next crawl can prevent days or weeks of lost visibility.
Start your free 14-day trial and monitor your robots.txt along with SSL, uptime, and security headers.
Frequently Asked Questions
How long does it take for Google to reindex after fixing robots.txt?
Google typically recaches robots.txt within a day or two. After that, it will start crawling previously blocked pages on subsequent visits. Full reindexing can take days to weeks depending on your site size and how frequently Google crawls it. Requesting indexing in Search Console can speed up important pages.
Will unblocking pages hurt my SEO?
No, unblocking pages that should be indexed will improve your SEO. You're allowing Google to see content it couldn't access before. The only scenario where this could cause issues is if the newly accessible pages are duplicate content or low quality.
Should I block AI crawlers in robots.txt?
This is a business decision. If you don't want AI systems training on your content, you can block their crawlers (GPTBot, CCBot, etc.). If you want AI tools to cite your content, allow them. The decision doesn't affect Google search indexing.
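If you do decide to opt out, the rules look like any other robots.txt group. For example, these directives block OpenAI's and Common Crawl's crawlers from the whole site:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /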
Can I block Google from specific pages without blocking everything?
Yes, use specific path rules:
User-agent: *
Disallow: /private/
Disallow: /draft-content/
This blocks only the specified paths while allowing everything else.
Does Google penalize sites for blocking too much?
No, there's no penalty for having robots.txt rules. However, blocking content you want indexed is obviously counterproductive. Google only indexes what it can see.
Last updated: January 2026 | Written by Jason Gilmore, Founder of SecurityBot