9 min read

How to Crawl Your Website for Broken Links (Step-by-Step Guide)

By Jason Gilmore
Learn how to crawl your website for broken links using free tools, command line utilities, and dedicated broken link checkers. Step-by-step instructions for each method with pros and cons.

TL;DR: Crawling your website for broken links involves using automated tools that systematically visit every page, follow every link, and report any URLs returning error status codes. You can do this manually with browser extensions, use free tools like Google Search Console, or automate the process with dedicated broken link checkers like SecurityBot.

Links break constantly. Pages get deleted, URLs change during redesigns, and external sites disappear without warning. On a site with more than a few dozen pages, manually checking every link is impractical. You need to crawl your site systematically to find broken links before your visitors and search engines do.

Why You Need to Crawl for Broken Links

Links break over time through no fault of your own. You delete a blog post and forget to set up a redirect. An external site you linked to shuts down. A CMS update changes your URL structure. A colleague removes a page without telling anyone.

Manual checking doesn't scale beyond a handful of pages. A site with 100 pages might have 500 or more internal and external links. Checking each one by hand would consume hours, and you'd need to repeat the process regularly because new links break all the time.

Broken links accumulate silently. You won't notice a broken link in the middle of a two-year-old blog post until a visitor reports it, a search engine crawls it, or you run an automated scan. By the time you find out, the damage to user experience and SEO has already been done.

Method 1: Google Search Console (Free)

Google Search Console reports crawl errors that Googlebot encountered while indexing your site. This gives you direct insight into what Google sees when it crawls your pages.

Step-by-Step Instructions

  1. Log into Google Search Console and select your property.
  2. Navigate to "Pages" in the left sidebar (this was formerly called "Coverage").
  3. Look for pages categorized as "Not found (404)" under the "Why pages aren't indexed" section.
  4. Click on the 404 category to see the list of affected URLs.
  5. Click "Export" to download the full list as a CSV or Google Sheets file.
  6. For each broken URL, check the "Referring page" column to see which pages link to it.
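
Before assigning fixes, it can be worth confirming the exported URLs are still broken, since the report lags Googlebot's crawl by days or weeks. Here's a minimal command-line sketch, assuming you've copied the exported URLs (one per line) into a file called gsc_404s.txt (a name used here only for illustration):

while read url; do
    # Re-check each exported URL and keep only those that still return 404
    code=$(curl -o /dev/null -s -w "%{http_code}" "$url")
    [ "$code" = "404" ] && echo "$url"
done < gsc_404s.txt > still_broken.txt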

Limitations

Google Search Console only shows pages that Googlebot attempted to crawl. If a broken internal link exists on a page that Google hasn't crawled recently, it won't appear in this report. The data is also delayed, typically reflecting crawl activity from the past few days to weeks rather than real-time status. It doesn't provide a complete picture of all internal broken links on your site.

Method 2: Browser Extensions (Free, Manual)

Browser extensions check links on individual pages as you visit them. They're useful for spot-checking important pages.

Check My Links (Chrome)

Install Check My Links from the Chrome Web Store. Navigate to any page on your site and click the extension icon. It will check every link on the page, highlighting broken ones in red and valid ones in green. The results panel shows the HTTP status code for each broken link.

This approach is best for spot-checking your homepage, key landing pages, and navigation links. After publishing a new blog post or making content updates, run Check My Links on the affected pages to verify all links work.
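
If you're curious what the extension is doing under the hood, a rough single-page equivalent from the command line looks like the sketch below. It only catches plain absolute href="..." links in the HTML, so it misses relative URLs and anything generated by JavaScript:

# Pull absolute links out of one page and print the status code for each
curl -s https://yoursite.com/ \
  | grep -o 'href="[^"]*"' \
  | sed -e 's|^href="||' -e 's|"$||' \
  | grep -E '^https?://' \
  | sort -u \
  | while read link; do
      echo "$(curl -o /dev/null -s -w "%{http_code}" "$link") $link"
    done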

Limitations

Browser extensions only check one page at a time. You have to manually visit every page you want to check, which makes comprehensive site audits extremely tedious. They don't crawl your entire site automatically and they don't monitor for new broken links over time.

Method 3: Command Line Tools (Free, Technical)

If you're comfortable with the terminal, command line tools can crawl your site and report broken links without installing any software beyond what's likely already on your system.

Using wget to Find Broken Links

wget --spider -r -nd -nv -l 3 -o broken_links.log https://yoursite.com
grep -B 2 '404' broken_links.log

The --spider flag tells wget to check links without downloading content. The -r flag enables recursive crawling, -l 3 limits the crawl depth to 3 levels, -nd stops wget from creating a local directory tree, -nv keeps the log brief, and -o writes that log to broken_links.log. After the crawl completes, grep the log for 404 responses.
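
If the crawl hits your server faster than you'd like, wget also accepts a wait time between requests. A slightly more polite variant of the same command (the 1-second value is just an example):

# Same spider crawl, but pause 1 second between requests
wget --spider -r -nd -nv -l 3 -w 1 -o broken_links.log https://yoursite.com
grep -B 2 '404' broken_links.log

Note that recursive wget stays on the starting host by default, so external links are not followed unless you add -H.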

Using curl for Individual Checks

curl -o /dev/null -s -w "%{http_code}" https://yoursite.com/some-page

This returns just the HTTP status code for a single URL. You can combine this with a list of URLs to check multiple pages:

# Check every URL listed in urls.txt (one per line) and print any that do not return 200
while read url; do
    status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
    if [ "$status" != "200" ]; then
        # Redirects (301/302) are flagged too; add -L to curl if you want to follow them instead
        echo "$status $url"
    fi
done < urls.txt
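
One convenient way to build urls.txt, assuming your site publishes a standard sitemap.xml (not every site does), is to pull the page URLs straight out of the sitemap:

# Extract every <loc> entry from the sitemap into urls.txt
curl -s https://yoursite.com/sitemap.xml \
  | grep -o '<loc>[^<]*</loc>' \
  | sed -e 's|<loc>||' -e 's|</loc>||' > urls.txt

If your site uses a sitemap index, repeat the extraction for each child sitemap it lists.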

Limitations

Command line tools require technical knowledge to use and interpret results. They don't provide a visual interface, don't track which page contains the broken link (just that the destination is broken), and require manual effort to run and analyze. There's no scheduling, alerting, or historical tracking.

Method 4: Dedicated Broken Link Checker Tools (Recommended)

Dedicated broken link checkers are purpose-built to crawl websites and report broken links. They handle the complexity of following links, respecting robots.txt, managing crawl rates, and presenting results in a useful format.

What to Look for in a Broken Link Crawler

Crawl depth and page limits determine how much of your site the tool covers in a single crawl. Sites with hundreds of pages need a tool that can handle the volume without hitting arbitrary limits.

Scheduling allows the tool to run automatically on a recurring basis. Weekly automated crawls catch new broken links before they accumulate.

Reporting should show you where broken links are located (the source page), what URL is broken (the destination), and what error code was returned. CSV export makes it easy to track fixes and share reports with your team.

Configurability lets you adjust crawler behavior. A custom user agent helps you get past WAF blocks, and an adjustable crawl delay keeps the crawler from overwhelming your server.
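
CSV export also makes it easy to slice a report however you like. As a quick example, assuming a hypothetical report.csv with the columns source_page,broken_url,status_code, you can count broken links per status code in one line:

# Skip the header row, pull the status_code column, and tally each code
tail -n +2 report.csv | awk -F, '{print $3}' | sort | uniq -c | sort -rn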

SecurityBot Broken Link Checker

SecurityBot's Broken Link Checker is designed for indie hackers and small teams who need reliable broken link monitoring without enterprise complexity or pricing.

Key capabilities include crawling up to 2,000 pages per scan, automated weekly crawls that run without any manual intervention, and on-demand crawls you can trigger anytime from the dashboard. The user agent is configurable to avoid WAF blocks, and the crawl delay is adjustable to prevent rate limiting. Reports detect 404, 403, 500, and other HTTP status codes, with CSV export for easy tracking and team sharing.

How to Set Up Automated Crawling with SecurityBot

  1. Add your website to SecurityBot by entering your domain URL in the dashboard.
  2. Navigate to Broken Link Checker settings from your site's settings page.
  3. Enable automated weekly crawls by toggling on the broken link checker.
  4. Configure your user agent if your site uses a WAF or CDN that might block crawlers. Setting a standard browser user agent usually resolves this.
  5. Run your first manual crawl to get immediate results. Click the "Run Crawl" button on the Broken Link Checker page.
  6. Review results and export to CSV. The report shows every broken link with its source page, destination URL, and HTTP status code.
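
If you're unsure whether a WAF or CDN is treating crawler traffic differently (step 4 above), you can compare how your site responds to a browser-like user agent versus an obviously bot-like one. The bot name below is made up purely for the test:

# Compare status codes for two different user agent strings
curl -A "Mozilla/5.0" -o /dev/null -s -w "browser-like UA: %{http_code}\n" https://yoursite.com/
curl -A "ExampleCrawlerBot/1.0" -o /dev/null -s -w "bot-like UA: %{http_code}\n" https://yoursite.com/

If the second request comes back 403 while the first returns 200, the block is user agent based, and setting a browser-style user agent in the crawler configuration should resolve it.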

What to Do After Finding Broken Links

Prioritize by Impact

Not all broken links need to be fixed with equal urgency. Start with the ones that affect the most users and have the greatest SEO impact.

  1. Fix broken links on high-traffic pages first. Check your analytics to identify your most visited pages and fix any broken links on them.
  2. Address navigation and footer links. These appear on every page, so a single broken link in your nav multiplies across your entire site.
  3. Fix links to conversion pages. Broken links that prevent users from reaching your pricing, signup, or checkout pages have direct revenue impact.
  4. Clean up the rest. Work through remaining broken links in order of the source page's traffic and importance.

How to Fix Each Type

Internal 404: Update the link to point to the correct URL, or set up a 301 redirect from the old URL to the new location. If the content was deleted with no replacement, remove the link and rewrite the surrounding text.
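
After adding a redirect, it's worth confirming it actually fires and points where you expect. A quick check with curl (the paths are placeholders):

# Should print the redirect status and target, e.g. "301 -> https://yoursite.com/new-path"
curl -o /dev/null -s -w "%{http_code} -> %{redirect_url}\n" https://yoursite.com/old-path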

External 404: Find the new URL for the resource (check if it moved), find an alternative source, or remove the link entirely. The Wayback Machine at archive.org can help you find where content may have moved.
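
The Wayback Machine also offers a JSON availability endpoint (archive.org/wayback/available at the time of writing) you can query from the command line to find the closest archived snapshot of a dead external URL:

# Returns JSON describing the closest archived snapshot of the page, if any
curl -s "https://archive.org/wayback/available?url=example.com/old-resource"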

403 Forbidden: The destination page exists but is blocking your request. This might be a gated resource that requires authentication, or a server that blocks certain user agents. Verify the link works in a browser before taking action.

500 Server Error: The destination server has an internal problem. For external links, this may resolve itself. Check again after a few days. For internal links, investigate the server-side issue causing the error.

How Often Should You Crawl?

Recommended crawl frequency by site type:

  Blog (updates weekly): Weekly
  Ecommerce (products change): Twice weekly
  Static business site: Monthly
  After major updates: Immediately

Weekly automated crawls are sufficient for most websites. They catch new broken links before they accumulate and before search engines have time to repeatedly encounter the same errors. Run additional manual crawls after deploying code changes, publishing or deleting large batches of content, migrating domains or changing URL structures, and updating your CMS or site framework.
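
If your deploys run through a script or CI job, the "crawl after deploying" step can be automated with a small gate that checks your most important pages. A minimal sketch, assuming a hypothetical critical_urls.txt listing your navigation, pricing, and key landing pages one per line:

#!/bin/sh
# Post-deploy gate: exit non-zero if any critical URL stops returning 200
failed=0
while read url; do
    status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
    if [ "$status" != "200" ]; then
        echo "BROKEN: $status $url"
        failed=1
    fi
done < critical_urls.txt
exit $failed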

Frequently Asked Questions

How long does a broken link crawl take?

It depends on your site's size and the configured crawl delay. SecurityBot crawls up to 2,000 pages per scan, typically completing in 10 to 30 minutes. Smaller sites finish faster. The crawl delay setting (which controls the pause between requests) affects total time but prevents overwhelming your server.
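At a 0.5-second delay, for example, a full 2,000-page crawl spends about 1,000 seconds, roughly 17 minutes, on delays alone, which lines up with the 10 to 30 minute range above.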

Will crawling my site affect performance?

With proper crawl delays, the impact is negligible. SecurityBot configures sensible defaults automatically, and you can increase the delay if you notice any performance impact. A well-configured crawler makes requests at a rate far lower than normal user traffic.

Can I crawl password-protected pages?

Most crawlers, including SecurityBot, only access publicly available pages. Password-protected areas, pages behind login walls, and authenticated API endpoints won't be included in the crawl. For these areas, you'll need manual testing or specialized tools that support authentication.

What's the difference between a spider and a crawler?

In practice, these terms are used interchangeably. Both refer to automated programs that systematically visit web pages and follow links. "Spider" is the older term (from the idea of traversing a web), while "crawler" is more commonly used today. Google calls its crawler "Googlebot."


Automate your broken link monitoring with SecurityBot. Crawl up to 2,000 pages weekly with no manual effort. Start Free Trial.


Last updated: February 2026 | Written by Jason Gilmore, Founder of SecurityBot
