What Crawl Budget Means — and When It Becomes a Problem
Crawl budget describes how many pages of your website Google visits and processes within a given time period. Every website gets a limited quota. For a small website with 50 pages, this doesn’t matter — Google handles that easily. For websites with 10,000 or more URLs it becomes critical: when Google spends its budget on unimportant pages, your relevant content gets crawled less frequently or not at all.
Particularly affected:
- Online stores with faceted navigation and thousands of filter combinations
- Portals and directories with dynamically generated pages
- Large blogs and magazines with extensive archives and tag pages
- Websites after a relaunch with many old URL structures
Smaller websites can also be affected if they generate unnecessarily many URLs through technical errors — for example through session IDs, parameter URLs, or a bloated sitemap.
Crawl Rate Limit vs. Crawl Demand
Google controls crawling through two factors:
Crawl Rate Limit: The maximum number of simultaneous requests Google sends to your server without overloading it. If your server is slow or responds with errors, Google automatically reduces the crawl rate.
Crawl Demand: How strongly Google wants to crawl your content. Pages that change frequently or have many backlinks are visited more often. Pages that have been unchanged for years are visited less often.
The effective crawl budget results from both factors. You can influence it by keeping your server fast (crawl rate) and directing Google specifically toward your important pages (crawl demand).
How to Identify Crawl Budget Problems
Google Search Console: Crawl Statistics
Under “Settings > Crawl Stats” you’ll find three key pieces of information:
- Total crawl requests per day: If the number suddenly drops, something is wrong
- Average response time: Values above 500 ms indicate server problems
- Crawl requests by response type: Shows how many requests hit 200, 301, 404, or 500 status codes
When Google is spending a large share of its requests on redirects, error pages, or non-canonical URLs, your budget is being wasted.
Server Log Analysis
Search Console only shows part of the picture. For a complete analysis, you need your server log files. There you can see exactly which URLs Googlebot visited, when, and how often.
Tools like the Screaming Frog Log File Analyser or JetOctopus help evaluate log files and identify patterns. Typical findings: Google visits old category pages hundreds of times while new product pages are ignored.
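If you want a first overview without a dedicated tool, a few lines of Python are enough. This sketch assumes a combined-format access log stored as access.log and counts how often Googlebot requested each URL; for reliable numbers you would also verify the requester's IP range, since the user agent string can be spoofed:

import re
from collections import Counter

hits = Counter()
# In the combined log format the request line looks like "GET /path HTTP/1.1"
pattern = re.compile(r'"(?:GET|HEAD) (\S+) HTTP')

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        match = pattern.search(line)
        if match:
            hits[match.group(1)] += 1

# The 20 URLs Googlebot requests most often
for url, count in hits.most_common(20):
    print(count, url)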
Indexing Delays
An indirect signal: when new or updated content only appears in the index after weeks, even though the pages are technically sound, a crawl budget problem may be the cause. Check via the URL Inspection tool in Search Console when Google last crawled the page.
Typical Causes of Crawl Budget Waste
Parameter URLs
Filter pages, sort options, and tracking parameters often generate thousands of irrelevant URLs. A single product page suddenly becomes dozens of variants:
/product?color=red
/product?color=red&size=xl
/product?utm_source=newsletter&utm_medium=email
/product?sessionid=abc123
Each of these URLs is a separate page for Google — with identical or nearly identical content.
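To see how many of these variants collapse onto a single page, you can normalize URLs by stripping parameters that never change the content. A minimal sketch; which parameters are safe to remove (here sessionid and the usual tracking parameters) and the example.com domain are assumptions you would adapt to your own site:

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that do not change the page content (adjust to your site)
IGNORED = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "gclid"}

def normalize(url: str) -> str:
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))

print(normalize("https://example.com/product?utm_source=newsletter&utm_medium=email"))
# -> https://example.com/product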
Faceted Navigation in Online Stores
The largest crawl budget problem in e-commerce. A store with 500 products, 10 filter options, and 5 values each can theoretically generate millions of URL combinations: if every filter can be left unset or set to one of its 5 values, that alone allows 6^10 (around 60 million) combinations. Google tries to crawl all of them and never reaches the important pages.
Soft 404 Errors
Pages that visually show an error message (“No results found”) but return HTTP status code 200. Google recognizes these as real pages and continues crawling them. Particularly common on empty category pages, expired offers, or internal search result pages with no hits.
Redirect Chains
When URL A redirects to B, B to C, and C to D, every step consumes crawl budget. With hundreds of such chains, this adds up significantly. Commonly observed after multiple relaunches or CMS changes.
Bloated XML Sitemaps
Your sitemap should contain only indexable, canonical URLs. In practice, I regularly find sitemaps with redirects, 404 pages, noindex URLs, or non-canonical variants. This signals to Google: “Please crawl all these pages” — even though most of them are irrelevant.
Other Common Causes
- Internal search pages: Every search query generates a new URL that Google can discover and crawl
- Tag and category archives: Especially in WordPress, hundreds of thin archive pages often arise
- Session IDs in URLs: Every visit generates a new URL with identical content
- Print versions and AMP duplicates: Separate URLs for print views without correct canonical tags
- Calendar pages: Endlessly clickable month and year views
How to Optimize Your Crawl Budget: Step by Step
Step 1: Clean Up robots.txt
Exclude SEO-irrelevant areas from crawling:
User-agent: *
Disallow: /search/
Disallow: /internal/
Disallow: /cart/
Disallow: /account/
Disallow: /*?sessionid=
Disallow: /*?sort=
Important: robots.txt prevents crawling, not indexing. Pages discovered via external links can still end up in the index despite Disallow. For reliable removal from the index you need a noindex tag instead, and the page must remain crawlable so Google can actually see that tag.
Step 2: Fix Soft 404 Errors
Check Search Console under “Pages” for reported soft 404 errors. Return real 404 or 410 status codes for empty or irrelevant pages. For pages with little content: either enrich with relevant content or redirect via 301 to an appropriate page.
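A quick way to find candidates is to request suspicious URLs and flag those that answer with status 200 but render an empty-results message. A rough sketch using the requests library; the URL list and the "No results found" phrase are placeholders for whatever your templates actually output:

import requests

CANDIDATES = [
    "https://example.com/category/discontinued/",
    "https://example.com/search?q=asdf",
]

for url in CANDIDATES:
    response = requests.get(url, timeout=10)
    if response.status_code == 200 and "No results found" in response.text:
        # Error content delivered with status 200 -> suspected soft 404
        print("Suspected soft 404:", url)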
Step 3: Resolve Redirect Chains
Analyze all 301 redirects with a crawling tool like Screaming Frog. Every chain that goes through more than one step should be updated to a direct redirect to the final destination. A goes directly to D — done.
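Chains are also easy to spot without a crawler, because the requests library records every intermediate hop in response.history. A small sketch; the list of old URLs is a placeholder, e.g. an export from Screaming Frog or your old sitemap:

import requests

OLD_URLS = [
    "https://example.com/old-category/",
    "https://example.com/2019-relaunch/product-a",
]

for url in OLD_URLS:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = [r.url for r in response.history]  # every 3xx hop along the way
    if len(hops) > 1:
        print(f"{len(hops)} hops: {' -> '.join(hops)} -> {response.url}")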
Step 4: Clean Up XML Sitemap
Remove from your sitemap:
- URLs with 3xx status codes (redirects)
- URLs with 4xx status codes (error pages)
- URLs with noindex tag
- Non-canonical URLs
- Parameter URLs
Your sitemap should contain exclusively pages that should be indexed and return status code 200.
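This audit is easy to automate: fetch the sitemap, then request every URL and flag anything that does not answer with a clean 200. A sketch that assumes a standard sitemap.xml at the domain root; checking for noindex tags and canonical targets would additionally require parsing the HTML of each page:

import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap = requests.get("https://example.com/sitemap.xml", timeout=10)
urls = [loc.text for loc in ET.fromstring(sitemap.content).findall(".//sm:loc", NS)]

for url in urls:
    response = requests.get(url, allow_redirects=False, timeout=10)
    if response.status_code != 200:
        # Redirects (3xx) and errors (4xx/5xx) do not belong in the sitemap
        print(response.status_code, url)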
Step 5: Set Canonical Tags Consistently
All parameter variants and URL duplicates need a canonical tag pointing to the canonical version. This clearly signals to Google which URL is the relevant one — even if the duplicates are still crawled.
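Whether a parameter variant really declares the clean URL as canonical can be spot-checked automatically. A rough sketch that pulls the canonical link out of the HTML with a regular expression; an HTML parser would be more robust in production, and the product URL is a placeholder:

import re
import requests

def canonical_of(url: str):
    html = requests.get(url, timeout=10).text
    match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    return match.group(1) if match else None

print(canonical_of("https://example.com/product?color=red"))
# Expected: https://example.com/product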
Step 6: Set Irrelevant Pages to noindex
Pages like tag archives, author archives, internal search pages, or thin category pages should be set to noindex, follow. Their internal links are still followed, but the pages themselves stay out of the index, and over time Google crawls them less often.
Step 7: Optimize Internal Linking
Direct your internal linking specifically toward your most important pages. Every page you want to rank should be linked from multiple other pages. Unimportant pages don’t need prominent internal links.
Online Stores: Handling Faceted Navigation Correctly
For stores with filter navigation, I recommend a three-tier approach:
- Define indexable filter pages: Only filter combinations with their own search volume should be indexable (e.g., "men's running shoes size 10"). All others get a noindex tag.
- Use AJAX-based filters: Where possible, load filter results via AJAX without changing the URL. This way no additional URLs are created at all.
- Canonical tags to the main category: All filter combinations without their own search volume point via canonical to the parent category page. (A sketch of this decision logic follows below.)
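How the first and third tier translate into page logic can be sketched in a few lines of Python. The whitelist of filter combinations with search volume is an assumption here and would come from your keyword research; tier 2, the AJAX variant, happens in the front end and therefore does not appear in this sketch:

# Filter combinations with their own search volume (from keyword research)
INDEXABLE = {
    ("running-shoes", frozenset({("gender", "men"), ("size", "10")})),
}

def robots_and_canonical(category: str, filters: dict):
    key = (category, frozenset(filters.items()))
    if key in INDEXABLE:
        return "index, follow", None           # tier 1: indexable filter page
    return "noindex, follow", f"/{category}/"  # tier 3: canonical to parent category

print(robots_and_canonical("running-shoes", {"gender": "men", "size": "10"}))
# -> ('index, follow', None)
print(robots_and_canonical("running-shoes", {"color": "red"}))
# -> ('noindex, follow', '/running-shoes/')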
Setting Up Monitoring
Crawl budget optimization is not a one-time measure. Set up regular monitoring:
- Monthly: Check crawl statistics in Search Console
- After every relaunch: Analyze server log files for unexpected crawling patterns
- Quarterly: Conduct a full crawl with Screaming Frog and compare against the sitemap
- Ongoing: Monitor indexing reports in Search Console — is the number of indexed pages growing or stagnating?
Checklist: Crawl Budget Optimization
- Analyzed crawl statistics in Search Console
- Evaluated server log files for Googlebot activity
- Handled parameter URLs via robots.txt or canonical tags
- Faceted navigation controlled with noindex or AJAX
- Soft 404 errors replaced with real 404/410 status codes
- Redirect chains reduced to a maximum of one step
- XML sitemap contains only indexable 200-status URLs
- Internal search pages excluded from crawling
- Tag and category archives set to noindex
- Internal linking focused on important pages
- Monitoring routine established (monthly/quarterly)
When Professional Help Makes Sense
Crawl budget optimization requires a holistic view of your website architecture. Especially for online stores with faceted navigation, portals with dynamically generated pages, or websites after a relaunch, the relationships are too complex for simple solutions. Wrong settings in robots.txt or uncontrolled use of noindex can cause more damage than the original problem.
In my work as an SEO freelancer, I regularly analyze crawl budget problems — from small business websites to large online stores. The combination of Search Console data, server log files, and a technical crawl provides a complete picture.
Is Google crawling the wrong pages on your website? Contact me for a technical analysis; I'll find the causes and implement the optimizations.
Need help with the implementation?
As an SEO freelancer with over 20 years of experience, I help you implement technical SEO professionally — fair, direct, and without long-term contracts.
About the Author
Christian Synoradzki, SEO Freelancer
More than 20 years of experience in digital marketing. Fair hourly rate, no long-term contracts, direct point of contact.