Using Meta Robots Directives Correctly

How to control indexing and crawling with meta robots tags. Covers all directives, combinations, and common mistakes.

Meta robots tags are among the most important control elements in technical SEO. They determine which pages make it into the search index and which links crawlers are allowed to follow. Misconfiguration can cause important content to remain invisible or unimportant pages to waste crawl budget.

This guide shows you how meta robots directives work, where they are placed, and which combinations you should use for different page types.

What Are Meta Robots Tags?

Meta robots tags are instructions for search engine crawlers. They are placed either in the HTML head of a page or in the HTTP header. Their purpose: to tell crawlers whether a URL may be indexed and whether they may follow the links on that page.

Each URL can be controlled individually. This makes meta robots tags a precise tool for on-page optimization — provided you use them deliberately.

Where Are Meta Robots Tags Placed?

There are two possible positions:

In the HTML head: The tag is inserted within the <head> section. This is the most common method and works for standard HTML pages.
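
For example, a page that should stay out of the index might carry the tag like this (a minimal sketch; the title and the markup around the tag are placeholders):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Example page</title>
  <!-- keep this URL out of the index, but follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
<body>
  ...
</body>
</html>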

In the HTTP header: Alternatively, the directive can be transmitted via the server response header. This is especially useful for file types like PDFs that don’t have an HTML head.

Most CMSs set meta robots tags automatically or provide input fields for them. You should still verify — manual errors are common.

The Four Most Important Meta Robots Directives

There are four fundamental directives you need to know:

  • index: The page may be added to the search index. This is the default setting — you don’t need to set it explicitly.
  • noindex: The page should not be indexed. It will not appear in search results. This directive must be actively set.
  • follow: Crawlers may follow the links on the page and crawl the linked destinations. This is also the default and doesn’t need to be specified.
  • nofollow: Crawlers may not follow the links. The linked URLs will not be discovered or crawled through this page. Must be actively set.

Important: index and follow are default values. If you don’t include a meta robots tag, crawlers behave as if both directives are set. You only need to intervene when you want to deviate from the default.

Common Meta Robots Tag Combinations

The four directives produce four frequently used combinations. Each has a clear use case.

| Meta Robots Tag | Meaning | When to use |
| --- | --- | --- |
| `<meta name="robots" content="index, follow">` | Index URL, follow links | Not necessary — it's the default. Only useful for clarity in the code. |
| `<meta name="robots" content="noindex, follow">` | Don't index URL, follow links | Filter pages, thank-you pages, intermediate pages without search relevance — but with valuable internal links. |
| `<meta name="robots" content="index, nofollow">` | Index URL, don't follow links | Rarely useful. Only when a page should rank but links should not pass authority (e.g., guestbooks). |
| `<meta name="robots" content="noindex, nofollow">` | Don't index URL, don't follow links | Login areas, shopping carts, checkout processes, internal search pages. |

When to Use “noindex, follow”

This combination is more useful than many realize. It ensures that a page doesn’t enter the index, but still functions as a link in the internal link structure.

Typical use cases:

  • Filter results in online stores (e.g., “shoes size 10 red”)
  • Pagination pages (page 2, 3, 4…)
  • Thank-you pages after form submissions
  • Intermediate pages in multi-step processes

The advantage: crawlers can still reach important URLs through these pages without duplicate or irrelevant content burdening the index.
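
Applied to the filter example above, the head of such a page would contain (a minimal sketch; the URL in the comment is hypothetical):

<!-- e.g. on /shoes?size=10&color=red -->
<!-- keep the filter page out of the index, but let crawlers
     follow its links to the indexable product pages -->
<meta name="robots" content="noindex, follow">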

When to Use “noindex, nofollow”

This combination blocks both indexing and the crawling of links. It is intended for areas that should neither appear in the index nor consume crawl budget.

Examples:

  • Login and registration pages
  • Shopping carts and checkout pages
  • Internal search result pages
  • Admin areas
  • Test environments or staging pages

The goal here is to completely remove technical pages from search engine visibility.

Common Mistakes with Meta Robots Tags

Meta robots tags are simple — yet the same mistakes happen repeatedly.

1. Important Pages Set to “noindex”

A classic case: a category or product page is accidentally set to noindex. The page disappears from the index and rankings are lost. This often happens after relaunches or due to faulty template settings.

Solution: Regularly check which pages are set to noindex — for example via Google Search Console or crawling tools like Screaming Frog.

2. Combining “noindex” and “disallow”

If a URL is blocked in robots.txt (Disallow), Google cannot read the meta robots tag in the HTML. This means the page may remain in the index despite noindex — usually just as a URL without a snippet.

Solution: Set noindex only on URLs that are not blocked in robots.txt. If your goal is merely to keep URLs from being crawled, use robots.txt on its own instead.
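
A sketch of the conflict, assuming a hypothetical /filter/ path: with the following robots.txt rule in place, Google never fetches these URLs, so a noindex tag placed on them is never read.

# robots.txt
# crawling of /filter/ is blocked, so a noindex tag
# on these pages will never be seen by Google:
User-agent: *
Disallow: /filter/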

3. Too Many Pages Set to “noindex, follow”

“noindex, follow” is practical — but not useful for every filter case. Too many noindexed pages with internal links can unnecessarily strain crawl budget without delivering real value.

Solution: Use noindex, follow selectively — not across the board. Check whether pages are actually needed as link bridges.

How to Check Your Meta Robots Settings

There are several ways to verify your meta robots tags:

  • In the source code: Right-click → “View page source” → search for <meta name="robots".
  • With browser plugins: SEO extensions like “SEO Meta in 1 Click” display meta robots tags directly.
  • With crawling tools: Screaming Frog, Sitebulb, or Ryte crawl your website and list all meta robots tags in a clear overview.
  • In Google Search Console: Under “Indexing → Pages” you can see which URLs were not indexed — often with a reference to noindex.
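
Note that source-code and plugin checks only cover the HTML variant. Directives sent in the HTTP header (see the next section) are easiest to inspect on the command line, for example with curl (the URL is a placeholder):

curl -I https://www.example.com/whitepaper.pdf

The -I flag requests only the response headers; look for an X-Robots-Tag line in the output.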

Meta Robots vs. X-Robots-Tag

Besides the classic HTML meta tag, there is also the X-Robots-Tag, which is set in the HTTP header. It works identically but has one key advantage: it can also be used for non-HTML files like PDFs, images, or Office documents.

Example of an HTTP header:

X-Robots-Tag: noindex, nofollow

This is particularly relevant when you want to exclude downloadable files from the index without blocking them via robots.txt.
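
How the header is set depends on your server. As a minimal sketch for Apache (assuming mod_headers is enabled), all PDFs could be excluded like this:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

Other servers offer equivalents; on nginx, for example, the same header can be set with add_header inside a matching location block.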

Conclusion: Using Meta Robots Tags Deliberately

Meta robots tags are a simple but powerful tool. They determine which content becomes visible and which stays in the background. Used correctly, they prevent indexing errors, conserve crawl budget, and ensure that only relevant pages appear in search results.

The key points:

  • index, follow is the default — you don’t need to do anything if that’s exactly what you want.
  • noindex, follow is suitable for pages that pass links but shouldn’t rank themselves.
  • noindex, nofollow blocks pages completely — ideal for login areas and technical pages.
  • Never combine noindex with Disallow in robots.txt — otherwise Google can’t read the tag.
  • Regularly check which pages are set to noindex — especially after relaunches or template changes.

If you’re unsure which setting is right for which page: crawl your website, analyze the results, and make targeted corrections. Meta robots tags are not a one-time setup — they are part of ongoing on-page maintenance.

Need help with the implementation?

As an SEO freelancer with over 20 years of experience, I help you implement technical SEO professionally — fair, direct, and without long-term contracts.

About the Author

Christian Synoradzki

SEO Freelancer

More than 20 years of experience in digital marketing. Fair hourly rate, no contract lock-in, a direct point of contact.