
SEO How-to, Part 9: Diagnosing Crawler Issues

Search engines must crawl and index your site before it can rank in organic search. Thus optimizing your content is pointless if search engines cannot access it.

This is the ninth installment in my “SEO How-to” series.

In “Part 2,” I discussed how search engines crawl and index content. Anything that limits crawlable pages can kill your organic search performance.

Accidental Blocking

It’s a worst-case scenario in search engine optimization: Your company has redesigned its site, and suddenly organic performance crashes. Your web analytics indicate that home page traffic is relatively stable. But product page traffic is lower, and your new browse grid pages are nowhere to be found in Google.

What happened? You likely have a crawling or indexation issue.

Bots have come a long way. The major search engines claim their bots can crawl JavaScript. That’s true to an extent. But how developers code each piece of JavaScript determines whether search engines can access or understand the content.

Your browser is far more forgiving than bots. Content that renders on screen and functions correctly in your browser may not be crawlable by bots. For example, bots may fail to recognize internal links, orphaning entire sections of the site, or to render page content correctly.
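
To approximate what a traditional crawler receives, you can fetch the raw server response yourself, with no JavaScript executed, and list the links it contains. The Python sketch below is a minimal illustration of that check; the URL and user-agent string are placeholders, and a link missing from the raw HTML is a hint, not proof, that it depends on JavaScript.

```python
# List every <a href="..."> link in the raw server response (no scripts run).
# Links you can click in your browser but don't see here are likely generated
# by JavaScript and may be invisible to traditional crawling.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


url = "https://www.mysite.com/category-page/"  # hypothetical URL; use your own
request = Request(url, headers={"User-Agent": "crawl-check/0.1"})
raw_html = urlopen(request).read().decode("utf-8", errors="replace")

parser = LinkExtractor()
parser.feed(raw_html)

print(f"Links found in the raw HTML of {url}:")
for link in sorted(set(parser.links)):
    print(" ", link)
```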

The most advanced bots render a page as humans see it in an up-to-date browser and send that information back to the search engine, which processes the page’s different states to uncover additional content and links.

But that relies on the most advanced search bot (i) crawling your pages, (ii) identifying and triggering elements such as non-standard link coding in navigation, and (iii) assessing a page’s function and meaning.

Traditional crawling relies on HTML text and links to determine relevance and authority instantly. But advanced crawling across JavaScript, for example, can take weeks — if it happens at all.

In short, invest the time to identify and resolve the crawl blockers on your site.

Crawl Testing

Unfortunately, publicly available tools such as DeepCrawl and Screaming Frog’s SEO Spider cannot perfectly replicate modern search bots. The tools can report content as uncrawlable when a search bot might still be able to access it.

Screaming Frog’s SEO Spider is a helpful tool to identify potential crawl errors, as is DeepCrawl. Neither is foolproof, however, in perfectly replicating search bots.

The first step in testing whether search bots can crawl your entire site is to check Google’s index. In Google’s search bar, type “site:” before any URL you want to check, such as:

site:www.mysite.com/this-page/

Site queries return a list of pages that Google has indexed that start with the URL string you entered. If the pages missing from your analytics are also missing from Google’s index, you could have a crawl block. However, if the pages are indexed but not driving organic traffic, you likely have a relevance or link authority issue.

You can also check indexation in Google Search Console with the “URL inspection” tool — but only one page at a time.
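
If pages are missing from the index, a short script can surface the most common technical blockers before you dig deeper. The Python sketch below checks robots.txt rules, the HTTP status code, the X-Robots-Tag header, and the meta robots tag for a single, hypothetical URL; it is a rough first pass, not a substitute for the URL inspection tool.

```python
# Rough check of common crawl and indexation blockers for a single URL.
import re
from urllib.parse import urlparse
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

url = "https://www.mysite.com/product/widget-123/"  # hypothetical URL
parts = urlparse(url)

# 1. Does robots.txt block Googlebot from this URL?
robots = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
robots.read()
print("Allowed by robots.txt:", robots.can_fetch("Googlebot", url))

# 2. Fetch the page and inspect the status code and X-Robots-Tag header.
response = urlopen(Request(url, headers={"User-Agent": "crawl-check/0.1"}))
print("HTTP status:", response.status)
print("X-Robots-Tag header:", response.headers.get("X-Robots-Tag", "none"))

# 3. Look for a meta robots directive (e.g., noindex) in the HTML.
html = response.read().decode("utf-8", errors="replace")
meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.IGNORECASE)
print("Meta robots tag:", meta.group(0) if meta else "none")
```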

If the site query fails to unearth pages, try crawling your site with Screaming Frog or DeepCrawl. Let the crawler run, and look for whole page types that are missing, such as browse grids, product detail pages, or articles.
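
One way to find such gaps is to compare the URLs you expect, such as the entries in your XML sitemap, against the URLs the crawler actually reached. The Python sketch below assumes two hypothetical input files, a sitemap.xml and a CSV export from your crawler with a column of crawled URLs; adjust the file names and the column name to match your tool’s export.

```python
# Compare sitemap URLs with the URLs a crawler tool reached, then group the
# missing ones by first path segment to spot entire sections the crawl missed.
import csv
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

SITEMAP_FILE = "sitemap.xml"        # assumed input file
CRAWL_EXPORT = "crawl_export.csv"   # assumed crawler export (CSV)
URL_COLUMN = "Address"              # assumed column holding crawled URLs

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_urls = {
    loc.text.strip()
    for loc in ET.parse(SITEMAP_FILE).getroot().findall(".//sm:loc", ns)
    if loc.text
}

with open(CRAWL_EXPORT, newline="", encoding="utf-8") as f:
    crawled_urls = {row[URL_COLUMN] for row in csv.DictReader(f)}

missing = sitemap_urls - crawled_urls
sections = Counter("/" + urlparse(u).path.strip("/").split("/")[0] for u in missing)

print(f"{len(missing)} sitemap URLs were not found in the crawl.")
for section, count in sections.most_common():
    print(f"  {section}: {count} missing URLs")
```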

If you don’t see holes in the crawl, your site is likely crawlable. Search bots, again, are more capable than crawler tools: if a tool can get through a site’s content, so can search bots. Problems flagged by a crawler tool, on the other hand, may be false negatives that a real search bot could work around.

Also, use crawler tools in preproduction environments to identify crawl problems before launch, or at least to get an idea of what you’ll be dealing with when the site goes live.
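
One accidental block worth checking explicitly before launch is robots.txt, since a preproduction environment’s site-wide disallow is easy to carry over to the live site. The Python sketch below prints the robots.txt of two hypothetical hosts, a staging host and a production host, and warns if either one disallows the entire site.

```python
# Compare staging and production robots.txt files and flag a blanket
# "Disallow: /" that would block all crawling. Hostnames are placeholders.
from urllib.request import Request, urlopen

HOSTS = {
    "staging": "https://staging.mysite.com",   # hypothetical preproduction host
    "production": "https://www.mysite.com",    # hypothetical live host
}

for name, host in HOSTS.items():
    request = Request(f"{host}/robots.txt", headers={"User-Agent": "crawl-check/0.1"})
    body = urlopen(request).read().decode("utf-8", errors="replace")
    blanket_block = any(
        line.strip().lower().replace(" ", "") == "disallow:/"
        for line in body.splitlines()
    )
    print(f"--- {name} ({host}/robots.txt) ---")
    print(body.strip() or "(empty)")
    if blanket_block:
        print(f"WARNING: {name} robots.txt disallows the entire site.")
```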

See “Part 10: Redesigns, Migrations, URL Changes.”

Jill Kocher Brown