Despite the massive size of Google's index (reportedly around 400 billion documents and counting), the surge of AI-generated content is pushing it up against its limits.
This means that fewer pages on your site may actually make it onto Google’s SERPs for people to find organically.
To ensure your most important pages get indexed, it's more important than ever for SEOs to focus on improving their site's crawl efficiency.
In this post, we’ll show you how to make it happen. We’ll discuss the idea of crawl depth and go over strategies that increase the likelihood that even your less popular pages get indexed.
But first, let’s do a quick recap on how Google crawls and indexes enterprise sites.
The search engine's process of finding and including web pages in SERPs consists of four steps: crawl, render, index, and rank.
In this post, we're concerned with the first of those steps: the crawl. Unless the search engine can access, discover, and crawl a website's assets, the site's ability to rank for relevant keywords will be severely limited.
However, ensuring the crawlability of the site is just the first step.
Another SEO issue to consider is getting the crawler to regularly visit less popular content within the site's crawl budget, and that's where crawl depth comes in.
The term crawl depth refers to how many pages search engine bots will access and index on a site during a single crawl.
Sites with a high crawl depth see many of their pages crawled and indexed; those with a low crawl depth typically have many pages that go uncrawled for long periods.
Recommended Reading: The Best Two Website Taxonomy Methods to Boost SEO
Crawl depth is often confused with page depth, but the two aren’t the same.
The term page depth defines how many clicks a user needs to reach a specific page from the homepage — using the shortest path, of course.
Using the graph above as an example, the homepage sits at depth 0, right at the top. Pages directly below it are at depth 1, the level below that at depth 2, and so on; this is how it's reflected in our Clarity UI. In other words, pages linked directly from the homepage sit closer to the top of the site architecture.
As a rule, pages at depth 3 and deeper perform worse in organic search results, because search engines may struggle to reach and crawl them within the site's allocated crawl budget.
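If you want to check page depth yourself, here's a minimal Python sketch of the idea: a breadth-first search from the homepage over the internal-link graph, where a page's depth is simply the fewest clicks needed to reach it. The URLs and link map below are purely illustrative; in practice you'd build the map from a crawl export.

```python
from collections import deque

# Illustrative internal-link map: each URL maps to the URLs it links to.
# In practice, build this from a site crawl export.
internal_links = {
    "https://example.com/": ["https://example.com/shop", "https://example.com/blog"],
    "https://example.com/shop": ["https://example.com/shop/tees"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/shop/tees": ["https://example.com/shop/tees/navy-tee"],
    "https://example.com/blog/post-1": [],
    "https://example.com/shop/tees/navy-tee": [],
}

def page_depths(homepage: str, links: dict) -> dict:
    """Breadth-first search from the homepage: depth = fewest clicks to reach a page."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:  # first time we reach it = shortest path
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

if __name__ == "__main__":
    results = page_depths("https://example.com/", internal_links)
    for url, depth in sorted(results.items(), key=lambda item: item[1]):
        flag = "  <- review: depth 3+" if depth >= 3 else ""
        print(f"depth {depth}: {url}{flag}")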
This brings us to another issue: crawl prioritization.
We’ve discussed how search engines find, crawl, and index content. We’ve also talked about how website structure and page depth can affect the process.
But there is one other thing we must touch upon: the correlation between a site's architecture and page authority (or, to put it differently, which pages earn the most links).
Typically, the homepage earns the majority of a site's links, while pages in tiers 2 and 3 earn few. That shouldn't come as a surprise when you consider what those pages contain: product categories, product pages, money pages, and so on. These are neither linkable assets nor pages webmasters often reference in their content. Pages further down, such as blog posts and other informational content, can earn more links.
For example, analyzing the link profile of Ugmonk.com, a clothing store, I can see that the majority of links point to the shop's homepage. Granted, some of those links reference the HTTP version of the URL, but it's still the same homepage.
Why is this so important? Because pages with more links pointing to them appear more popular and therefore receive a higher crawl priority than pages with few or no links. In short, those popular pages get crawled more often and, after the homepage, serve as additional entry points for crawlers.
Recommended Reading: Homepage SEO: How to Get It Right, Finally
But what about the rest of the pages on your site? Well, that’s the problem.
Unless those other assets are linked from popular pages, their chances of being crawled regularly diminish greatly — especially since, as we mentioned, Google's index is facing limits due to the influx of AI-generated content.
Of course, this doesn't mean crawlers will never access those pages (although that can be true for assets buried very deep in the site's architecture), but their chances of being crawled regularly are significantly smaller.
So, what to do? How can you increase crawl efficiency and prioritize less popular content?
Here are four effective strategies to implement.
First, optimize your internal linking structure. Reduce the number of clicks required to reach pages you want to have crawled more often.
Identify opportunities to link to target pages from popular content as well. Using seoClarity's Internal Link Analysis feature, evaluate the current links to each page. The screenshot below shows the analysis for pages at depth 3.

Note that all of those assets have only one internal link pointing to them, suggesting an opportunity to bring them higher in the site's hierarchy with some clever internal linking.
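To replicate this kind of check outside the platform, here's a rough Python sketch that counts how many internal links point to each page in a crawl export. The file name and column names (source_url, target_url) are assumptions; adjust them to whatever format your crawler produces.

```python
import csv
from collections import Counter

# Hypothetical crawl export with one internal link per row: source_url,target_url
EXPORT_FILE = "internal_links_export.csv"

inlink_counts = Counter()
with open(EXPORT_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        inlink_counts[row["target_url"]] += 1  # count links pointing *to* each page

# Pages with only one internal link are prime candidates for extra links
# from popular, frequently crawled pages.
for url, count in sorted(inlink_counts.items(), key=lambda item: item[1]):
    if count <= 1:
        print(f"{count} inlink(s): {url}")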
Also, use categories and tags, if your CMS supports them, to provide additional structure a crawler can follow to discover this content.
Second, create and update your XML sitemap often. The sitemap acts as a directory of sorts: a list of all the URLs you want the search engine to index, along with information about when each page was last updated.
As a rule, search engines crawl URLs listed in the sitemap more often than others, so keeping the sitemap fresh increases the chances that those target pages get crawled.
Note: seoClarity's built-in site crawler allows you to create a sitemap based on the information it collects.
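As an illustration of what a sitemap entry looks like, here's a minimal Python sketch that writes a sitemap.xml with a loc and lastmod element for each URL. The page list and dates are placeholders; in practice your CMS, a sitemap plugin, or the crawler would generate this for you.

```python
from datetime import date
from xml.etree.ElementTree import Element, SubElement, ElementTree

# Placeholder URL list with last-modified dates; in practice, pull this from your CMS or crawl data.
pages = [
    ("https://example.com/", date(2024, 5, 1)),
    ("https://example.com/shop/tees/navy-tee", date(2024, 4, 18)),
]

urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in pages:
    url_el = SubElement(urlset, "url")
    SubElement(url_el, "loc").text = loc
    SubElement(url_el, "lastmod").text = lastmod.isoformat()  # tells crawlers when the page last changed

ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)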
Third, increase page speed. A fast site reduces the time crawlers need to access and render pages, which means more assets get crawled within the allocated crawl budget.
(A quick note: seoClarity runs page speed analysis based on Lighthouse data to deliver the most relevant insights to drive your strategies.)
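For a quick, rough check of server response times (just one component of page speed, not a substitute for Lighthouse-based analysis), here's a small Python sketch using the requests library. The URLs are placeholders; swap in the pages you want crawled more often.

```python
import requests  # third-party: pip install requests

# Placeholder URLs; replace with your own priority pages.
urls = [
    "https://example.com/",
    "https://example.com/shop/tees/navy-tee",
]

for url in urls:
    resp = requests.get(url, timeout=10)
    # resp.elapsed measures the time from sending the request until the response
    # headers arrive, a rough proxy for server response time.
    print(f"{resp.status_code}  {resp.elapsed.total_seconds():.2f}s  {url}")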
Finally, fix broken links. Broken links impede your website's crawl efficiency by leading search engine bots into dead ends, wasting crawl time and resources that could have been spent on valuable content. This not only hinders the discovery of new and updated pages but also hurts your site's overall SEO health, as it signals poor site maintenance and degrades the user experience.
Fixing these links ensures that search engines can effectively navigate and index your site's content, which in turn helps maintain your site's visibility and ranking in search engine results pages (SERPs).
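As a simple starting point, here's a Python sketch that pulls the internal links from a single page and flags any that return an error status code. It assumes the requests and beautifulsoup4 libraries are installed, and the starting URL is a placeholder; a full crawler (such as seoClarity's) would do this across the entire site.

```python
from urllib.parse import urljoin, urlparse

import requests                 # pip install requests
from bs4 import BeautifulSoup   # pip install beautifulsoup4

PAGE = "https://example.com/"   # placeholder starting page
site_host = urlparse(PAGE).netloc

html = requests.get(PAGE, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
links = {urljoin(PAGE, a["href"]) for a in soup.find_all("a", href=True)}

for link in sorted(links):
    if urlparse(link).netloc != site_host:
        continue                # only check internal links
    status = requests.head(link, allow_redirects=True, timeout=10).status_code
    if status >= 400:
        print(f"BROKEN ({status}): {link}")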
Learn how to find and fix broken internal links at scale here.
This post was originally published in June 2020 and has since been updated.