Enterprises that manage sites for visitors in multiple languages and countries face the tedious task of informing search engines which version of a page is intended for which audience.
Thankfully, hreflang tags can be added at the page level (in each page’s HTML) or within an XML sitemap, allowing you to specify different language and regional targets.
As a refresher, an XML sitemap lists a site’s pages, along with key information about them, for Google and other search engines. Sitemaps should be referenced in the robots.txt file and submitted in Google Search Console.
In an earlier post, we covered how to create XML sitemaps, so in this post, we’ll pick up where we left off to consider multi-language and regional sites.
To solve the challenge above, I’m going to show you how to create an hreflang XML sitemap for large enterprise sites.
Ideally, your content management system generates sitemaps that update automatically as changes are made to your site. If it doesn’t, you can use the steps below in the short term.
Then, working with your configuration or development team, you can implement automatic sitemaps over time.
If you have a current sitemap, start there. Crawl its URLs with a web crawler and download the Excel file of results to build the list of URLs for your sitemap, including the distinct URLs for each language and location.
Remove any URLs that don’t return a 200 (OK) status code, such as 404 errors or redirecting URLs.
For redirecting URLs, replace them with the final destination URLs if they are not already included on the list.
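The cleanup rules above can be sketched in Python. This is a minimal illustration, not an export format from any particular crawler: it assumes each crawl row has been read into a hypothetical `CrawlRow` with the URL, its status code, and (for redirects) the final destination URL. Because a redirect target may itself be broken, the output still needs the re-crawl described later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlRow:
    """Hypothetical shape of one row from a crawl export."""
    url: str
    status: int
    # Final destination after following redirects (None if the URL did not redirect).
    final_url: Optional[str] = None


def clean_sitemap_urls(rows: list[CrawlRow]) -> list[str]:
    """Keep 200 URLs, swap redirects for their destinations, and drop errors."""
    kept: list[str] = []
    seen: set[str] = set()
    for row in rows:
        if row.status == 200:
            candidate = row.url
        elif 300 <= row.status < 400 and row.final_url:
            candidate = row.final_url
        else:
            continue  # 404s, 5xx errors, or redirects with an unknown target
        # Skip destinations that are already on the list.
        if candidate not in seen:
            seen.add(candidate)
            kept.append(candidate)
    return kept
```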
Next, identify other indexable URLs to include in your list of URLs for the XML sitemap.
These pages could come from a crawl of your primary website (e.g. the one targeting English in the United States), or they could be provided by your web team.
The key is to get these pages crawled in an SEO platform to confirm that they are “indexable” URLs.
Indexable URLs return a 200 (OK) status code, aren’t marked noindex, and don’t have a canonical tag pointing to another page.
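As a rough sketch, that indexability test could look like the function below. The inputs are assumptions about what your crawl export provides (status code, canonical href, and a noindex flag); the noindex check is a common extra condition, and the canonical comparison is deliberately strict, so `/page` and `/page/` count as different URLs.

```python
from typing import Optional
from urllib.parse import urljoin


def is_indexable(url: str, status: int, canonical: Optional[str], noindex: bool = False) -> bool:
    """Return True if a crawled URL looks indexable (a sketch, not a full audit)."""
    if status != 200 or noindex:
        return False
    if canonical is None:
        return True  # no canonical tag, so nothing points elsewhere
    # Resolve relative canonicals (e.g. "/en-us/widgets") against the page URL,
    # then require an exact, self-referencing match.
    return urljoin(url, canonical) == url
```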
In the example below, a site’s development team provided a set of URLs for an XML sitemap, but 52 of them redirect.
If you suspect you have URLs that are not internally linked and won't appear in a crawl, leverage an SEO keyword database to find URLs indexed and ranking in Google.
You could also export all of the indexed pages within Google Search Console to collect all the URLs hosted on your site.
Enterprise sites may have multiple content management systems that generate pages, as well as one-off pages added to the site over time. This collection step ensures that you capture as many URLs as possible to include in your sitemap.
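Once you have lists from several sources (a crawl, a Google Search Console export, a keyword database, CMS exports), they usually overlap. A minimal Python sketch of merging them is below; it assumes each source is already a plain list of URL strings, and its normalization (lowercasing scheme and host, dropping `#fragments`) is an illustrative choice. Paths are left case-sensitive because servers may treat them that way.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, keep path and query as-is."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def merge_url_sources(*sources: list[str]) -> list[str]:
    """Union several URL lists, preserving first-seen order and removing duplicates."""
    merged: dict[str, None] = {}  # dict keys keep insertion order
    for source in sources:
        for url in source:
            if url.strip():
                merged.setdefault(normalize(url), None)
    return list(merged)
```

For example, `merge_url_sources(crawl_urls, gsc_urls, cms_urls)` (hypothetical variable names) yields one deduplicated list to send through the indexability crawl.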
After the crawl, you need to either replace these redirecting URLs with the final destination URL or remove them from the list.
Then, re-crawl the list to confirm the final set of indexable pages you want to drive traffic to.
For sites without hreflang tags, you can download XML sitemaps from the platform. They download in the compressed .gz format; extract the files after downloading to access the .xml version.
Then add the sitemap to your site and submit it in Google Search Console.
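If your platform can’t export a sitemap, a basic one (without hreflang) can be generated from the cleaned URL list. This is a minimal sketch using Python’s standard `xml.etree.ElementTree` and the sitemaps.org namespace; the optional `lastmod` mapping is a hypothetical input you would fill from your CMS, and the date shown is illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str], lastmod: Optional[dict[str, str]] = None) -> bytes:
    """Return a <urlset> sitemap as UTF-8 bytes, with <lastmod> where known."""
    # Serialize the sitemap namespace as the default xmlns (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod and url in lastmod:
            # Dates should use W3C Datetime format, e.g. "2024-01-15".
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod[url]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Writing the result with `open("sitemap.xml", "wb").write(build_sitemap(urls))` gives a file you can upload and submit.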
Hreflang tags are a cornerstone for international SEO.
There are multiple ways to add hreflang tags to your site: Adding them via an XML sitemap is a good option because you don’t have to add any code to individual pages.
The first part of this step is to determine whether each page has a matching URL for every language and location you target.
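How do you do this? If your URLs follow a uniform structure, break each one into its domain, language-region folder, and page path, then match URLs that share the same path:
“https://www.domain.com” + “/en-us” + “/widgets”
“https://www.domain.com” + “/en-gb” + “/widgets”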
If your site does not follow a uniform structure across your unique language version and regional pages, you'll have to find another way to align pages. There should be some pattern in the URL to tip you off.
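For the uniform case, the alignment can be automated. The sketch below assumes the pattern in the example: one domain with the language-region code as the first folder (such as /en-us/ or /fr/). The regex is an assumption and would also match any other two-letter folder, so review the groups before using them; URLs without a locale folder are skipped for manual alignment.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed pattern: first path segment is a language code, optionally with a region.
LOCALE_FOLDER = re.compile(r"^/([a-z]{2}(?:-[a-z]{2})?)(/.*)?$", re.IGNORECASE)


def group_alternates(urls: list[str]) -> dict[str, dict[str, str]]:
    """Group URLs by host + path-after-locale; values map locale -> URL."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for url in urls:
        parts = urlsplit(url)
        match = LOCALE_FOLDER.match(parts.path)
        if not match:
            continue  # no locale folder: align this page by hand
        locale = match.group(1).lower()
        page_key = parts.netloc.lower() + (match.group(2) or "/")
        groups[page_key][locale] = url
    return dict(groups)
```

A group with only one locale means that page has no matching translated URL yet.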
If hreflang tags already exist in your site’s HTML, seoClarity users can pull that information to find the matching pages. This assumes you’re deprecating HTML hreflang tags in favor of the XML sitemap.
Ideally, you could export a full list of pages from your site’s CMS to do the alignment.
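With the pages aligned, each `<url>` entry in an hreflang sitemap lists every language version of that page, itself included, as `xhtml:link rel="alternate"` elements. A minimal Python sketch follows; it assumes the aligned groups are a dict of locale-to-URL maps (a hypothetical shape, such as one produced by grouping URLs on their shared path) and leaves out `x-default` for brevity. Pages with no translated counterpart are still listed, just without alternates.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups: dict[str, dict[str, str]]) -> bytes:
    """Emit one <url> per page version, each listing all versions in its group."""
    ET.register_namespace("", SM)          # default namespace for urlset/url/loc
    ET.register_namespace("xhtml", XHTML)  # xhtml: prefix for the alternate links
    urlset = ET.Element(f"{{{SM}}}urlset")
    for alternates in groups.values():
        for url in alternates.values():
            entry = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(entry, f"{{{SM}}}loc").text = url
            if len(alternates) < 2:
                continue  # no translated counterpart to annotate
            # Every version lists all versions, including itself (self-referencing hreflang).
            for lang, alt_url in sorted(alternates.items()):
                ET.SubElement(entry, f"{{{XHTML}}}link", rel="alternate", hreflang=lang, href=alt_url)
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```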
Recommended Reading: 12 Common Hreflang Mistakes and How to Prevent Them
If you have access to your own site (or a developer site), you can test your XML sitemap without the help of a developer.
The test will ensure that your sitemap is in the right format and give you the chance to get it up and running.
Below, I'll show you how to test your XML sitemap within the popular hosting platform, Bluehost (although any hosting platform will do!).
In Bluehost, choose “File Manager” then the “Public HTML” section.
Then choose “Upload” and add your .xml file. On your test site, you’ll now be able to view it by navigating to the file name at the root of the domain.
For example: your-test-site.com/sitemap.xml.
Now you can see your XML file in all its glory on the web.
You can validate your hreflang sitemap at scale with the web crawler and site audit technology built into an SEO platform, or with free online tools.
This will ensure you didn’t miss a space, quote, or anything else that would make the sitemap invalid. It’s better to find this out before you ask your dev team to upload it to your live site. You also want to remove any redirecting URLs, error URLs, or URLs that aren’t actually translated into the associated language.
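For a quick local check before handing the file off, a short script can catch two common problems: XML that isn’t well-formed, and hreflang entries that aren’t reciprocal (a page lists an alternate that doesn’t link back, or omits its self-reference). This is a minimal sketch using Python’s standard library; it does not check status codes or translation quality, which still need a crawl.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9", "xhtml": "http://www.w3.org/1999/xhtml"}


def find_hreflang_problems(sitemap_xml: bytes) -> list[str]:
    """Return human-readable problems; an empty list means no issues found."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    # Map each <loc> to its {hreflang: href} alternates.
    alternates: dict[str, dict[str, str]] = {}
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates[loc] = {link.get("hreflang"): link.get("href") for link in entry.findall("xhtml:link", NS)}
    problems: list[str] = []
    for loc, links in alternates.items():
        if links and loc not in links.values():
            problems.append(f"{loc}: missing self-referencing hreflang")
        for lang, href in links.items():
            if href == loc:
                continue
            back = alternates.get(href)
            if back is None:
                problems.append(f"{loc}: alternate {href} ({lang}) is not listed in the sitemap")
            elif loc not in back.values():
                problems.append(f"{loc}: {href} does not link back (missing return tag)")
    return problems
```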
Now that you have an hreflang sitemap, you can upload it to your site, submit it to Google Search Console, and check it off the list.
Recommended Reading: Does Your Site Need Self-Referencing Hreflang Tags? Hint: It Does!
Creating effective hreflang XML sitemaps requires dedication, but it’s essential for sites that target multiple locations.
Collaborate with your development team so they can add an automated sitemap for your site — and have it include the last modified values too!
<<Editor's Note: This post was originally published in May 2021 and has since been updated.>>