How to Incrementally Move an Ecommerce Site to HTTPS

In October 2017, Google Chrome will release version 62, which will warn website visitors with a “Not Secure” message when they type data (such as site searches and newsletter signups) into pages without HTTPS. In Incognito mode, Chrome will show the “Not Secure” warning on all HTTP pages.

This will certainly affect ecommerce conversions. For small shops, my detailed guide makes moving to HTTPS relatively painless. But for large sites, roughly 50,000 URLs or more, there is more risk, given Googlebot’s crawling priorities and slow re-indexing. A sound strategy is to migrate to full HTTPS incrementally and measure the impact on traffic and sales at each step.

In this post, I’ll explain how to do that.

Google has sent out warnings via Google Search Console to registered sites with HTTP profiles. I have clients that moved to full HTTPS long ago. But they still received the warning.

Google Search Console recently released this message to all registered sites, stating that Chrome will show security warnings starting in October 2017 for sites that haven’t migrated to HTTPS.

If you haven’t made the move to full HTTPS yet, you will soon be able to test whether Chrome will issue the “Not Secure” warning on your site by using Google Chrome Canary, the experimental nightly build of Chrome that developers and early adopters use to test the latest features.

At the time of writing, Canary is on version 62, the release that is supposed to introduce the warning. But I couldn’t get the “Not Secure” warning to appear in my tests. I plan to monitor Canary to learn when the warnings start appearing.

Use the Google Chrome Canary version to see if your site will be affected.

I scanned through the National Retail Federation’s 2017 list of top traditional retailers, and found a number of them have not made the move to full HTTPS yet, including well-known brands such as AutoZone, Nordstrom, Gap, Publix, Sears, Subway, BJs, QVC, and Saks Fifth Avenue. The delay is understandable given the risk of losing valuable search engine traffic during the move.

The main risk for large sites is that Google takes too long to re-index pages due to crawl prioritization issues.

Here is the HTTP profile from Google Search Console of one client with a few thousand pages that moved to full HTTPS.

Some sites experience quick re-indexing of pages after switching to HTTPS.

Google re-indexed the HTTPS pages quickly, in roughly two weeks. But another client with over 1 million pages saw a much slower (and more painful) re-indexing. It took approximately six months.

Large sites, such as this one with over 1 million pages, might experience a long and painful re-indexing.

The first client didn’t see any negative impact on SEO traffic. The second one did. This has led me to plan high-stakes migrations incrementally. Many sites, such as The Guardian and Wired, have shared their experiences with making an incremental move.

My incremental migration plan involves three phases.

  • Perform server log analysis to identify which groups of pages need to be migrated first. Prioritize pages that Googlebot crawls more often, as this will let us learn the impact quickly.
  • Incrementally update redirect maps and canonical tags to perform the actual move.
  • Track progress in Google Search Console, and in Google Analytics or similar. We need to use two profiles (HTTP and HTTPS) to confirm a drop in pages indexed (and traffic) for the HTTP profile, and a corresponding increase in indexing and traffic for the HTTPS profile.

If problems arise while moving any section, we can quickly revert that section.

Web Server Log Analysis

One approach I’ve used successfully to prioritize incremental migrations is to start with the lowest value pages (pages with no traffic or links), and subsequently move pages with higher value. This approach works, but it requires months to execute.

As Google Chrome will start alerting users in a month or so, we need to do the reverse. We need to migrate first the pages that Googlebot picks up fastest, so we can accelerate our learning. We can only get this kind of information from our web server traffic logs. “Using Server Logs to Uncover SEO Problems,” one of my previous articles, explains how to turn server logs into structured data in CSV format.

You can upload the CSV file to Google Sheets, or use Excel to create a pivot table with the page URL, and the number of Googlebot visits. You can also add an extra column with the page category to group together the most crawled page groups.

The idea is to move the most frequently crawled pages or page groups to HTTPS first because we expect them to be picked up by Google relatively quickly. We can see what impact the move has on SEO traffic, then continue the process if we see no issues.
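The pivot-table step can also be sketched in Python. This is a minimal illustration, not the exact process described above: the column names url and user_agent are assumptions (your log export may use different ones), the page group is taken to be the first path segment, and the sample rows are made up. Matching on the user-agent string alone can also count spoofed bots.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def page_group(url: str) -> str:
    """Return the first path segment, e.g. '/women-clothing/' for '/women-clothing/dresses'."""
    path = urlsplit(url).path
    first = path.strip("/").split("/")[0]
    return f"/{first}/" if first else "/"


def googlebot_hits_by_group(rows) -> Counter:
    """Count rows whose user agent mentions Googlebot, keyed by page group."""
    hits = Counter()
    for row in rows:
        if "googlebot" in row["user_agent"].lower():
            hits[page_group(row["url"])] += 1
    return hits


# Tiny inline sample standing in for the real log export (made-up rows).
SAMPLE_CSV = """url,user_agent
/women-clothing/dresses,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/women-clothing/tops,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/men-clothing/shirts,Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/women-clothing/dresses,Mozilla/5.0 (Windows NT 10.0) Chrome/61.0
"""

sample_hits = googlebot_hits_by_group(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Most crawled page groups first: these are the candidates to migrate first.
for group, count in sample_hits.most_common():
    print(group, count)
```

With a real export, replace the inline sample with an open file handle passed to csv.DictReader.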

Redirects and Canonical Rules

I addressed common migration issues in “SEO: How to Migrate an Ecommerce Site to HTTPS.” In this section, I will focus exclusively on the redirect and canonical changes. Refer to the previous article to double-check all the steps.

Assuming your checkout funnel already uses HTTPS by default, these are the kind of rules (for Apache servers) that we will change to move the rest of the site to HTTPS.

RewriteEngine On
 # This will enable the Rewrite capabilities

RewriteCond %{HTTPS} =on
 # This checks that the request arrived over HTTPS
RewriteCond %{REQUEST_URI} !(^/?checkout/.*)
 RewriteRule ^(.*)$ http://www.webstore.com/$1 [R,L]
 #This forces HTTP if the page is not in the checkout funnel

RewriteCond %{HTTPS} !=on
 # This checks to make sure the connection is not already HTTPS
RewriteCond %{REQUEST_URI} (^/?checkout/.*)
 RewriteRule ^(.*)$ https://www.webstore.com/$1 [R,L]
 #This forces HTTPS for pages in the checkout funnel

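Your existing rewrite rules would look something like the above. This translates to: force any URL that is not part of the checkout process (identified by /checkout/) to be an HTTP URL, and force any URL inside the checkout process to be an HTTPS URL. Each RewriteCond applies only to the RewriteRule that follows it, so every rule needs its own protocol check to avoid redirecting a page to itself.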

We can simply widen this rule to include other page group patterns. For example, to move the women’s clothing category to HTTPS, we would do this.

RewriteEngine On
 # This will enable the Rewrite capabilities

RewriteCond %{HTTPS} =on
 # This checks that the request arrived over HTTPS
RewriteCond %{REQUEST_URI} !(^/?checkout/.*|^/?women-clothing/.*)
 RewriteRule ^(.*)$ http://www.webstore.com/$1 [R,L]
 #This forces HTTP if the page is not in the checkout funnel, or women’s clothing category

RewriteCond %{HTTPS} !=on
 # This checks to make sure the connection is not already HTTPS
RewriteCond %{REQUEST_URI} (^/?checkout/.*|^/?women-clothing/.*)
 RewriteRule ^(.*)$ https://www.webstore.com/$1 [R,L]
 #This forces HTTPS if the page is in the checkout funnel, or women’s clothing category
 # [R] alone sends a temporary (302) redirect; use [R=301,L] once the move is final

We use the pipe (|) regular expression symbol that means “or” (match this or that). We can add more page groups by concatenating (i.e., linking) their regex patterns using pipes.

I validated that this works correctly using this handy Apache htaccess tester.

As pages are updated to HTTPS, they need to be redirected with proper canonical tags.

Women’s clothing pages get redirected to HTTPS, while other pages are redirected to HTTP.
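The pipe-joined pattern can also be sanity-checked offline before touching the server. The following is a rough Python sketch, not a replacement for testing the real .htaccess file: it only mimics the REQUEST_URI match, the helper names are hypothetical, and it assumes the page group slugs contain no regex metacharacters.

```python
import re

# Page groups already moved to HTTPS; add one prefix per migration step.
HTTPS_GROUPS = ["checkout", "women-clothing"]


def build_pattern(groups):
    """Join the group prefixes with pipes, mirroring the RewriteCond pattern."""
    # Assumes slugs are plain text (letters, digits, hyphens).
    return "|".join(f"^/?{group}/.*" for group in groups)


def target_scheme(request_uri, groups=HTTPS_GROUPS):
    """Return the scheme the rewrite rules are meant to enforce for this URI."""
    return "https" if re.search(build_pattern(groups), request_uri) else "http"


print(build_pattern(HTTPS_GROUPS))
print(target_scheme("/women-clothing/dresses"))  # inside a migrated group
print(target_scheme("/men-clothing/shirts"))     # not migrated yet
print(target_scheme("/women-clothing"))          # no trailing slash
```

Note that /women-clothing without a trailing slash does not match this pattern, because each alternative requires a slash after the group name. If your category landing pages omit the trailing slash, the pattern needs to account for that.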

As we move pages from HTTP to HTTPS, we need to update the canonical tags to reflect the new default URLs. For example, https://www.webstore.com/women-clothing should have https://www.webstore.com/women-clothing as the canonical, not http://www.webstore.com/women-clothing or /women-clothing.
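A quick way to spot-check canonicals on migrated pages is to parse the HTML and compare the tag against the expected absolute HTTPS URL. This is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and the HTML strings are made-up samples rather than real pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_is_correct(html, expected_url):
    """True when the canonical is exactly the absolute HTTPS URL of the page."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical == expected_url


PAGE_URL = "https://www.webstore.com/women-clothing"
good = '<head><link rel="canonical" href="https://www.webstore.com/women-clothing"></head>'
stale = '<head><link rel="canonical" href="http://www.webstore.com/women-clothing"></head>'
relative = '<head><link rel="canonical" href="/women-clothing"></head>'

for label, html in [("good", good), ("stale", stale), ("relative", relative)]:
    print(label, canonical_is_correct(html, PAGE_URL))
```

Only the first sample passes. The other two are the stale HTTP and relative forms described above.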

Tracking Progress

We need to track indexing and SEO traffic levels to the page groups that we are moving. Ideally, we should also monitor Googlebot crawling using fresh server logs. The redirects and canonicals should cause the HTTP pages to drop from the index, and get replaced by the corresponding HTTPS pages.

You can narrow down organic search traffic to a page group using the “Matching RegEx” option in Google Analytics advanced filters. This will show only the traffic to the group of pages that we are moving.

Narrow down organic search traffic to a page group using the “Matching RegEx” option in Google Analytics advanced filters.

To track re-indexing, create a separate XML sitemap with the set of pages you are migrating, and remove those pages from your main XML sitemaps. Register, in Google Search Console, two sets of XML sitemaps: one for the HTTP profile (using HTTP URLs), and another for the HTTPS profile (using HTTPS URLs).
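The two sitemaps for a migrating page group can be generated from the same list of paths. Here is a minimal sketch with Python's standard library. The paths are examples, and the build_sitemap helper is hypothetical, not part of any Search Console tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(scheme, host, paths):
    """Return sitemap XML (as a string) listing each path under scheme://host."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{scheme}://{host}{path}"
    return ET.tostring(urlset, encoding="unicode")


# Example paths for the page group being migrated.
migrating_paths = ["/women-clothing/", "/women-clothing/dresses"]

# One sitemap per Search Console profile, built from the same path list.
http_sitemap = build_sitemap("http", "www.webstore.com", migrating_paths)
https_sitemap = build_sitemap("https", "www.webstore.com", migrating_paths)

print(https_sitemap)
```

Building both files from one list keeps the HTTP and HTTPS sitemaps in step, so the indexed counts in the two profiles refer to the same set of pages.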

The XML sitemaps will show you the indexing levels of the pages.

You can switch between Search Console profiles to see HTTP pages dropping and HTTPS pages getting indexed.

If you spot errors during the move, you can quickly roll back the problem group, and diagnose the problem before proceeding further.

One problem we’ve experienced is relying on the redirect tools provided by an ecommerce platform (the Magento tool, for example) instead of the web server’s own redirect functionality.

More than one ecommerce client has missed basic 301 redirects (from non-www to www, or from no trailing slash to trailing slash) after the move to HTTPS. This produced duplicate content: after the move, the site was available as both https://sitename.com and https://www.sitename.com. Another common problem is multiple redirects chained together. Googlebot won’t typically crawl past five redirects in a chain.
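Redirect chains can be counted before launch by walking the redirect map itself. The sketch below is a simplified offline model: the REDIRECTS dictionary is a hypothetical map in which protocol, host, and trailing slash are each fixed by a separate rule. It makes no HTTP requests.

```python
def redirect_chain(start_url, redirect_map, max_hops=10):
    """Follow redirect_map from start_url and return every URL visited, in order."""
    chain = [start_url]
    while chain[-1] in redirect_map and len(chain) <= max_hops:
        next_url = redirect_map[chain[-1]]
        looped = next_url in chain
        chain.append(next_url)
        if looped:  # stop as soon as a URL repeats (redirect loop)
            break
    return chain


# Hypothetical rules: each fix is its own redirect, so they stack up.
REDIRECTS = {
    "http://sitename.com/shoes": "https://sitename.com/shoes",
    "https://sitename.com/shoes": "https://www.sitename.com/shoes",
    "https://www.sitename.com/shoes": "https://www.sitename.com/shoes/",
}

chain = redirect_chain("http://sitename.com/shoes", REDIRECTS)
hops = len(chain) - 1
print(hops, "hops:", " -> ".join(chain))
```

In this example the old URL takes three hops to reach its final destination. Pointing each old URL straight at the final HTTPS URL would reduce that to one.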

Hamlet Batista