Friday 10 July 2015

Use canonical URLs

Set a preferred URL for your content

This article describes how you can use canonical URLs to improve link and ranking signals for content available through multiple URL structures or via syndication.
In the world of content management and online shopping systems, it's common for the same content to be accessed through multiple URLs. With content syndication, it's also easy for content to be distributed to different URLs and domains entirely. For example:
The same product page has dynamic URLs as a result of user session and/or search preference.
https://www.example.com/products?category=dresses&color=green

https://example.com/dresses/cocktail?gclid=ABCD 

https://www.example.com/dresses/green/greendress.html
Your blog system automatically saves multiple URLs as you position the same post under multiple sections.
https://blog.example.com/dresses/green-dresses-are-awesome/

https://blog.example.com/green-things/green-dresses-are-awesome/
Your server is configured to serve the same content for the www subdomain or the http protocol.
http://example.com/green-dresses

https://example.com/green-dresses

http://www.example.com/green-dresses
Content you provide on that blog for syndication to other sites is replicated in part or in full on those domains.
https://news.example.com/green-dresses-for-every-day-155672.html (syndicated post)

https://blog.example.com/dresses/green-dresses-are-awesome/3245/ (original post)
While these systems make it more convenient to develop and distribute content, they cause some challenges when people use search engines to reach your page. For instance:
  • Consolidating link signals for the duplicate or similar content. It helps search engines to be able to consolidate the information they have for the individual URLs (such as links to them) on a single, preferred URL. This means that links from other sites to https://example.com/dresses/cocktail?gclid=ABCD get consolidated with links to https://www.example.com/dresses/green/greendress.html.
  • Tracking metrics for a single product/topic. With a variety of URLs, it's more challenging to get consolidated metrics for a specific piece of content.
  • Determining the URL you want people to see. You prefer people reach your green dresses product page via https://www.example.com/dresses/green/greendress.html rather than https://example.com/dresses/cocktail?gclid=ABCD.
  • Addressing syndicated content. If you syndicate your content for publication on other domains, you want to consolidate page ranking to your preferred URL.
To address these issues, we recommend you define a canonical URL for content (or equivalent content) available through multiple URLs. You can do this for Google Search in a number of ways, each described in a section of this article.
While we encourage you to use any of these methods, none of them are required. If you don't indicate a canonical URL, we'll identify what we think is the best version or URL.
Don't use the robots.txt file for canonicalization purposes.
Don't use the URL removal tool for canonicalization: it removes all versions of a URL from search.
Don't specify different URLs as canonical for the same page (e.g. one URL in a sitemap and a different URL for that same page using rel="canonical").

Set your preferred domain

Tell Google which version of your site's URL you prefer for your domain:
  • https://www.example.com
  • https://example.com
If you set your preferred domain as https://example.com, Google treats links to https://www.example.com exactly the same as links to https://example.com.
Read Set your preferred domain for details.

Indicate the preferred URL with the rel="canonical" link element

Suppose you want https://blog.example.com/dresses/green-dresses-are-awesome/ to be the preferred URL, even though a variety of URLs can access this content. You can indicate this to search engines as follows:
  • Mark up the canonical page and any other variants with a rel="canonical" link element.
    Add a <link> element with the attribute rel="canonical" to the <head> section of these pages: 
    <link rel="canonical" href="https://blog.example.com/dresses/green-dresses-are-awesome/" />
    This indicates the preferred URL to use to access the green dress post, so that the search results will be more likely to show users that URL structure. (Note: We attempt to respect this, but cannot guarantee this in all cases.)
Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.

Use this structure: https://www.example.com/dresses/green/greendress.html
Not this structure: /dresses/green/greendress.html
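
As a rough way to check this on your own pages, the following Python sketch extracts every rel="canonical" link element from an HTML document and reports whether its href is an absolute URL. The names CanonicalFinder, find_canonicals and is_absolute are hypothetical helpers made up for this illustration; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonicals(html):
    """Return the canonical hrefs declared in an HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.hrefs


def is_absolute(url):
    """True when the URL carries both an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A page that declares more than one canonical href, or one for which is_absolute returns False, is worth a second look.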

Use a sitemap to indicate preferred URLs for the same content

Pick a canonical (preferred) URL for each of your pages, and tell us about your preference by submitting these canonical URLs in a sitemap.
We don't guarantee that we'll use the URLs you submit in a sitemap, but submitting one is a useful way to tell Google about the pages on your site you consider most important.
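
For illustration, a minimal sitemap listing only canonical URLs could be generated with a short Python script like the one below. It is a sketch, not a complete sitemap tool: build_sitemap is a hypothetical helper name, and the URL list is taken from the examples in this article.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML listing each canonical URL exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys drops duplicates while keeping the original order.
    for url in dict.fromkeys(canonical_urls):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.example.com/dresses/green/greendress.html",
        "https://blog.example.com/dresses/green-dresses-are-awesome/",
    ]))
```

List only the preferred URL for each page; leaving the duplicate variants out is what signals the preference.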

Use 301 redirects for URLs that are not canonical

Suppose your page can be reached in multiple ways:
  • https://example.com/home
  • https://home.example.com
  • https://www.example.com
It's a good idea to pick one of those URLs as your preferred (canonical) destination, and use 301 redirects to send traffic from the other URLs to your preferred URL. A server-side 301 redirect is the best way to ensure that users and search engines are directed to the correct page. The 301 status code means that a page has permanently moved to a new location.
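
How you configure the redirect depends on your server. As one possible sketch, assuming https://www.example.com is the preferred URL, a small Python WSGI application could answer the other two addresses with a 301. The names CANONICAL_URL, ALIASES and app are assumptions made for this example, and a real site would usually do this in the web server configuration instead.

```python
# Assumed preferred (canonical) destination for the home page.
CANONICAL_URL = "https://www.example.com/"

# (host, path) pairs that serve the same home page under another address.
ALIASES = {
    ("example.com", "/home"),
    ("home.example.com", "/"),
}


def app(environ, start_response):
    """WSGI app: permanently redirect known aliases to the canonical URL."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO", "") or "/"
    if (host, path) in ALIASES:
        # 301 tells users and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", CANONICAL_URL)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"home page"]
```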

Indicate how to handle dynamic parameters

Use Parameter Handling to tell Google about any parameters you would like ignored. Ignoring certain parameters can reduce duplicate content in Google's index, and make your site more crawlable. For example, if you specify that the parameter sessionid should be ignored, Google will consider https://www.example.com/dresses/green.php?sessionid=273749 to be the same as https://www.example.com/dresses/green.php.
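
To see which of your own URLs would collapse into one once a parameter is ignored, you can apply the same idea locally. The Python sketch below only illustrates the equivalence described above and does not show how Google processes parameters; strip_ignored_params and IGNORED_PARAMS are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed not to change the page content.
IGNORED_PARAMS = {"sessionid"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # An empty query string is dropped entirely, including the "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two URLs that produce the same result from strip_ignored_params differ only in parameters you have chosen to ignore.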

Specify a canonical link in your HTTP header

If you can configure your server, you can use rel="canonical" HTTP headers to indicate the canonical URL for HTML documents and other files such as PDFs. Say your site makes the same PDF available via different URLs (for example, for tracking purposes), like this:
https://www.example.com/downloads/white-paper.pdf
https://www.example.com/downloads/partner-1/white-paper.pdf
https://www.example.com/downloads/partner-2/white-paper.pdf
https://www.example.com/downloads/partner-3/white-paper.pdf
In this case, you can use a rel="canonical" HTTP header to specify to Google the canonical URL for the PDF file, as follows:
Link: <https://www.example.com/downloads/white-paper.pdf>; rel="canonical"
Google currently supports these link header elements for Web Search only. 
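
If your responses are produced by application code rather than static server configuration, the header can be built there. The following Python sketch formats the header as a (name, value) pair of the kind most web frameworks accept; canonical_link_header is a hypothetical helper name, not part of any library.

```python
def canonical_link_header(canonical_url):
    """Return a (name, value) pair for a rel="canonical" Link header."""
    # Reject characters that would break the header syntax or allow
    # header injection through a crafted URL.
    if any(ch in canonical_url for ch in '<>"\r\n'):
        raise ValueError("URL contains characters not allowed in a Link header")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    name, value = canonical_link_header(
        "https://www.example.com/downloads/white-paper.pdf"
    )
    print(f"{name}: {value}")
```

Send the same header value with every copy of the file, including the partner-specific URLs.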

Prefer HTTPS over HTTP for canonical URLs

Google prefers HTTPS pages over equivalent HTTP pages as canonical, except when there are conflicting signals such as the following:
  • The HTTPS page has an invalid SSL certificate.
  • The HTTPS page contains insecure dependencies.
  • The HTTPS page is blocked by robots.txt (and the HTTP page is not).
  • The HTTPS page redirects users to or through an HTTP page.
  • The HTTPS page has a rel="canonical" link to the HTTP page.
  • The HTTPS page contains a noindex robots meta tag.
Although our systems prefer HTTPS pages over HTTP pages by default, you can ensure this behavior by taking any of the following actions:
  • Add 301 or 302 redirects from the HTTP page to the HTTPS page.
  • Add a rel="canonical" link from the HTTP page to the HTTPS page.
  • Implement HSTS.
To prevent Google from incorrectly making the HTTP page canonical, you should avoid the following practices:
  • Using bad SSL certificates or HTTPS-to-HTTP redirects. Both cause us to prefer HTTP very strongly, and implementing HSTS cannot override this strong preference.
  • Including the HTTP page in your sitemap or hreflang entries rather than the HTTPS version.
  • Implementing your SSL/TLS certificate for the wrong host variant: for example, example.com serving the certificate for www.example.com. The certificate must match your complete site URL, or be a wildcard certificate that can be used for multiple subdomains on a domain.
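
The host-variant rule in the last point can be illustrated with a simplified name check in Python. This is only a sketch of the matching idea, assuming a wildcard stands in for exactly one leftmost label; it is not a substitute for the certificate validation done by a TLS library, and cert_covers_host is a hypothetical helper.

```python
def cert_covers_host(cert_names, host):
    """Simplified check: does any certificate name match this host?"""
    host = host.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # "*.example.com" -> ".example.com"
            if host.endswith(suffix):
                label = host[: -len(suffix)]
                # The wildcard covers one non-empty label, not several.
                if label and "." not in label:
                    return True
    return False
```

Under these assumptions, a certificate issued only for www.example.com does not cover example.com, and a wildcard for *.example.com covers www.example.com but not the bare domain.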
