What is duplicate content?

Duplicate content is content deemed by a search engine's algorithm to be the same as content found in another place on the web.

Google used to penalise sites with duplicate content issues, and many big-name brands' websites were devastated by this penalty.

Penalties are no longer dealt out willy-nilly, but duplicate content is still a major issue in SEO.

Why is duplicate content still a problem?

Issues might arise for a number of different reasons, including:

Guest blogging

Bad practices in guest blogging have been a contentious issue in SEO for a number of years - not least because of duplicate content.

Some bloggers will write an article on their own site and pass the exact same post on to other blogs to publish. Regardless of how excellent the post is, the search engines might not be sure which version to rank.

Search engines look to rank quality, unique content. Duplicate content might be high quality, but it is not unique.

Verification pages

If a user completes an action on one page, such as entering an email address, they are often served the same content on a different URL with an additional verification message such as "thanks for your email".

You have to be careful with this as the search engines will see these two pages as duplicates.

Bear in mind that content accessed by filling in forms is generally hidden from search engines, as they don't fill in boxes to browse the web - they just follow links. But verification pages could still be accessed if, for example, they're included in your site's XML sitemap or linked to externally, so it's best to treat them as you would other duplicate content.
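
If a verification page is crawlable, one option is to give it a canonical tag pointing back at the original page (canonical tags are covered in more detail below). As a rough sketch, assuming a made-up form page at https://www.example.com/contact with its verification page at https://www.example.com/contact/thanks, the verification page's head section would include:

<link rel="canonical" href="https://www.example.com/contact">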

Spammers

Scraping (where a spammer effectively copies and pastes your site's content onto another domain) is an obvious concern. However, search engines are pretty good at identifying scraped sites and this shouldn't usually be an issue.

If you find you are having problems, you can report scrapers to Google.

Avoiding duplicate content issues

To avoid confusion or complications, there are a few things you can do:

Canonical tags

Implementing canonical tags on all dubious content is a highly recommended method for avoiding duplication issues. In the head section of the original page (P1), put a canonical tag pointing to its own URL. On the duplicated page (P2), put a canonical tag pointing to the original page. For example:

P1 (the original page):

<link rel="canonical" href="https://www.P1.com">

P2 (the duplicate):

<link rel="canonical" href="https://www.P1.com">

The search engine will be in no doubt that P1 is the page that should be considered. With canonical tagging, Google juice from links to P2 will be passed on to P1.

Implementing canonical tags across all your content, even if it's not duplicated, is smart practice. Scrapers are often lazy and will neglect to change or remove the canonical tag, so any scraped copy will simply point back to your original page.

Robots.txt

If you have serious issues, there are some more heavy-handed ways to stop Google seeing the duplicate page. The most heavy-handed is adding the duplicate URL to your site's robots.txt file. This will stop most crawlers from accessing the page. Unfortunately, Google juice from any links to the blocked page will not be passed on to the original with this technique.
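
As a minimal sketch, assuming the duplicate lives at the made-up path /duplicate-page/, the robots.txt entry would look like this:

User-agent: *
Disallow: /duplicate-page/

The User-agent: * line applies the rule to all crawlers, and Disallow tells them not to fetch anything under that path.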

Using robots.txt was a popular method for sorting duplication issues back when duplicate content was penalised; however, there are now far better ways to combat duplication.

noindex, follow

Adding the "noindex, follow" meta robots tag:

<meta name="robots" content="noindex,follow">

to the head section of the duplicate page's HTML will tell the robots not to index the page for search, but still to follow all the links on it.

However, some users believe this tag is unreliable. While Google and many leading SEOs are adamant that it works as it should, there are other options that may be safer to stick to.

301 redirects

Redirecting the duplicate page to the original page is another method that many webmasters implement when updating content across their sites, particularly in a site-structure reshuffle. However, this won't help you if you can't access the redirect file for the duplicate pages.
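
As an illustration, on an Apache server the redirect could be a single line in the site's .htaccess file (both paths here are hypothetical):

Redirect 301 /duplicate-page https://www.P1.com/original-page

Unlike the robots.txt approach, a 301 tells browsers and crawlers that the page has moved permanently, so Google juice from links to the old URL is passed on to the new one.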

How much should you worry about duplication?

As long as you're not using duplicate content to deliberately game the search engines, you shouldn't worry too much. If you have a couple of duplicate guest posts kicking about, it's probably going to be ok.

Just make sure you're implementing canonical tags where possible and you shouldn't run into any major trouble.
