Poor Search Engine Rankings Caused by Duplicate Content Issues

by Carsten Cumbrowski, July 23, 2007.

Summary

What are Canonical URLs? What is duplicate content? How does it happen? How can you detect it? What can you do about it? This is a short primer for website owners and webmasters.

Introduction

Having identical or near-duplicate versions of a webpage available at more than one URL, on one web site or across many, is a problem.

This can be unintentional, for example because of the site architecture, or something the webmaster is well aware of, such as syndicated content or content scraped or illegally copied from other websites.

Canonical URLs

The canonical URL problem refers to a site whose URLs with and without "www." are both valid and return the same content. It is less of an issue today, because many webmasters are aware of it.

Google even provides a tool in its Google Webmaster Central application that lets webmasters specify which of the two possible versions of the URL is the primary one: the one with or the one without "www.". The webmaster does not have to change the site's code for that. He should do so anyway, though, because other search engines do not provide a mechanism like Google's.

Update!

In February 2009, Google, Yahoo and Microsoft introduced a new HTML attribute value that webmasters can use to reduce content duplication caused by multiple URLs that lead to the same page on their website. It is called the "canonical tag".

See the details about the tag at my page:
- Duplicate Content and Near Duplicate Content - Canonical URLs, Content Theft (Scraping).
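
As a rough illustration of what the tag looks like, the sketch below builds the link element that a page would place in its HTML head to name its preferred URL. The function name canonical_link_tag and the example.com URL are hypothetical and chosen only for this example; this is a minimal sketch, not a complete templating solution.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for the page's <head>.

    canonical_url is the one URL that should be treated as the
    primary address of the page (hypothetical input for this sketch).
    """
    # Escape the URL so that characters such as & and " stay valid
    # inside the HTML attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Every duplicate URL of the page would emit the same tag.
print(canonical_link_tag("http://www.example.com/article.html"))
```

Each duplicate version of a page would carry the same tag, so that all of them point to the single preferred URL.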

The solution for the "www." problem is to specify one of the two versions as the primary one and to 301 redirect requests for the other version to it. You can accomplish this in the code of the web site application, with special ISAPI filters such as Helicon's "ISAPI Rewrite" on Microsoft IIS web servers, or with URL rewrite rules (mod_rewrite) in the ".htaccess" file on Apache web servers.
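
The application-code variant of this redirect can be sketched in a few lines of Python. The host names www.example.com and example.com, the constants and the function name canonical_redirect are assumptions made for this example; a real site would wire the returned status and target into whatever web framework or server it uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts: the "www." version is treated as the primary one.
PRIMARY_HOST = "www.example.com"
ALTERNATE_HOST = "example.com"


def canonical_redirect(url: str):
    """Return (301, target_url) if url uses the non-primary host, else None."""
    parts = urlsplit(url)
    # hostname is lower-cased by urlsplit, so EXAMPLE.com matches as well.
    if parts.hostname != ALTERNATE_HOST:
        return None
    # Keep an explicit port if the request carried one.
    netloc = PRIMARY_HOST if parts.port is None else f"{PRIMARY_HOST}:{parts.port}"
    # Preserve scheme, path and query string; only the host changes.
    target = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return 301, target


print(canonical_redirect("http://example.com/articles/page.html?id=7"))
print(canonical_redirect("http://www.example.com/articles/page.html?id=7"))
```

The status code 301 marks the redirect as permanent, as opposed to a temporary redirect, which is why it is the one recommended above.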

A bigger concern is pages with the same content that are reachable at more than one URL because of the site's architecture. Duplicate URLs for the same content can have various causes. If you suspect duplicate content on your own website, I recommend consulting an SEO firm for an evaluation of your site.

Website Scrapers and Content Theft

The increased popularity of RSS and the ease of syndication, aggregation and "mash-ups" have also made it much easier for black-hat SEOs to create scraper sites.

Scraper sites are sites that are thrown together as quickly and with as much automation as possible, either to rank well directly or to get users to click on contextual ads such as Google AdSense and generate revenue. The chances of a click are high, because the ads are often the only text that makes some sense compared to the gibberish produced by the scraper.

Another goal of a scraper site can be to indirectly boost the ranking of a more hidden site. The scraper site simply links to that other web site from multiple pages.

In most cases those sites are a bad experience for the user, and search engines try to remove them from their indexes as well as they can. Because of this struggle, legitimate webmasters become victims of the circumstances more often than they should, which increases fear and mistrust between search engines and webmasters.

Duplicate Content Detection

Some duplicate content detection tools are available for webmasters. They can detect not just duplicates on your own site, but also content that scrapers and other webmasters have stolen from your site. One popular and free web-based tool is Copyscape.com.
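
To give a feel for how near-duplicate detection can work in principle, the sketch below compares two texts by their overlapping word sequences ("shingles") and the Jaccard similarity of the resulting sets. This is one common textbook approach and only an assumption for illustration; it is not a description of how Copyscape or any search engine actually works, and the function names and the shingle size of three words are hypothetical choices.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


original = "Duplicate content can hurt the search engine rankings of a site"
copy = "Duplicate content can hurt the search engine rankings of your site"
print(similarity(original, copy))
```

A score close to 1.0 suggests that two pages are near duplicates, while a score close to 0.0 suggests unrelated text. Where to draw the line is a judgment call that depends on the kind of content being compared.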

Legal Steps against Content Theft

There is not very much you can do about content scraped or stolen from your site, but in some cases it makes sense to have your lawyer send a DMCA notice to the copyright-infringing webmaster and also to his hosting provider (if known).

You can download the details of the federal Digital Millennium Copyright Act (DMCA) at the following URL at Loc.gov: http://www.loc.gov/copyright/legislation/dmca.pdf

Conclusion

For more information about duplicate content issues, including articles, guides, how-to's, search engine papers on duplicate content detection and tools to detect duplicate content (including scrapers and content theft), check out my duplicate content resources.
