Duplicate content is one of the most common problems on the World Wide Web. It occurs when the same piece of content is available to search engine bots and users at more than one URL. Search engines then have to do extra work to remove duplicate results from their index. In response to growing spam built on widespread duplicate content, Google launched the Panda update, which penalized sites carrying duplicate or near-duplicate content. Webmasters therefore need to keep their sites safe from any duplicate content penalty applied by Google, and there are several ways to do so.
1- Add Rel=canonical Tag
The rel=canonical link element was introduced in 2009 to solve the problem of similar web documents, and it was later standardized as RFC 6596 in 2012. It lets search engines identify the preferred version of a URL, which is then treated as the original. For example, suppose a site has 3 URLs:
Example.com/camcorder (the original URL)
Example.com/electronics/camcorder (Duplicate URL 1)
Example.com/electronics?item=camcorder (Duplicate URL 2)
The same information about the camcorder can be accessed from all 3 URLs. This can cause a serious duplicate content issue for the main site. We can add the rel=canonical tag to the 2 duplicate URLs as shown below:
<head>
<link rel="canonical" href="http://www.example.com/camcorder" />
</head>
Adding the above rel=canonical tag to the duplicate URLs tells the search engine crawlers to attribute the content of the page to the original URL, thus saving the site from being penalized for duplicate content.
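To confirm that a duplicate page really declares the canonical URL you intended, a small check like the one below can help. This is only a minimal sketch using Python's standard html.parser module; the class name CanonicalFinder, the function find_canonical and the sample page are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before checking
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"].strip())


def find_canonical(html_text):
    """Return the first canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical markup of one of the duplicate camcorder pages
duplicate_page = """
<html><head>
<link rel="canonical" href="http://www.example.com/camcorder" />
</head><body>Camcorder details</body></html>
"""

print(find_canonical(duplicate_page))
```

If the function returns None for a duplicate page, the tag is missing or malformed (for instance, an unclosed href quote) and should be fixed before relying on it.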
2- Assign a 301 Redirect
A 301 redirect tells the search engines that the page has permanently moved to another location, thus passing the link equity and value to the main page. This should be the solution when the duplicate page has backlinks and traffic coming to it. 
On an Apache server, the 301 redirection can be provided in the .htaccess file. An example is given below. Note that a rule written against / like this one redirects every URL on the site to the new domain:
Redirect 301 / http://mt-example.com/
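When only a handful of duplicate pages need redirecting, rather than the whole site, one rule per duplicate path is usually what you want. The sketch below simply builds those lines as text; build_redirect_rules is a hypothetical helper written for this post, and the paths reuse the camcorder example from above. It assumes Apache's Redirect directive, which matches on the URL path only, so a duplicate that differs only by its query string (such as ?item=camcorder) would need a different approach.

```python
def build_redirect_rules(duplicate_paths, target_url):
    """Return one 'Redirect 301' line per duplicate path, ready for .htaccess."""
    rules = []
    for path in duplicate_paths:
        # The Redirect directive expects a URL path that starts with a slash
        if not path.startswith("/"):
            path = "/" + path
        rules.append(f"Redirect 301 {path} {target_url}")
    return rules


# Hypothetical duplicate path pointing at the original camcorder page
rules = build_redirect_rules(
    ["/electronics/camcorder"],
    "http://www.example.com/camcorder",
)
print("\n".join(rules))
```

Paste the generated lines into the .htaccess file and test each old URL in a browser to make sure it lands on the original page.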
3- Remove the Link
In many cases the simplest and best solution is to remove the duplicate pages from your site. This makes both your task and the search engine crawlers' task much easier. You can remove the pages and return 404s for them.
4- Use robots.txt or Meta robots
Another preferred way of fixing the duplicate content issue is by either using robots.txt or the Meta robots tag. 
Through robots.txt
Add the following lines to your robots.txt file to block the search engine crawlers from accessing the duplicate content. This ensures that the duplicate content can still be seen by users but is not crawled by the search engines.
User-agent: *
Disallow: /duplicate
Disallow: /duplicate.html
Disallow: /original/duplicate.html
Change the lines to match the file names and locations of your duplicate URLs.
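Before uploading the file, it is worth checking that the rules block what you expect and nothing more. Here is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_crawlable, the Googlebot user agent string and the example.com URLs are assumptions chosen for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicate
Disallow: /duplicate.html
Disallow: /original/duplicate.html
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: two duplicates and one page that should stay crawlable
for url in (
    "http://www.example.com/duplicate.html",
    "http://www.example.com/original/duplicate.html",
    "http://www.example.com/original/page.html",
):
    print(url, is_crawlable(ROBOTS_TXT, url))
```

Keep in mind that this parser treats each Disallow value as a path prefix, so a short rule like /duplicate also covers any other path that begins with the same characters.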
Through Meta robots tag
The Meta robots tag is a page-level directive, placed in the head of the page, that tells the search engines how to index the page and treat its links according to the directives mentioned in the tag.
A simple directive like noindex tells the search engines not to index the contents of the web page, while nofollow tells them not to follow the links on it. An example is given below:
<head> 
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
</head>
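To see which directives a page actually carries, the tag can be read back with a few lines of Python. This is a rough sketch based on the standard html.parser module; RobotsMetaFinder, robots_directives and the sample markup are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Gathers the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # The name value is compared case-insensitively (ROBOTS or robots)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html_text):
    """Return the set of lowercase robots directives found in html_text."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.directives


# Hypothetical duplicate page carrying the tag from the example above
page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW" /></head>'
directives = robots_directives(page)
print("noindex" in directives, "nofollow" in directives)
```

A duplicate page that is meant to stay out of the index should report noindex here; if the set comes back empty, check the tag for curly quotes or other typos.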
5- Use Parameter Blocking
For large ecommerce sites, parameter blocking can be used as an effective solution for blocking the duplicate content. To set parameter blocking, follow the steps given below:- 
a- Log in to the Google Webmaster Tools
b- Move to “URL Parameters” located under “Crawl” tab.
c- Click on Edit and select “No” from the drop-down list. “No” indicates that the selected URL parameter does not change the page content, so URLs that differ only by that parameter contain duplicate content.
A word of caution: Be 100% sure when you use URL parameters to block similar content, because a wrong setting can cause non-duplicate pages to be blocked from the search engines. 
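One way to be more sure is to list which of your URLs would collapse into the same address once a parameter is ignored, and then check by hand that those pages really show the same content. The sketch below does that grouping with Python's standard urllib.parse module; strip_parameter, group_by_stripped_url, the sort parameter and the sample URLs are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_parameter(url, parameter):
    """Return url with every occurrence of the given query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != parameter
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def group_by_stripped_url(urls, parameter):
    """Group urls that become identical once the parameter is ignored."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_parameter(url, parameter)].append(url)
    return dict(groups)


# Hypothetical crawl list from an ecommerce site
urls = [
    "http://www.example.com/electronics?item=camcorder&sort=price",
    "http://www.example.com/electronics?item=camcorder&sort=name",
    "http://www.example.com/electronics?item=tripod&sort=price",
]
for stripped, originals in group_by_stripped_url(urls, "sort").items():
    print(stripped, len(originals))
```

If any group contains pages with genuinely different content, that parameter should not be marked as one that leaves the page unchanged.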
For me, the preferred options are the rel=canonical tag and the Meta robots tag. Both options carry less risk and solve the duplicate content issue effectively. 