
#1  10-Apr-2012, 03:13 PM
KJ Beckett
Registered User
Join Date: Feb 2011
Posts: 13
Opinion Regarding Robots.txt Change

Hi All,

We are taking a bit of a hit on Google due to the amount of duplicate content on our site. Therefore, as part of a phased approach, we want to start removing non-essential pages from the eyes of Googlebot.

Basically, our first action is to disallow Googlebot from crawling/indexing any product page that is a fourth-generation copy (or more). We have done some initial research and think that using the * wildcard as below should be OK...

This is the proposed code we are thinking of using:

User-agent: *
Disallow: /copy_of_copy_of_copy_of_*.html

Therefore:

http://www.kjbeckett.com/acatalog/bl...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/fred-perry_p2.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags_p4.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/messenger-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/fred-perry.html).

Do you think our usage of the * wildcard is correct? In other words, using the examples above, would the pages we want crawled still be crawled, and the ones we don't want crawled be blocked?
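
To make this testable, here is a rough sketch in Python of the matching behaviour as we understand it from Google's documentation (the * matches any run of characters, and Disallow patterns are matched from the start of the URL path). The paths in it are made up for illustration, not our real URLs:

import re

def robots_pattern_to_regex(pattern):
    # Google's wildcard extension: '*' matches any run of characters,
    # '$' anchors the pattern to the end of the path, and matching
    # always starts at the beginning of the URL path.
    anchored = pattern.endswith('$')
    if anchored:
        pattern = pattern[:-1]
    body = ''.join('.*' if ch == '*' else re.escape(ch) for ch in pattern)
    return re.compile('^' + body + ('$' if anchored else ''))

rule = robots_pattern_to_regex('/copy_of_copy_of_copy_of_*.html')

# Hypothetical paths, for illustration only.
for path in ['/copy_of_copy_of_copy_of_example.html',
             '/acatalog/copy_of_copy_of_copy_of_example.html']:
    print(path, '->', 'blocked' if rule.match(path) else 'crawlable')

One thing we are not sure about: since matching starts at the beginning of the path, does the rule need the /acatalog/ prefix (or a leading *) for our product pages to match at all?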

Any help would be greatly appreciated.

Cheers,
Liam
#2  10-Apr-2012, 08:02 PM
Mike Hughes
Registered User
Join Date: Jan 2003
Posts: 7,199
I'd support the intention of removing duplicates, but have you thought about removing the problem instead by increasing the differences between the pages?

I'd have thought you could automatically include the section name in your page titles and meta descriptions, and that (together with the existing differences) should be enough to avoid the Panda duplication penalties, especially if you avoid the copy_of naming convention.
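
For example, something along these lines in the page templates (the product and section names here are made up for illustration):

<title>Fred Perry Bags - Mens Bags | KJ Beckett</title>
<meta name="description" content="Fred Perry bags from the Mens Bags section at KJ Beckett.">

That way each copy of a product page carries its own section in the title and description.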

Mike
#3  10-Apr-2012, 10:53 PM
leehack (Lee Hackett)
Moderator
Join Date: Nov 2005
Posts: 15,164
I think you'd be far better served by getting the duplicates linking back to the master page, improving internal linking and removing the duplication in one go. Page names like the ones you have really are pretty poor. Get rid of the duplicate sections; they're not needed. Instead, have duplicate products that link back to the master page.
#4  10-Apr-2012, 10:59 PM
leehack (Lee Hackett)
Moderator
Join Date: Nov 2005
Posts: 15,164
I also think that if you do go down the route you mentioned, the canonical tag would be a better idea than robots.txt. Details are on Google if you haven't heard of it.
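
For example, each copy_of page would carry something like this in its <head>, pointing at the master page (the URL here is just an example):

<link rel="canonical" href="http://www.kjbeckett.com/acatalog/fred-perry.html" />

Google should then treat the copies as duplicates of the master and consolidate the ranking signals there, without you having to block crawling at all.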