#1
Opinion Regarding Robots.txt Change
Hi All,
We are taking a bit of a hit on Google due to the amount of duplicate content on our site. Therefore, as part of a phased approach, we want to start removing non-essential pages from the eyes of Googlebot. Basically, our first action is to disallow Googlebot from crawling/indexing any product page that is a 4th-generation copy (or more).

We have done some initial research and think that using the * wildcard as shown below should be OK. This is the proposed code we are thinking of using:

User-agent: *
Disallow: /copy_of_copy_of_copy_of_*.html

Therefore:

http://www.kjbeckett.com/acatalog/bl...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/fred-perry_p2.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags_p4.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/messenger-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/fred-perry.html).

Do you think our usage of the * wildcard is correct? Using the examples above, would we still be crawled where we want to be, and not crawled where we don't want to be?

Any help would be greatly appreciated.

Cheers,
Liam
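One way to sanity-check a rule like this before deploying it is to simulate the matching locally. The sketch below is a rough approximation of the wildcard behaviour Google documents for robots.txt (a pattern is compared against the URL path starting from the first "/", "*" matches any run of characters, and a trailing "$" anchors the end). It is not Googlebot's actual code, and the file names in it are made-up placeholders, not real pages on the site. One thing it should make visible: because the product pages live under /acatalog/, a pattern that begins with /copy_of_ would not be expected to match them unless the directory (or a leading wildcard) is included.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (approximation only)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_pattern: str) -> bool:
    """True if the Disallow pattern matches the path from its first character."""
    return robots_pattern_to_regex(disallow_pattern).match(path) is not None


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the matching.
    third_gen = "/acatalog/copy_of_copy_of_example.html"
    fourth_gen = "/acatalog/copy_of_copy_of_copy_of_example.html"

    as_proposed = "/copy_of_copy_of_copy_of_*.html"
    with_directory = "/acatalog/copy_of_copy_of_copy_of_*.html"

    for rule in (as_proposed, with_directory):
        for path in (third_gen, fourth_gen):
            print(rule, path, is_blocked(path, rule))
```

Under these assumptions, the rule as proposed blocks neither placeholder path, while the variant that includes /acatalog/ blocks only the 4th-generation one. It would still be worth confirming the final rule in Google's own robots.txt testing tool.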
#2
I'd support the intention of removing duplicates but have you thought about removing the problem by increasing the difference between the pages?
I'd have thought you could automatically include the section name in your page titles and meta descriptions, and that (together with the existing differences) should be enough to avoid the Panda duplication penalties, especially if you avoid the copy_of naming convention.

Mike
#3
I think you'd be far better served by getting the duplicates to link back to the master page, which improves internal linking and removes duplication all in one. Page names like the ones you have really are pretty poor. Get rid of the duplicate sections, as they're not needed, and instead have duplicate products that link back to the master page.
#4
I also think that, if you go down the route you have mentioned, the canonical tag would be a better idea than robots.txt. Details are on Google if you have not heard of it.
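For anyone who has not used it, the canonical tag is a link element with rel="canonical" placed in the head of each duplicate page, pointing at the master version. As a minimal sketch of how the tag could be generated, the snippet below assumes (and this is only a guess about the catalogue's naming) that the master page is simply the file name with every leading copy_of_ stripped. The function names and the example file name are hypothetical.

```python
import re
from html import escape

# Assumption: a copy's file name is the master's name prefixed with one or
# more "copy_of_" segments. Adjust if the catalogue names pages differently.
COPY_PREFIX = re.compile(r"^(?:copy_of_)+")


def master_filename(filename: str) -> str:
    """Strip every leading 'copy_of_' to recover the assumed master name."""
    return COPY_PREFIX.sub("", filename)


def canonical_tag(base_url: str, filename: str) -> str:
    """Build the <link rel="canonical"> element for a (possibly copied) page."""
    href = base_url.rstrip("/") + "/" + master_filename(filename)
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical file name; the base URL is the catalogue directory
    # mentioned earlier in the thread.
    print(canonical_tag("http://www.kjbeckett.com/acatalog",
                        "copy_of_copy_of_example-product.html"))
```

Unlike a robots.txt block, the copies stay crawlable with this approach, so Google can see the tag and treat the master page as the preferred version.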