Checking stats... and how to block bad bots


    Hi all

    I'm currently using Actinic's own hosting, but the bandwidth seems to be increasing at an alarming rate compared to the amount of sales we're getting. I'm convinced it's bots crawling the site and sucking up bandwidth.

    The Plesk control panel doesn't seem to show anything useful in terms of what's eating the bandwidth.

    Any suggestions as to what/how I can get some meaningful stats?

    Once that's done... how can I block unwanted bots?

    Is the .htaccess file the best method? I've no idea what to add to it TBH, so any advice gratefully received.

    Thanks

    Chris

    #2
    A robots.txt file can be configured to stop all the 'conforming' bots.
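As a rough sketch of what such a file might contain, the Python below builds a robots.txt that bans named crawlers and leaves everyone else alone, then checks the result with the standard library's `urllib.robotparser`. The bot tokens are placeholders chosen for illustration (they are assumptions, not names taken from this thread); each crawler documents the exact `User-agent` token it honours. This only affects bots that choose to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler tokens, for illustration only. Look up the exact
# User-agent token in each bot's own documentation before relying on it.
UNWANTED_BOTS = ["dotbot", "ExampleBot"]


def build_robots_txt(blocked):
    """Return robots.txt text that bans `blocked` bots and allows the rest."""
    lines = []
    for bot in blocked:
        # "Disallow: /" asks this bot to stay out of the whole site.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # An empty Disallow means "nothing is off limits" for all other bots.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(UNWANTED_BOTS)

# Sanity-check the rules before uploading the file to the site root.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
```

The generated text would be saved as `robots.txt` in the root of the site. `parser.can_fetch("dotbot", "/index.html")` can be used to confirm a given token is refused while other crawlers are still allowed.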


      #3
      Originally posted by drounding:
      A robots.txt file can be configured to stop all the 'conforming' bots.
      Bit of a noob at this... could you expand on this? I know what a robots.txt file is, but not what I should be adding to it...


        #4
        Originally posted by mac-man:
        Bit of a noob at this... could you expand on this? I know what a robots.txt file is, but not what I should be adding to it...
        Try this: http://tinyurl.com/39pyv26


          #5
          LOL.... v. amusing.

          :-)

          What about bots that ignore robots.txt?

          Wouldn't a block via .htaccess be more effective?
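For bots that ignore robots.txt, one common approach is a `mod_rewrite` rule that answers 403 Forbidden when the `User-Agent` header matches a pattern. The sketch below generates such a fragment in Python. It assumes the host runs Apache with `mod_rewrite` enabled and permits `.htaccess` overrides, which is not confirmed anywhere in this thread, and the agent names are hypothetical placeholders. A bot that fakes its user agent will still get through.

```python
import re

# Hypothetical user-agent substrings, for illustration only.
BAD_AGENTS = ["dotbot", "ExampleBot"]


def build_htaccess(agents):
    """Return .htaccess rules that send 403 Forbidden to matching agents."""
    if not agents:
        # With no conditions the RewriteRule would block every visitor.
        raise ValueError("at least one user-agent pattern is required")
    lines = ["RewriteEngine On"]
    for i, agent in enumerate(agents):
        # [NC] = case-insensitive match; [OR] chains to the next condition.
        flags = "[NC]" if i == len(agents) - 1 else "[NC,OR]"
        pattern = re.escape(agent)  # treat the name as literal text
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern} {flags}")
    # "-" leaves the URL untouched; [F] returns 403 Forbidden.
    lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_htaccess(BAD_AGENTS))
```

It would be worth trying the output on a test directory first, since a mistake in `.htaccess` can take the whole site offline.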

          I've finally managed to get some stats off the site.

          I'm guessing it's the 'total sites' section (below). Shame I can only get the top 30 via Plesk. Do these look like large numbers in terms of bandwidth? The 6th figure from the left (KBytes) is the amount of download/bandwidth.

          # Hits Hits% Files Files% KBytes KBytes% Visits Visits% Hostname
          1 11144 0.78% 3980 0.32% 54187 0.49% 96 0.35% inetgw-63-pri.nhs.uk
          2 8492 0.59% 7372 0.60% 172112 1.56% 20 0.07% b3091177.crawl.yahoo.net
          3 3683 0.26% 1202 0.10% 15869 0.14% 8 0.03% cpc2-lee210-2-0-cust1011.7-1.cable.virginmedia.com
          4 3430 0.24% 1105 0.09% 9751 0.09% 2 0.01% host81-157-46-49.range81-157.btcentralplus.com
          5 3169 0.22% 2965 0.24% 87812 0.79% 9 0.03% crawl-66-249-65-220.googlebot.com
          6 2468 0.17% 2075 0.17% 17366 0.16% 31 0.11% no-dns-yet.demon.co.uk
          7 2300 0.16% 1151 0.09% 16283 0.15% 10 0.04% h137-191-225-130.gn.gov.ie
          8 2254 0.16% 2204 0.18% 71386 0.65% 7 0.03% crawl-66-249-65-119.googlebot.com
          9 2207 0.15% 2092 0.17% 66527 0.60% 20 0.07% crawl-66-249-65-194.googlebot.com
          10 2201 0.15% 2201 0.18% 2159 0.02% 79 0.29% cpc3-burn6-0-0-cust172.10-1.cable.virginmedia.com
          11 2144 0.15% 1850 0.15% 42870 0.39% 8 0.03% b3090872.crawl.yahoo.net
          12 2037 0.14% 1796 0.15% 37387 0.34% 135 0.49% crawl8.dotnetdotcom.org
          13 1912 0.13% 360 0.03% 4583 0.04% 2 0.01% host81-154-166-69.range81-154.btcentralplus.com
          14 1792 0.13% 1747 0.14% 55726 0.50% 20 0.07% crawl-66-249-65-74.googlebot.com
          15 1740 0.12% 961 0.08% 9389 0.08% 6 0.02% 5adb65e7.bb.sky.com
          16 1683 0.12% 812 0.07% 10125 0.09% 4 0.01% cpc3-sand6-0-0-cust130.wolv.cable.virginmedia.com
          17 1674 0.12% 1224 0.10% 9176 0.08% 28 0.10% 195.27.52.146
          18 1325 0.09% 1288 0.11% 37718 0.34% 5 0.02% crawl-66-249-65-244.googlebot.com
          19 1304 0.09% 1268 0.10% 12408 0.11% 4 0.01% host86-129-48-184.range86-129.btcentralplus.com
          20 1300 0.09% 1007 0.08% 13948 0.13% 4 0.01% 2706ds1-taa.0.fullrate.dk
          21 1299 0.09% 766 0.06% 7897 0.07% 9 0.03% unallocated.star.net.uk
          22 1285 0.09% 1167 0.10% 28235 0.26% 13 0.05% ukspider4.wise-guys.nl
          23 1268 0.09% 1212 0.10% 36972 0.33% 22 0.08% crawl-66-249-65-109.googlebot.com
          24 1242 0.09% 1219 0.10% 26325 0.24% 21 0.08% crawler-220.hurricane.ny.shopwiki.com
          25 1240 0.09% 1169 0.10% 32480 0.29% 2 0.01% crawl-66-249-65-235.googlebot.com
          26 1227 0.09% 934 0.08% 11407 0.10% 2 0.01% 195-241-174-238.ip.telfort.nl
          27 1204 0.08% 1147 0.09% 37873 0.34% 6 0.02% crawl-66-249-65-212.googlebot.com
          28 1181 0.08% 1181 0.10% 7486 0.07% 10 0.04% host86-137-21-91.range86-137.btcentralplus.com
          29 1176 0.08% 900 0.07% 9091 0.08% 12 0.04% gateway-302.energis.gsi.gov.uk
          30 1142 0.08% 1075 0.09% 31311 0.28% 25 0.09% crawl-66-249-65-92.googlebot.com
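One way to make sense of a listing like this is to total the KBytes column per crawler. The sketch below does that in Python for six rows copied from the table above, reading each row as ten whitespace-separated fields (rank, hits, hits %, files, files %, KBytes, KBytes %, visits, visits %, hostname). The hostname-suffix grouping is an assumption made for illustration, and reverse-DNS names are only a rough guide to who is really crawling.

```python
from collections import defaultdict

# Six rows copied verbatim from the listing above.
SAMPLE_ROWS = """\
1 11144 0.78% 3980 0.32% 54187 0.49% 96 0.35% inetgw-63-pri.nhs.uk
2 8492 0.59% 7372 0.60% 172112 1.56% 20 0.07% b3091177.crawl.yahoo.net
5 3169 0.22% 2965 0.24% 87812 0.79% 9 0.03% crawl-66-249-65-220.googlebot.com
8 2254 0.16% 2204 0.18% 71386 0.65% 7 0.03% crawl-66-249-65-119.googlebot.com
11 2144 0.15% 1850 0.15% 42870 0.39% 8 0.03% b3090872.crawl.yahoo.net
12 2037 0.14% 1796 0.15% 37387 0.34% 135 0.49% crawl8.dotnetdotcom.org
"""

# Assumed mapping from hostname suffix to a group label (illustrative only).
CRAWLER_SUFFIXES = {
    ".googlebot.com": "googlebot",
    ".crawl.yahoo.net": "yahoo",
    ".dotnetdotcom.org": "dotnetdotcom",
}


def parse_row(line):
    """Pull the KBytes, KBytes % and hostname fields out of one row."""
    fields = line.split()
    return {
        "kbytes": int(fields[5]),
        "kbytes_pct": float(fields[6].rstrip("%")),
        "hostname": fields[9],
    }


def classify(hostname):
    """Return the crawler group for a hostname, or 'other'."""
    for suffix, label in CRAWLER_SUFFIXES.items():
        if hostname.endswith(suffix):
            return label
    return "other"


def kbytes_by_group(text):
    """Sum the KBytes column per crawler group."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if line.strip():
            row = parse_row(line)
            totals[classify(row["hostname"])] += row["kbytes"]
    return dict(totals)


totals = kbytes_by_group(SAMPLE_ROWS)

# Rough site-wide total implied by one row's rounded percentage.
yahoo_row = parse_row(SAMPLE_ROWS.splitlines()[1])
estimated_site_kbytes = yahoo_row["kbytes"] / (yahoo_row["kbytes_pct"] / 100)
```

Going by the rounded percentages, row 2 (172112 KBytes at 1.56%) implies a site-wide total of roughly 11 million KBytes for the period, so these six hosts together account for only a few percent of it. On that reading the top-30 list alone may not explain where most of the bandwidth is going.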


            #6
            I reckon, without having seen the site, that your image optimisation is horrific and that is the problem, or you have huge Flash files. Those two are usually the culprits when bandwidth is a worry. Why not provide a URL so we can see if there is anything amiss on the site?

            Another possibility is that visitor numbers are right up, say if SEO or PPC is doing well, but once on the site something's a bit crap and people are leaving. Can you see visitor numbers rising exponentially as well?

            Either way, a URL will be the way forward to get some real help.


              #7
              Originally posted by leehack:
              I reckon, without having seen the site, that your image optimisation is horrific and that is the problem, or you have huge Flash files. Those two are usually the culprits when bandwidth is a worry. Why not provide a URL so we can see if there is anything amiss on the site?

              Another possibility is that visitor numbers are right up, say if SEO or PPC is doing well, but once on the site something's a bit crap and people are leaving. Can you see visitor numbers rising exponentially as well?

              Either way, a URL will be the way forward to get some real help.
              http://www.pedigreecharm.co.uk

              http://www.amusica.co.uk
