    Bandwidth... trying to make sense of stats

    Hi

    My bandwidth usage seems to be getting out of control... especially given the number of orders we get.

    Here's a list of the top 10 URLs on the site in terms of bandwidth - why is '/cgi-bin/ss000001.pl' taking 50% of the bandwidth?

    Supposedly 446 visits (2.43% of visitors) last month, yet 201,393 hits. There are approximately 60 images on the page. Page and images weigh in at 250KB (heavy? - I don't know).
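
    That works out at roughly 450 hits to that script for every visit (201,393 ÷ 446 ≈ 451), which seems a lot for one page.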

    #      Hits        %     KBytes        %   URL
    1    201393   20.30%    6736180   52.36%   /cgi-bin/ss000001.pl
    2     28996    2.92%       5835    0.05%   /cgi-bin/st000001.pl
    3     12610    1.27%     398414    3.10%   /acatalog/actinicextras.js
    4     12572    1.27%     102386    0.80%   /acatalog/actiniccore.js
    5     10314    1.04%     752877    5.85%   /acatalog/actinic.css
    6     10117    1.02%      23033    0.18%   /acatalog/charm2.css
    7      6506    0.66%      17665    0.14%   /favicon.ico
    8      3614    0.36%     126816    0.99%   /cgi-bin/sh000001.pl
    9      1983    0.20%        604    0.00%   /robots.txt
    10     1620    0.16%      30842    0.24%   /cgi-bin/os000001.pl
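
    If I'm reading the KBytes column right, 6,736,180 KB is roughly 6.5 GB, and at 52.36% of the total that would put the whole month at somewhere around 12 GB.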

    But here's a thing...

    Also: here are the top 5 total sites:
    #     Hits        %     Files        %    KBytes        %   Visits      %   Hostname
    1   121047   12.20%   120660   13.45%   3993880   31.04%        8   0.04%   crawl-66-249-72-141.googlebot.com
    2    34175    3.44%    34155    3.81%   1090282    8.47%        3   0.02%   crawl-66-249-72-202.googlebot.com
    3    32280    3.25%    32017    3.57%   1128631    8.77%        3   0.02%   crawl-66-249-72-144.googlebot.com
    4    20362    2.05%    20313    2.26%    666079    5.18%        4   0.02%   crawl-66-249-71-25.googlebot.com
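
    Adding up just those four Googlebot entries gives 31.04% + 8.47% + 8.77% + 5.18% ≈ 53% of the month's transfer, around 6.9 million KB.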

    Does this mean the damned googlebots are sucking up 50% of my bandwidth?

    #2
    Checking this month's figures, the stats are even more alarming... we're getting heavily spidered every month. Over 60% of this month's bandwidth is us paying for Google to spider our site.

    I know it's a good thing but this is ridiculous. How can I reduce the amount it's taking...

    EDIT: just been doing a bit of googling and there's mention of changing the crawl rate.

    Would I be correct in thinking making it a lower value reduces the amount of bandwidth it's using up every month?

    Currently set at 0.5 requests per second and 2 seconds between requests



      #3
      Originally posted by MariaV View Post
      Does this mean the damned googlebots are sucking up 50% of my bandwidth?
      The top 2 items are Perl scripts; the first is the search script, which comes up a lot due to the way Actinic handles some links to pages.

      Hits can be anything on a page.

      Your page size is heavy, around 100 to 120k

      If 53% of your daily figure is 6.7MB then you have a very low usage, so Googlebot looks high. This bot will look at a sample of the site, checking for changes on a regular basis, and the size is down to the number of pages and their size on your site.

      A live store I monitor uses about 200MB per day, of which Googlebot is around 5MB.

      Malcolm

      SellerDeck Accredited Partner,
      SellerDeck 2016 Extensions, and
      Custom Packages



        #4
        Originally posted by MariaV View Post
        Checking this month's figures, the stats are even more alarming... we're getting heavily spidered every month. Over 60% of this month's bandwidth is us paying for Google to spider our site.

        I know it's a good thing but this is ridiculous. How can I reduce the amount it's taking...

        EDIT: just been doing a bit of googling and there's mention of changing the crawl rate.

        Would I be correct in thinking making it a lower value reduces the amount of bandwidth it's using up every month?

        Currently set at 0.5 requests per second and 2 seconds between requests
        Yes, reducing the crawl rate will reduce the amount of bandwidth Googlebot takes.
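
        Roughly speaking, the crawl-rate setting is a ceiling, so the saving scales with it: 0.5 requests per second (the same setting as 2 seconds between requests) allows at most about 43,200 fetches a day, while 0.25 requests per second would halve that to about 21,600. Whether Googlebot actually runs flat out at the ceiling is another matter, so treat these as upper bounds.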

        Malcolm

        SellerDeck Accredited Partner,
        SellerDeck 2016 Extensions, and
        Custom Packages



          #5
          Originally posted by malbro View Post
          The top 2 items are Perl scripts; the first is the search script, which comes up a lot due to the way Actinic handles some links to pages.

          Hits can be anything on a page.

          Your page size is heavy, around 100 to 120k

          If 53% of your daily figure is 6.7MB then you have a very low usage, so Googlebot looks high. This bot will look at a sample of the site, checking for changes on a regular basis, and the size is down to the number of pages and their size on your site.

          A live store I monitor uses about 200MB per day, of which Googlebot is around 5MB.
          Bandwidth last month was 12GB, so Google is taking 6GB... which is waaaay above the figure you quoted above. Makes you wonder if Google isn't in league with the hosting companies just to get a kickback.

          How much do you think I should knock back the crawl rate?

          Will also see what I can do re: images



            #6
            Originally posted by MariaV View Post
            Bandwidth last month was 12GB, so Google is taking 6GB... which is waaaay above the figure you quoted above. Makes you wonder if Google isn't in league with the hosting companies just to get a kickback.

            How much do you think I should knock back the crawl rate?

            Will also see what I can do re: images
            Not quite that bad, it's 4GB, but even so that is very bad. It appears from the figures that Google is taking around 33KB per page, which is a little higher than I would expect, but it does depend on the page content. The problem is the number of files that Googlebot is finding.

            I would decrease the crawl rate by a factor of 10 based on the bandwidth.

            Files = 120,660 (13.45%)

            KBytes = 3,993,880 (31.04%)
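
            For the record, that per-page figure is just the transfer divided by the file count: 3,993,880 KB ÷ 120,660 files ≈ 33 KB per file, and 3,993,880 KB is itself roughly 4 GB, which is where the total above comes from.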

            Malcolm

            SellerDeck Accredited Partner,
            SellerDeck 2016 Extensions, and
            Custom Packages



              #7
              Originally posted by malbro View Post
              Not quite that bad, it's 4GB, but even so that is very bad. It appears from the figures that Google is taking around 33KB per page, which is a little higher than I would expect, but it does depend on the page content. The problem is the number of files that Googlebot is finding.

              I would decrease the crawl rate by a factor of 10 based on the bandwidth.

              Files = 120,660 (13.45%)

              KBytes = 3,993,880 (31.04%)
              At the moment it's set at 0.5 requests per second and 2 seconds between requests... so what should I reduce it to?

              Would a 50% reduction be enough (i.e. 0.25 and 4)? Or is that too much?



                #8
                I would do 2 things:

                1. Let Google decide the crawl rate.

                2. As you aren't using scripts for linking, I'd just block Google from the cgi-bin. There's no need for them to crawl the cgi-bin on a site unless you're using it to link to your content.

                Google are smart enough to crawl sites appropriately. In their own words:

                Google has sophisticated algorithms that determine how much to crawl each site. Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server's bandwidth.

                Why do you want to use a custom rate?

                Mike
                -----------------------------------------

                First Tackle - Fly Fishing and Game Angling

                -----------------------------------------



                  #9
                  I agree with Mike. You're looking at all the wrong areas. Leave Google well alone; they are not the issue here. If they love your site, leave them to love it!



                    #10
                    Originally posted by Mike Hughes View Post
                    I would do 2 things:

                    1. Let Google decide the crawl rate.

                    2. As you aren't using scripts for linking, I'd just block Google from the cgi-bin. There's no need for them to crawl the cgi-bin on a site unless you're using it to link to your content.

                    Google are smart enough to crawl sites appropriately.

                    Why do you want to use a custom rate?

                    Mike
                    Because it's sucking up over 30% of my bandwidth... around 4GB last month.

                    How would I block the cgi-bin? Via my htaccess file?



                      #11
                      You keep focusing on them taking 4GB; focus on WHY it is 4GB.



                        #12
                        Originally posted by leehack View Post
                        I agree with Mike. You're looking at all the wrong areas. Leave Google well alone; they are not the issue here. If they love your site, leave them to love it!
                        Great... you can pay my extra bandwidth bills then

                        It's not as if we're getting shed loads of extra sales as a result. :-(

                        Maybe I need a different hosting company...



                          #13
                          To block Googlebot from the cgi-bin:

                          - go to Google Webmaster Tools
                          - Crawler access
                          - robots.txt
                          - Create rule: block Googlebot from /cgi-bin/
                          - 'Add rule'
                          - Download the file and place it on your website; it should end up looking something like the sketch below.
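
                          Assuming there's nothing else in there already, the downloaded robots.txt would look roughly like this (just a sketch; any existing entries you have should be kept as they are):

                          User-agent: Googlebot
                          Disallow: /cgi-bin/

                          That tells Googlebot specifically to stay out of everything under /cgi-bin/ while leaving the rest of the site crawlable.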

                          Keep an eye on webmaster tools. They'll warn you if something doesn't look right.

                          Mike
                          -----------------------------------------

                          First Tackle - Fly Fishing and Game Angling

                          -----------------------------------------



                            #14
                            Originally posted by MariaV View Post
                            Great... you can pay my extra bandwidth bills then

                            It's not as if we're getting shed loads of extra sales as a result. :-(

                            Maybe I need a different hosting company...
                            It's genuinely 5 years since I heard of bandwidth use being a problem on a hosting account. 99% of the time when you get these problems, it is to do with the site design and the resources being used by it. Image size, script calls, humungous menus and plain bad coding are all very common reasons for this.



                              #15
                              Originally posted by leehack View Post
                              You keep focusing on them taking 4GB; focus on WHY it is 4GB.
                              Which is what I've been trying to get to the bottom of as well...

                              I could do with some pointers on reading the stats generated by Plesk.

                              Apart from images, is there anything else I can do to reduce the size of the page?

                              http://www.pedigreecharm.co.uk

