Search script brings server to its knees when combined with clueless user or bot


    I posted a thread about problems with search CGI scripts on larger catalogs a while back; that thread can be found at:

    http://community.actinic.com/showthr...&threadid=1103

    Unfortunately, no-one seemed to pick up on the search script problem, only a potential issue with my blocking various robots and crawlers from the site.

    Well, it raised its ugly head again yesterday, when a user with IE6 (or a bot masquerading as IE6 on XP) issued requests for 153 searches on a catalog with 10,000+ products.

    The problem arises out of the script's memory requirements - 50Mb+ in this particular case. Those of you who understand Unix and virtual memory might begin joining the dots at this point.

    The first few scripts begin processing, and they generally push out their data in 2-4 seconds, depending on server load (this is an older machine we're about to phase out), and within a very short time period there are more scripts grabbing 50Mb+ of RAM (each) - before you know it, all physical RAM has been used - so the server dutifully begins using virtual RAM (swap space) - which is on hard disk.

    Now... virtual RAM is CONSIDERABLY slower than "real" RAM - so the scripts take longer and longer to process, until the machine is simply bogged down in requests to supply RAM to these search scripts - each of which now takes minutes to do its work and even BEGIN to produce output.

    Meanwhile - the clueless searcher continues to hit his back and search buttons - apparently a total of 152 times (to get the 153 instances of the script I found from his IP address in a very short period of time).

    If this is a robot (which I suspect - as surely no-one is clueless enough to hit back/search 152 times - SURELY?!?!?), I'm now stuck - I can't block User agents of IE6 - that's not acceptable - I can't continually sit and watch the output of a netstat and block those IPs that threaten our systems either.

    My solution? I don't have one - I'm turning to Actinic for this. We're taking all the steps we can to minimize the system load of this CGI - turning off every option possible on the search facility - and we're in the process of moving the site to a newer, faster server with more RAM. But in this particular instance, I can't see that a machine with 2Gb of RAM would have coped. Add it up... 150 concurrent requests for 50Mb of RAM is 7.5Gb of RAM - all for a 10,000 product catalog.
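To make the arithmetic above concrete, here is a small back-of-envelope sketch. It uses only the figures quoted in the post (about 50Mb per script, 150 concurrent requests, a 2Gb machine) and the same 1Gb = 1000Mb rounding; the function names are made up for illustration, and it ignores everything else running on the server.

```javascript
// Back-of-envelope memory model for concurrent search scripts.
// All inputs are the rough figures quoted in the post, not measurements.
const MB_PER_GB = 1000; // same rounding the post uses (150 x 50Mb = 7.5Gb)

// Total RAM demanded if every script is alive at the same time.
function totalDemandGb(concurrentScripts, mbPerScript) {
  return (concurrentScripts * mbPerScript) / MB_PER_GB;
}

// How many scripts fit in physical RAM before the machine starts swapping.
function maxScriptsInRam(ramGb, mbPerScript) {
  return Math.floor((ramGb * MB_PER_GB) / mbPerScript);
}

console.log(totalDemandGb(150, 50));  // 7.5
console.log(maxScriptsInRam(2, 50));  // 40
```

On these numbers a 2Gb machine starts swapping at roughly the 41st simultaneous search, which is why more RAM alone does not remove the problem.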

    I originally brought this matter to the attention of Actinic almost a year ago - we've been struggling under this problem for longer than I care to think about - if ANYONE has any suggestions, I'm open to them at this stage.

    The customer has endured system upgrades and software upgrades, server downtime and added expense and effort to try and combat this.... my only step left is a dedicated server (for a catalog that's only 300Mb large?!?!) and I can't have any faith that even a dedicated server wouldn't succumb to such an onslaught of requests for large chunks of resources - unless it's a VERY well specced machine - and for a business with only 10 orders of $30-50 a day it's hard to justify the added expense.

    Oh... and if you have any 10,000+ product catalogs, and want a demonstration of how this can bring your servers to their knees, simply publish the URLs and I'd happily show you....

    regards

    Greg Hewitt-Long
    Web Design & Ecommerce - Affordable Web Hosting
    Free and low cost Merchant Accounts coming soon..
    NOD32 Antivirus - Reciprocal Links for Actinic Sites ONLY

    #2
    Greg

    I have relayed all this to the development team. I will let you know if there is any immediate feedback.



      #3
      GRrrrrRRRRRRRRRRRrrRRRRR.....


      Brand new server with only 5 domains on it... much more RAM.... faster CPU..... the same site just trashed the server AGAIN by some bot grabbing ALL the ram and ALL the swap again!!

      I'm pulling my hair out on this one Chris... I'm still attempting to log in and kill off all the ss00001.pl scripts to get the machine back on track, then try to diagnose which idiot is bringing the server down, then I'll block their IP... this is getting tedious to the point where I'm either going to close this customer's account, or have to find them an alternative e-commerce solution. This is NOT funny... it's WELL PAST funny now.

      Web Your Business Inc.


        #4
        The original release of the Unix kernel - 60Kb

        Actinic Perl scripts (compressed) - 681Kb

        As someone who pokes around in that Perl, it's little wonder to me that users are reporting script timeouts. There's a HUGE amount of processing going on. It's just too easy to write something like $sHTML=~s/\n/<br>/g; Wow, only 20 bytes of code, but it's asking for a global search and replace on a page that could be hundreds of K of text. Stick in a dozen or two such lines and no wonder it's ka-boom.

        Time for V7 to move to PHP / mySQL perhaps? It would certainly make on-line stock control feasible.

        Norman
        Norman - www.drillpine.biz
        Edinburgh, U K / Bitez, Turkey



          #5
          153 searches!!!! That sounds like a very confused shopper or more likely a robot.

          Try changing the script number to one you haven't used before to see if it will stop the robot. I'll be surprised if that works but it's worth a go.



            #6
            Been there - done that.

            I'm not able to change something store wide - this Actinic store is on the end of a very slow (256kb) DSL - it takes 5-6 hours to upload fully to the server. This is not a time for experimenting, as I've been doing that for MONTHS now with the client. The tether is creaking under the strain here... we've tried so many things and told the client so many times that "this is going to improve matters" - even a server with only 5 domains and maxed out RAM didn't help.

            The user agent yesterday was:

            "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Q312461)"

            I'm still not 100% sure it's a bot, but at times we saw 3 to 4 requests per second for the script, from the same IP address - that's pretty fast clicking on the part of a human.
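A request rate like that is the kind of pattern a per-IP limit in front of the search script could catch. The sketch below shows only the bookkeeping for a sliding-window limit, written in JavaScript to match the other code in this thread. The class name, the limit of 2 searches per 10 seconds, and the example IP address are all hypothetical; nothing here is part of Actinic, and a real deployment would need this logic in whatever sits in front of the CGI.

```javascript
// Hypothetical per-IP sliding-window limiter (illustrative names and thresholds).
class SearchRateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests; // searches allowed per window
    this.windowMs = windowMs;       // window length in milliseconds
    this.hits = new Map();          // ip -> timestamps (ms) of allowed requests
  }

  // Returns true if the request may proceed, false if the IP is over its limit.
  allow(ip, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(ip) || []).filter((t) => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.hits.set(ip, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return true;
  }
}

// Four search requests inside one second from the same (example) address.
const limiter = new SearchRateLimiter(2, 10000);
const results = [0, 250, 500, 750].map((t) => limiter.allow("203.0.113.7", t));
console.log(results.join(", ")); // true, true, false, false
```

Rejected requests are not recorded, so a client that backs off gets through again once its earlier searches age out of the window.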

            This store now has 11,736 products.


              #7
              Ah I see.

              The software is only supposed to support a max of 10,000 products and this is probably why. It looks like you need to move to a more appropriate e-commerce solution.

              I'd recommend cutting the number of products down.
              It's no use having 10,000+ products and 0 sales
              Better to have 1,000 products and 100 sales.

              Out of curiosity, what is the URL?

              Regards

              Richard



                #8
                That was v4 - the "up to 10,000 products" limit was lifted in v5, I think.

                From: http://www.actinic.co.uk/products/catalog.htm

                Product catalogue and shopping cart
                - Supports over 10,000 products with images and product descriptions

                The store is at matelectronics.com - the actinic store links over to an IP address which only has the store on at the moment, we're in the middle of moving DNS to point at the new location - I haven't saved the new A-Name records until we're convinced we're staying on that machine.


                  #9
                  If it was a human not waiting for a response from the server, you could try adding onSubmit="return disableForm(this);" to the form tag and adding this JavaScript to the page header:


                  <script type="text/javascript">
                  <!-- Begin
                  // Disable the submit and reset buttons so the form cannot be sent twice.
                  function disableForm(theform) {
                    if (document.all || document.getElementById) {
                      for (var i = 0; i < theform.elements.length; i++) {
                        var tempobj = theform.elements[i];
                        var type = (tempobj.type || "").toLowerCase();
                        if (type == "submit" || type == "reset") {
                          tempobj.disabled = true;
                        }
                      }
                    }
                    // Always let the submission through, even in browsers too old to disable buttons.
                    return true;
                  }
                  // End -->
                  </script>

                  That will stop impatient humans clicking on the button more than once!

                  If it's robots and search engines, you would need some sort of script to check that the search request is only being called from yourdomain.com.
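The check itself could look something like the sketch below, which compares the host in the request's Referer header against a list of allowed hosts. The function name and the example URLs are hypothetical, and this is only the comparison logic in JavaScript, not something that plugs into the Perl search script as-is. Bear in mind that the Referer header is easy for a bot to forge and that some legitimate browsers and proxies strip it, so a check like this filters only the laziest crawlers.

```javascript
// Hypothetical helper: does the Referer header point at one of our own hosts?
function isSameSiteReferer(refererHeader, allowedHosts) {
  if (!refererHeader) return false; // header missing or empty
  let host;
  try {
    host = new URL(refererHeader).hostname.toLowerCase();
  } catch (err) {
    return false; // header is not a valid absolute URL
  }
  return allowedHosts.includes(host);
}

// "yourdomain.com" is the placeholder used in the post; the paths are made up.
const allowed = ["yourdomain.com", "www.yourdomain.com"];
console.log(isSameSiteReferer("http://www.yourdomain.com/search.html", allowed)); // true
console.log(isSameSiteReferer("http://example.org/page.html", allowed));          // false
console.log(isSameSiteReferer(undefined, allowed));                               // false
```

Matching on the parsed hostname rather than a substring avoids being fooled by a referring URL such as http://example.org/?yourdomain.com.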



                    #10
                    I imagine you could use .htaccess (or, more efficiently, httpd.conf) to set up the rewrite engine for this, in much the same way as you would to stop other sites leeching your images by hotlinking.



                      #11
                      Thanks for all the input on this.

                      As a result of this discussion, the team are actively exploring ways to eliminate the load on the server from multiple searches.
