Search Engine Bot Detection

Nes-Emulator.com

This page is made specifically for detecting when one of the search engines, and in particular the Googlebot, visits my website. The problem is that I have not seen the Googlebot visiting me for quite a while, and now this script will ensure that when Google does come to visit and index my site I will be notified about it. (An email will be sent to me when one of the bots from the list enters this page.)
If you would like to use this PHP script, insert it in a .php page.
You can add it to any PHP page you like, anywhere on the page. A good idea would be to put it on the index page, as a search spider is sure to hit that, but if you are like me and your index page is pure HTML, then just make a separate PHP page that is linked from the index. Search engines will spider it eventually, and so you will be notified about their visit.

Detects and reports if it is:

- Googlebot Deep Crawl
- Google Freshbot
- Mediapartners (the AdSense Googlebot; detection will be added later)


Code:


bot.php (change the $to variable to your email address)


<?php

    $botlist = array(   
                "Teoma",                   
                "alexa",
                "froogle",
                "inktomi",
                "looksmart",
                "URL_Spider_SQL",
                "Firefly",
                "NationalDirectory",
                "Ask Jeeves",
                "TECNOSEEK",
                "InfoSeek",
                "WebFindBot",
                "girafabot",
                "crawler",
                "www.galaxy.com",
                "Googlebot",
                "Scooter",
                "Slurp",
                "appie",
                "FAST",
                "WebBug",
                "Spade",
                "ZyBorg",
                "rabaz");


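    // register_globals is gone, so read the request data from $_SERVER
    $agent = $_SERVER['HTTP_USER_AGENT'] ?? "";
    $addr  = $_SERVER['REMOTE_ADDR'] ?? "";
    $host  = $_SERVER['REMOTE_HOST'] ?? "";   // often not set by the web server
    $query = $_SERVER['QUERY_STRING'] ?? "";

    foreach($botlist as $bot) {

      // ereg() was removed in PHP 7; a case-sensitive substring test does the same job here
      if(strpos($agent, $bot) !== false) {

          if($bot == "Googlebot") {
            // the prefixes are IP addresses, so compare against REMOTE_ADDR, not REMOTE_HOST
            if (substr($addr, 0, 11) == "216.239.46.") $bot = "Googlebot Deep Crawl";
            elseif (substr($addr, 0, 7) == "64.68.8") $bot = "Google Freshbot";
          }
          $url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'];
          if ($query != "") {
            $url .= "?" . $query;
          }

// settings
$to = "[email protected]"; // change this to your email address
$subject = "Detected: $bot on $url";
$body = "$bot was detected on $url\n\n
Date.............: " . date("F j, Y, g:i a") . "
Page.............: " . $url . "
Robot Name.......: " . $agent . "
Robot Address....: " . $addr . "
Robot Host.......: " . $host . "
";

mail($to, $subject, $body);

          break; // one email per visit, even if several list entries match

      }

    }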

?>
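
The Googlebot branch in the script boils down to a prefix test against two IP ranges. Here is a minimal sketch of that test as a standalone function, so it can be tried without a live request. The name classify_googlebot is made up for illustration, and the two prefixes are simply the ones used in the script; Google's address ranges may well be different by the time you read this.

```php
<?php
// Hypothetical helper (not part of bot.php): label a Googlebot visit
// using the same two IP prefixes as the script.
function classify_googlebot($addr) {
    if (strncmp($addr, "216.239.46.", 11) === 0) {
        return "Googlebot Deep Crawl";
    }
    if (strncmp($addr, "64.68.8", 7) === 0) {
        return "Google Freshbot";
    }
    return "Googlebot"; // prefix not recognised, keep the generic name
}

// Sample addresses chosen only to exercise each branch.
echo classify_googlebot("216.239.46.20"), "\n";
echo classify_googlebot("64.68.82.5"), "\n";
echo classify_googlebot("10.0.0.1"), "\n";
```

Keeping the prefixes in one small function also means there is only one place to edit when the ranges change.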


To test if this is working, you could add "Mozilla" to the botlist. As you can guess, that means you will get notified when a regular non-bot visitor enters the page. Use this to check that you have everything set up correctly and that sendmail on your server is working.
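
If you would rather not fill your inbox while testing, the matching step can also be pulled into a function and run from the command line with sample user-agent strings. This is only a sketch under a few assumptions: find_bot is a hypothetical name, $testlist is a cut-down stand-in for $botlist, and "Mediapartners" is an entry you would have to add yourself, since it is not in the list shown in the script.

```php
<?php
// Hypothetical helper: return the first list entry found in the user agent
// (case-sensitive substring match), or null when nothing matches.
function find_bot($agent, $botlist) {
    foreach ($botlist as $bot) {
        if (strpos($agent, $bot) !== false) {
            return $bot;
        }
    }
    return null;
}

// Cut-down list for the test; "Mediapartners" is an assumed extra entry.
$testlist = array("Googlebot", "Slurp", "Mediapartners");

// Sample user-agent strings; the browser one stands in for a regular visitor.
$agents = array(
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
);

foreach ($agents as $agent) {
    $bot = find_bot($agent, $testlist);
    $label = ($bot === null) ? "no match" : $bot;
    echo $label, " <- ", $agent, "\n";
}
```

Save it under any name you like (say test_bot.php) and run it with php test_bot.php. No mail is sent, so this only checks the matching, not sendmail.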

Here I added AdSense Google ads to the page to test whether the Mediapartners bot would be detected, and look what I got in the mail:

Mediapartners was deteched on http://www.nes-emulator.com/x_bot.php

Date.............: January 28, 2003, 8:43 am
Page.............: http://www.nes-emulator.com/x_bot.php
Robot Name.......: Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)
Robot Address....: 64.68.87.69
Robot Host.......:


Kind of cool, isn't it?
Now you can go back to indexing the Download Free NES Emulator page, my search engine friends ;)