
Google Robot Developments

The spiders on the web have always fascinated me. We know that all the main spiders visit us virtually every day, crawling not just the main HTML pages but the e-commerce PHP pages as well, triggering redirects and so on.

Could The New Google Spider Be Causing Issues With Websites?

Around the time Google announced "Big Daddy," there was a new
Googlebot roaming the web. Since then I've heard stories from
clients of websites and servers going down and previously
unindexed content getting indexed.

I started digging into this and you'd be surprised at what I
found out.

First, let's look at the timeline of events:

In late September some astute spider watchers over at
Webmasterworld spotted unusual Googlebot activity. The bot was
first reported in this thread:
http://www.webmasterworld.com/forum3/25897-9-10.htm. It
concerned some posters, who thought that this could be regular
users masquerading as the famous bot.
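
One way to tell a real Googlebot from a visitor merely using its
name is a reverse DNS lookup followed by a forward lookup. The
sketch below is not from the forum thread. It assumes that genuine
crawler hosts resolve to names ending in googlebot.com or
google.com, and the function name and injectable lookup parameters
are hypothetical, added so the check can be tried without network
access.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google crawler host
    that resolves back to the same ip."""
    # Default lookups use the standard socket module; callers may pass stubs.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Assumed suffixes for genuine crawler hostnames.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup must map the hostname back to the original address.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) mean "not verified".
        return False
```

A visitor that only copies the user-agent string would fail this
check, because its IP address would not resolve to a Google
hostname.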

Early on it also appeared that the new bot wasn't obeying the
robots.txt file. This is the file that tells crawlers which
parts of a website they may and may not crawl.
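
To see what a well-behaved crawler should do with a given
robots.txt, Python's standard urllib.robotparser module can
evaluate the rules. This is a minimal sketch: the rules and the
example.com URLs are made up for illustration, and it says nothing
about how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Comparing these answers with the paths the bot actually requested
in your logs shows whether it really ignored the file.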

Speculation about the new crawler grew until Matt Cutts
mentioned a new Google test data center:
http://www.mattcutts.com/blog/good-magazines/#comment-5293. For
those who don't know, Matt Cutts is a senior engineer with
Google and one of the few Google employees talking to us
"regular folk." This mention happened in November.

There wasn't much mention of Big Daddy until early January of
this year when Matt again blogged about it asking for feedback.
http://www.mattcutts.com/blog/bigdaddy/

Much feedback was given on the accuracy of the results. Some
people also asked whether the Mozilla Googlebot (known as
"Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)" in your visitor logs) and Big
Daddy were related, but no response was given.
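
If you want to spot this bot in your own logs, a small script can
pick it out by its user-agent string. The sketch below assumes the
common "combined" log format, where the user agent is the last
quoted field on each line. The function name and labels are
hypothetical.

```python
import re

# User-agent string of the new bot, as quoted above.
MOZILLA_GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Assumes the combined log format: the last quoted field is the user agent.
UA_PATTERN = re.compile(r'"(?P<agent>[^"]*)"\s*$')


def classify_agent(log_line):
    """Label a log line as the Mozilla Googlebot, another Googlebot, or other."""
    match = UA_PATTERN.search(log_line)
    agent = match.group("agent") if match else ""
    if agent == MOZILLA_GOOGLEBOT_UA:
        return "mozilla-googlebot"
    if "Googlebot" in agent:
        return "other-googlebot"
    return "other"
```

Keep in mind that a user-agent string is easy to fake, so a match
here only shows what the visitor claims to be.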

Now I'm going to begin some of my own speculation:

I do in fact believe the two are related. In fact, I think this
new crawler will eventually replace the old crawlers just as Big
Daddy will replace the current data infrastructure.
http://www.textlinkbrokers.com/blogs/comments/310_0_1_0_C/

Why is this important?

Based on my observations, this crawler may be able to do so
much more than the old crawler.

For one, it emulates a newer browser. The old bot was based on
the Lynx text-based browser. While I'm sure Google added
features as time went on, the basic Lynx browser is just that:
basic.

Which explains why Google couldn't deal with things like
JavaScript, CSS and Flash.

However, with the new spider, built on the Mozilla engine,
there are so many possibilities.

Just look at what your Mozilla or Firefox browser can do itself
– render CSS, read and execute JavaScript and other scripting
languages, even emulate other browsers.

But that's not all.

I've talked to a few of my clients and their sites are getting
hammered by this new spider. It has gotten so bad that some of
their servers have gone down because of the volume of traffic
from this one spider!

On the plus side, I have a client who went from a few hundred
thousand indexed pages to over 10 million in just a few weeks!
Since December 2005 there's been a 3500% increase in indexed
pages over an 8-week period! Just so you know, this is also the
client whose site went down because of the huge volume of
crawling.
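
As a quick sanity check on those figures, the arithmetic can be
sketched as below. It assumes the final count is exactly 10
million, which the article only gives as "over 10 million", so the
starting figure is an estimate. The function names are
hypothetical.

```python
def percent_increase(before, after):
    """Percentage increase going from before to after."""
    return (after - before) / before * 100


def starting_count(after, pct_increase):
    """Starting value implied by a final value and a percentage increase."""
    return after / (1 + pct_increase / 100)


if __name__ == "__main__":
    # A 3500% increase means the final count is 36 times the starting count.
    print(round(starting_count(10_000_000, 3500)))
```

Under that assumption the starting point works out to roughly
278,000 pages, which fits "a few hundred thousand".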

But that's still not all.

I have another client that uses IP recognition to serve
content based on a person's geographic location. If you live in
the US you get American content and pricing; if you live in the
UK you get UK content and pricing. As you may imagine, the UK,
US, Canadian and Australian content is all very similar. In
fact about the only thing noticeably different is the pricing
aspect.

This is my concern: if the duplicate content gets indexed by
Google, what will they do? There's a good chance that the site
would be penalized or even banned for violation of the
webmaster quality guidelines set forth by Google here:
http://www.google.com/webmasters/guidelines.html#quality

This is why we implemented IP recognition: so that Googlebot,
which crawls from US IP addresses, only sees one version of the
site.
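
The idea behind IP recognition can be sketched in a few lines. This
is not the client's actual setup: the region table below uses
reserved documentation address ranges as stand-ins for real
geolocation data, and all names are hypothetical.

```python
import ipaddress

# Hypothetical region table. These are reserved documentation ranges,
# not real geolocation data.
REGION_NETWORKS = {
    "US": ipaddress.ip_network("192.0.2.0/24"),
    "UK": ipaddress.ip_network("198.51.100.0/24"),
    "AU": ipaddress.ip_network("203.0.113.0/24"),
}
DEFAULT_REGION = "US"


def region_for_ip(ip):
    """Return the region whose network contains ip, or the default region."""
    address = ipaddress.ip_address(ip)
    for region, network in REGION_NETWORKS.items():
        if address in network:
            return region
    return DEFAULT_REGION
```

A crawler that only ever connects from addresses mapped to "US"
would be served the US pages and nothing else, which is the
behaviour the setup relies on.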

However, a review of the server logs shows that this new
Googlebot has been visiting not only the US content but also
the content of the other sections of the site. Naturally, I
wanted to verify that the IP recognition was working. It is.
This leads me to wonder: can this browser spoof its
location and/or use a proxy?
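
A log review like the one described could be sketched as follows.
It assumes the combined log format and that each regional version
lives under its own top-level path such as /us/ or /uk/. Both are
assumptions for illustration, not details of the client's site.

```python
import re
from collections import Counter

# Assumes combined log format: "METHOD path protocol" ... "referer" "user-agent"
REQUEST_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"\s*$'
)


def googlebot_hits_by_section(log_lines):
    """Count requests claiming to be Googlebot, grouped by first path segment."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            # First path segment, e.g. "/uk/pricing.html" -> "uk".
            section = match.group("path").lstrip("/").split("/", 1)[0] or "(root)"
            hits[section] += 1
    return hits
```

If the bot really crawled only from US addresses and the IP
recognition works, every hit should land in the US section. Hits
in the other sections are what prompted the question above.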

Imagine that – the browser is smart enough to do some of its
own testing by viewing the site from multiple IP addresses. If
that's the case then those who cloak sites are going to have
problems.

In any case, from the limited observations I've made, this new
Google, both the data center and the spider, is going to
change the way we do things.



About The Author: Rob Sullivan is an SEO Consultant and Writer
for http://www.textlinkbrokers.com. Textlinkbrokers is a link
building company. Please provide a link directly to
Textlinkbrokers when syndicating this article.


From our point of view the immediate impact is relatively modest, because the main search engines have been picking up all our pages for some time. Looking at this, I'm not sure whether the new robot is one of the unidentified ones or the one identified as Googlebot. Of more interest is the new robot's developing ability to look at Flash and JavaScript elements on the page, and the impact that could have on optimisation. Up to now we have been able to ignore those parts of the page, safe in the knowledge that they are invisible to Google, and some naughty people have even been known to take advantage of this to cheat. In our case we have duplicated some links in HTML and JavaScript on the same page. We will soon be able to stop doing that, which is a good thing.

In the meantime, the propagation of Big Daddy seems to have stalled, Rankpulse had another convulsion on Friday, and our visitors are coming and going in blissful ignorance of all of this.
