Monday, October 20, 2008

Google’s Geo-targeted Results and Ranking Analysis

As most of us know, Google is fond of tailoring its results for the target audience, returning results applicable to the user’s location and various other attributes, and Live and Yahoo are slowly following suit. This poses a problem when collating a ranking analysis or trend for search engine optimisation (SEO). Take Google as an example: for China there is a huge amount of censorship of returned results, enough to fill a 2MB text document, yet those same results are left in for Europe and many other countries. Recent studies by Jonathan Zittrain and Benjamin Edelman have shown this, and it poses an issue when performing analysis, because the analysis itself is meaningless unless you take geo-targeting into account.

A further complication is that you can set preferences on what you want Google to return for a search, including Google Base results for products and Maps results for locations and local business searches, which makes writing any sort of automation almost impossible. Google’s AUP and terms and conditions state that any crawler or automated system is in violation of their acceptable use policy and should not be used, so, in keeping with Google’s guidelines, let’s hypothesise about the obstacles you could encounter and ways in which to overcome them.

One approach is proxied traffic for lookups, forcing geo-targeting to work for you and your query. It is very easy to write automation that manipulates network settings in a Windows environment, forcing lookups via a proxy server in, say, America. By then performing simple server-side queries and using the response returned by the search engine, you can obtain the listings. It is then simply a case of iterating through the returned code to extract the results. The easiest way to do this is to load the returned page as a model or an object that can be analysed. Older software used delimiters to carve the code into an array or list of data and inspected each piece to find the URL of a site. This, however, is not accurate, and should the layout change, the method is rendered useless, costing man-hours in regular updates and delimiter checks.
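
To make that concrete, here is a rough Python sketch of the idea, purely hypothetical and for illustration only given Google’s terms above: the proxy address, the results URL and the assumption that result links sit inside h3 headings are all placeholders, but it shows the difference between loading the page into a parser object and carving it up with delimiters.

# Illustrative sketch only: Google's terms prohibit automated queries, and the
# proxy address and markup assumptions below are hypothetical placeholders.
from html.parser import HTMLParser
from urllib.parse import quote_plus
import urllib.request


class ResultLinkParser(HTMLParser):
    """Builds a list of result URLs by walking the parsed document,
    rather than carving the raw HTML with delimiters."""

    def __init__(self):
        super().__init__()
        self.in_heading = False   # assumes result links sit inside <h3> headings
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            href = dict(attrs).get("href", "")
            if href.startswith("http"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_heading = False


def fetch_serp(query, proxy="http://us-proxy.example.com:8080"):
    """Fetch a results page through a US-based proxy (hypothetical address)
    so geo-targeting resolves for that region, then parse it into objects."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    html = opener.open(
        "http://www.google.com/search?q=" + quote_plus(query)).read().decode(
            "utf-8", errors="replace")
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for position, url in enumerate(fetch_serp("search engine optimisation"), 1):
        print(position, url)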

So why do Google and the others tailor result data, and is it accurate?

There are many reasons for this, the biggest being to improve a user’s experience and to make the results more accurate for that individual. However, even this cannot be treated as law. Recent investigation by colleagues and me shows that for queries sent to Google from the same location and broadcast address, results can still differ from machine to machine. “Settings and preferences,” I hear you say; no, these machines are configured exactly the same, so that is not a factor…

In Google’s case this is caused by load balancing on Google’s network: the result you get depends on which node you happen to hit. Not every Google server does a database lookup on every query, as that would require a super-powered database platform; instead, the servers in the cluster cache results to increase speed, with each server holding its own temporary index and cache. On this theory, each server can return a different result, often very similar but not exact. To put this in plain English, consider the possible addresses at Google’s datacentres; although Google does not broadcast these addresses, a simple network discovery yields the ranges summarised below.

The IP ranges consist of 44 ranges with up to 20 broadcast IP addresses on each, which equates to 880 possible IP addresses, or, put another way, a whole lot of possible servers even once you account for non-server IP allocation. No one other than Google knows how many nodes sit behind these addresses, but the possibilities are huge. Want to test it for yourself? Take a look at http://www.seocritique.com/datacentertool/ and run a query. It returns a whole lot of data, all geo-targeted and all different on each lookup and server; truly mind-boggling. The only way to calculate positioning is to take an average across the array, unless of course you target a node for your particular query region. Again, network discovery and IP assignment lookups can help establish the region of a range, IP or device.
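
As a rough illustration of the averaging step, here is a small Python sketch; the node addresses and per-node snapshots are made-up examples standing in for real lookups such as the proxied fetch sketched above.

# Illustrative only: the node addresses and result lists below are invented
# stand-ins for per-node lookups against the datacentre ranges.
def average_position(domain, per_node_results):
    """Average the rank of `domain` across the result lists returned by each
    datacentre node; nodes that do not list the domain at all are skipped."""
    positions = []
    for node, urls in per_node_results.items():
        for rank, url in enumerate(urls, start=1):
            if domain in url:
                positions.append(rank)
                break
    return sum(positions) / len(positions) if positions else None


if __name__ == "__main__":
    # Hypothetical per-node snapshots for the same query.
    snapshots = {
        "64.233.167.99": ["http://example.com/", "http://other.org/"],
        "66.102.9.104":  ["http://other.org/", "http://example.com/"],
    }
    print(average_position("example.com", snapshots))  # -> 1.5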

So we know the shape of Google’s network; it’s impossible to query everything manually.

Again hypothetically, Google’s anti-flood measures appear to treat as automated any queries that hit an address more frequently than once every two seconds per node. Google’s API allows nowhere near the number of queries and results you need to obtain the top 100 positions of a domain on a phrase. So running a query and obtaining the results by simulating a manual search, as long as it stays within these limits, does the same job as a manual search. Because there are so many nodes, you can query them all at the same time, staggering your queries per node to stay within the limits. Once the data is obtained, you can analyse it to obtain an average.
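
Hypothetically, the staggering could look something like the Python sketch below; the two-second interval is the threshold guessed at above, and fetch_from_node is a placeholder for a per-node lookup such as the proxied fetch sketched earlier.

# Hypothetical sketch of staggering queries so no single node sees more than
# one request every two seconds, while the nodes themselves run in parallel.
import threading
import time


def fetch_from_node(node, query):
    """Placeholder for a per-node lookup, e.g. the proxied fetch sketched
    earlier pointed at a single datacentre address."""
    return []


def poll_node(node, queries, store, interval=2.0):
    """Query one node sequentially, pausing so the node never sees more
    than one request every `interval` seconds."""
    for query in queries:
        store[query] = fetch_from_node(node, query)
        time.sleep(interval)


def poll_all(nodes, queries):
    """Poll every node in parallel, one thread each, keeping each individual
    node inside the per-node limit; results come back keyed by node then
    query, ready for averaging as in the earlier sketch."""
    results = {node: {} for node in nodes}
    threads = [threading.Thread(target=poll_node,
                                args=(node, queries, results[node]))
               for node in nodes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results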

What about the other search engines, Live and Yahoo?

Yes, other search engines like Live and Yahoo don’t get as much usage as the giant that is Google, but it’s worth knowing how you rank on them, given Microsoft’s ongoing development and integration with its operating systems. Live is the default search engine on Vista and other Microsoft operating systems, and with IM clients such as Live Messenger and Yahoo Messenger each using their counterpart search engines, usage of these underdogs is on the increase. Live uses a similar type of result model, so the same logical process as above still applies, but Yahoo puts a completely new spin on this methodology. All Yahoo results are redirected via a Yahoo address, making analysis of the returned request useless. However, the good people at Yahoo offer a very good API for obtaining the data. With their openness and willingness to aid the developer, building tools and SERP reports for Yahoo is quick and easy. If only others would take the example on board.
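
For illustration, here is a small Python sketch of pulling results through Yahoo’s search API; the endpoint, parameter names and response fields shown are my assumptions and should be checked against Yahoo’s own developer documentation, and the application ID is a placeholder issued when you register.

# Sketch of querying Yahoo's search API (BOSS); endpoint, parameters and
# response fields are assumptions to verify against Yahoo's documentation.
import json
import urllib.request
from urllib.parse import quote_plus

APP_ID = "your-yahoo-app-id"   # placeholder key issued on registration


def yahoo_results(query, count=50):
    """Return the result URLs for a query via the API rather than scraping."""
    url = ("http://boss.yahooapis.com/ysearch/web/v1/" + quote_plus(query)
           + "?appid=" + APP_ID + "&format=json&count=" + str(count))
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8"))
    # Assumed response layout: a list of hits, each carrying a 'url' field.
    return [hit["url"] for hit in data["ysearchresponse"]["resultset_web"]]


if __name__ == "__main__":
    for position, url in enumerate(yahoo_results("search engine optimisation"), 1):
        print(position, url)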

Google’s future

Google is always changing and developing new technologies, the latest being Chrome. This is basically Google’s answer to Internet Explorer; breaking into the browser market or simply broadening horizons, you make the call, but one thing is for sure: if Microsoft makes Live the default search engine on Internet Explorer, you can bet Google is the default on Chrome…

Final thought…

Of course, all of this is hypothetical, but it offers answers to some of the questions asked by SEO (search engine optimisation) professionals about how to illustrate the results and benefits of SEO and organic targeted marketing. It also opens new and exciting questions about the industry, its direction and how it continues to evolve…
