Click rank distribution analysis



First up in our series of analyses of the AOL data breach is click rank: which result positions do people click after running a search? We pulled the numbers from our database with a simple SQL query:

SELECT itemrank, COUNT(id) AS clicks FROM aol GROUP BY itemrank WITH ROLLUP

The output was written to a text file and then parsed with a Perl one-liner to compute the percentages:

perl -e '$max = 36389569; while(<>) { @arr = split(/\s+/,$_); print "$arr[0] $arr[1] " . ($arr[1]/$max)*100 . "%\n"; }' < data_in.txt >clickthru_data.txt
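For readers who prefer Python, the same percentage calculation can be sketched as below. This is a minimal sketch, not the script we actually ran. It assumes each input line holds a rank and a count separated by whitespace, and that header or blank lines can be skipped. The function name `percentages` and the sample input are made up for illustration.

```python
import io

TOTAL_ROWS = 36389569  # same constant as $max in the Perl one-liner


def percentages(lines, total=TOTAL_ROWS):
    """Yield (rank, clicks, percent) for each 'rank clicks' line."""
    for line in lines:
        fields = line.split()
        # Assumption: skip blank lines and a textual header row.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        rank, clicks = fields[0], int(fields[1])
        yield rank, clicks, clicks / total * 100


# Hypothetical sample input; the NULL row stands for the WITH ROLLUP total.
sample = io.StringIO("itemrank\tclicks\n0\t16955030\n1\t8220277\nNULL\t36389569\n")
for rank, clicks, pct in percentages(sample):
    print(f"{rank} {clicks} {pct:.2f}%")
```

The rank is kept as a string so that the NULL row produced by WITH ROLLUP passes through untouched, just as it does in the Perl version.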

As you can see for yourself, some really interesting information can be found in the first 12 rows:

Click distribution analysis of AOL search data

Rank | Clicks | Percentage | Comment
0 | 16955030 | 46.5931047438347% | These are all searches made that didn't generate a click. Read below for further discussion on this figure.
1 | 8220277 | 22.5896519961531% | Position #1 naturally has the most hits.
5 | 943667 | 2.59323489102056% | Everything above position #5 (usually above the fold at 1024 resolution) gets a healthy boost to its percentage.
9 | 549196 | 1.5092127087298% | Notice that this row has a lower percentage than #10.
10 | 577325 | 1.58651233269622% | This is the last position on the first page of search results.
11 | 127688 | 0.350891762416862% | As you can see, #11 doesn't even get 1% of the possible hits from any given search. This means that if you sit at the top of page 2, moving up just a few positions would give you a very powerful boost.


As a webmaster I find the first row very interesting: do 46% of all searches really send no visitors from the search engine to any website? The figure of course includes the 7,887,022 requests for "next page" recorded in the database (see README). Even so, as I see it this is one of the largest untapped sources of traffic available right now. How about spending that SEO marketing money on a proper copywriter instead? It might turn out to be the best deal ever.
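To get a feel for how much the "next page" requests inflate the no-click figure, here is a rough back-of-the-envelope sketch in Python. It rests on an assumption that is not confirmed by the README: that every one of the 7,887,022 next-page requests is stored as a rank 0 (no click) row. The function name `no_click_share` is made up for illustration.

```python
TOTAL_ROWS = 36389569         # all rows in the database
NO_CLICK_ROWS = 16955030      # rank 0 in the table
NEXT_PAGE_REQUESTS = 7887022  # figure quoted from the README


def no_click_share(total, no_click, next_page):
    """Percentage of rows without a click once next-page requests
    are removed from both the no-click count and the total.

    Assumption: all next-page requests are logged as no-click rows.
    """
    return (no_click - next_page) / (total - next_page) * 100


adjusted = no_click_share(TOTAL_ROWS, NO_CLICK_ROWS, NEXT_PAGE_REQUESTS)
print(f"{adjusted:.1f}%")
```

Under that assumption the share of searches without a click drops from about 46.6% to about 31.8%, which is still close to a third of all searches.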


The first 10 "real" results are no surprise to anyone involved in the web today: we know that everything above the fold is what gets clicked. We also know that it's the results on the first page that really count. Going just from position #4 to #11 is a tenfold decrease in traffic. Top five is where everyone wants to be.
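The size of these drops can be checked directly from the click counts in the table. The snippet below is only an illustrative sketch. It uses the rows shown in the table (position #4 is not listed there, so it is left out), and the helper name `drop_factor` is made up for this example.

```python
# Click counts copied from the table (rank -> clicks).
CLICKS = {1: 8220277, 5: 943667, 9: 549196, 10: 577325, 11: 127688}


def drop_factor(clicks, better, worse):
    """How many times more clicks the better-ranked position receives."""
    return clicks[better] / clicks[worse]


print(f"#1 vs #5:   {drop_factor(CLICKS, 1, 5):.1f}x")
print(f"#5 vs #11:  {drop_factor(CLICKS, 5, 11):.1f}x")
print(f"#10 vs #11: {drop_factor(CLICKS, 10, 11):.1f}x")
```

With these counts, position #1 gets roughly 8.7 times the clicks of #5, #5 gets roughly 7.4 times the clicks of #11, and the step from the last result on page one (#10) to the first on page two (#11) alone is a factor of about 4.5.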


Further findings include odd anomalies in the distribution of clicks, for instance ranks 355, 385 and 415, which are a curiosity since they are result #4 on each new page. Result #500, which is the last available result on AOL search, also gets an unusually high number of hits. I guess that can be written off as pure human curiosity.



Being on the verge of the next cluster of results can be a good incentive to try to increase your ranking. If you already rank on the first page and don't want to spend time climbing into the top 5, you can rest assured you're not missing that many visitors anyway.


AOL Stalker is a service that lets the world search the database released by AOL, containing over 36 million searches made by its own users. A digital goldmine for SEOs, webmasters and affiliate marketers. Not to mention the hours of fun you'll have stalking! AOL Stalker will publish new detailed analyses of the AOL data on the statistics page. Please stay tuned for more, and don't forget to digg this article!



This searchable database contains 36389569 searches made by AOL users between 1 March 2006 and 31 May 2006. The data was released under a non-commercial research license by AOL. The associated README file contains further information about what data was provided. If you find any data that actually makes it possible to identify a user, please let us know using the [!]-function and we'll remove those references.