A Data Science Central Community
We tested how vulnerable Google and Bing are to sophisticated algorithmic attacks that could potentially erase all paid ads and organic search results.
An algorithmic attack is defined by the following criteria:
In our stress test, we checked whether Google and Bing can detect sophisticated web traffic manipulations generated by automated (non-human), well-distributed page impressions. At the core of each search engine is a mathematical algorithm that, given a user, an ad inventory, and a search keyword, decides which ads to display, and in which order, to maximize the search engine's revenue on a keyword basis. This is called the ad relevancy algorithm.
Typically, ads are ranked using some variation of the formula R = max CPC * expected CTR, where max CPC is the maximum bid the advertiser is willing to pay per click for the keyword in question. For ads with little history, the expected CTR is estimated via scoring techniques that take the content of the landing page into account; when enough history is available, the actual CTR is used instead, after bogus impressions have been eliminated. Whenever possible, advertiser conversions are taken into account as well (if the advertiser provides conversion data to the search engine): a high CTR with a poor conversion rate is considered suspicious.
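To make the ranking formula concrete, here is a minimal sketch in Python. It assumes the simplest possible variant, R = max CPC * expected CTR, with no quality or conversion adjustments; the `Ad` class, the function names, and the bid and CTR values are all hypothetical and are not taken from any search engine's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    max_cpc: float       # maximum bid per click (hypothetical values)
    expected_ctr: float  # estimated or observed click-through rate, in [0, 1]


def rank_score(ad: Ad) -> float:
    """Return R = max CPC * expected CTR for a single ad."""
    return ad.max_cpc * ad.expected_ctr


def rank_ads(ads):
    """Sort ads from highest to lowest R."""
    return sorted(ads, key=rank_score, reverse=True)


ads = [
    Ad("A", max_cpc=2.00, expected_ctr=0.010),  # R = 0.020
    Ad("B", max_cpc=0.50, expected_ctr=0.060),  # R = 0.030
    Ad("C", max_cpc=1.00, expected_ctr=0.015),  # R = 0.015
]

ranked_names = [ad.name for ad in rank_ads(ads)]
print(ranked_names)  # ['B', 'A', 'C']
```

In this toy example, ad B outranks ad A despite bidding four times less, because its expected CTR is six times higher. This is why the CTR estimate, and the impressions it is computed from, matter so much to the ranking.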
Based on the above formula for R, there are two ways to exploit mathematical flaws:
We found that Google is more resilient to this type of scheme. To be successful, the scheme must involve impressions generated across thousands of IP addresses, typically via some type of botnet:
Who would potentially offer this type of service?
To succeed, these schemes must generate a random number of impressions per IP per day, with a sound keyword distribution. This is easy to implement. Earlier schemes that used a fixed number of clicks per IP per day were caught because of their lack of sophistication.
The open botnet technology is easy to implement: users or webmasters install a program on their machine that runs in the background and sends HTTP requests to a centralized machine (belonging to the botnet operator) to download the keywords and white-listed domains to use in the scheme. The program then generates bogus impressions with fake referrers on these target keywords (it could very easily be written in Perl with the LWP library, then turned into an executable using perl2exe).
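To illustrate why fixed per-IP volumes are easy to catch, here is a small sketch of a detection heuristic in Python. It is a toy example built on assumptions: the function name, the `min_days` threshold, and the sample traffic (using reserved documentation IP addresses) are hypothetical, and nothing here describes how Google or Bing actually filter traffic.

```python
def flag_static_ips(daily_counts, min_days=5):
    """Flag IPs whose daily impression count never changes.

    daily_counts maps an IP address to a list of impressions per day.
    An IP is flagged when it has at least min_days of history and
    every day shows exactly the same count.
    """
    flagged = []
    for ip, counts in daily_counts.items():
        if len(counts) >= min_days and len(set(counts)) == 1:
            flagged.append(ip)
    return sorted(flagged)


# Hypothetical traffic log: impressions per day over one week.
traffic = {
    "203.0.113.10": [20, 20, 20, 20, 20, 20, 20],  # identical every day
    "203.0.113.11": [3, 0, 7, 1, 12, 4, 2],        # varies from day to day
    "203.0.113.12": [5, 5],                        # too little history to judge
}

suspicious = flag_static_ips(traffic)
print(suspicious)  # ['203.0.113.10']
```

A rule this simple catches only the unsophisticated case described above. Traffic with randomized daily volumes passes it unchanged, which is why detection has to rely on additional signals such as the conversion data mentioned earlier.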
The same technology could also be used to fool ad targeting algorithms, e.g. by artificially increasing CTR just after midnight, or increasing CTR from Indian IP addresses. The purpose is still the same: keep the best traffic for yourself, give bad traffic to your competitors, or increase arbitrage efficiency.