# AnalyticBridge

A Data Science Central Community

# Analytic Stress Test: Google Beats Microsoft

We tested how vulnerable Google and Bing are to sophisticated algorithmic attacks that could potentially erase all paid ads and organic search results on search engines.

An algorithmic attack is defined by the following criteria:

• The goal is to reverse-engineer algorithms designed by data scientists and statisticians
• The idea is to exploit flaws found in the algorithm, and to leverage these flaws for financial gain or for fame

In our stress test, we checked whether Google and Bing can detect sophisticated web traffic manipulations generated by automated (non-human), well-distributed page impressions. At the core of each search engine, there is a mathematical algorithm that - given a user, an ad inventory and a search keyword - decides which ads should be displayed, and in which order, to maximize revenue for the search engine (on a keyword basis). This is called the ad relevancy algorithm.

Typically, ads are ranked based on some variation of the formula R = max CPC * expected CTR, where max CPC is the maximum bid the advertiser is willing to pay for the keyword in question. The expected CTR is estimated via scoring techniques (taking into account the content of the landing page) for ads with little history, or taken from the actual CTR when available, after eliminating bogus impressions. Whenever possible, advertiser conversions are taken into account as well (if the advertiser provides conversion data to the search engine): a high CTR with a poor conversion rate is considered suspicious.
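To make the formula concrete, here is a minimal sketch of the ranking step. The `Ad` class, the ad names and the bid and CTR values are made up for illustration. Real ad relevancy algorithms use variations of this formula that are not described here.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str            # hypothetical ad identifier
    max_cpc: float       # maximum bid per click, in dollars
    expected_ctr: float  # estimated click-through rate, between 0 and 1


def rank_score(ad: Ad) -> float:
    """R = max CPC * expected CTR, the basic form given in the text."""
    return ad.max_cpc * ad.expected_ctr


def rank_ads(ads):
    """Return ads sorted by decreasing R (highest expected revenue first)."""
    return sorted(ads, key=rank_score, reverse=True)


# Illustrative values only, not measured data.
ads = [
    Ad("ad_a", max_cpc=2.00, expected_ctr=0.010),
    Ad("ad_b", max_cpc=1.00, expected_ctr=0.030),
    Ad("ad_c", max_cpc=3.00, expected_ctr=0.005),
]

ranked = rank_ads(ads)
for ad in ranked:
    print(f"{ad.name}: R = {rank_score(ad):.3f}")
```

Note that in this toy example the ad with the lowest bid ranks first because its expected CTR is the highest: the ranking depends as much on CTR as on the bid.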

Based on the above formula for R, there are 2 ways to exploit mathematical flaws:

• If you decrease the CTR of your competitors by manufacturing fake impressions, their ranking will go down and they might eventually disappear from Google, thus allowing you to lower your max CPC
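The arithmetic behind this dilution effect can be sketched as follows. The click and impression counts are invented for illustration, and the sketch assumes that none of the extra impressions are filtered out, whereas the text above notes that search engines try to eliminate bogus impressions before computing CTR.

```python
def observed_ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as clicks divided by impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical competitor ad: 300 clicks on 10,000 genuine impressions.
clicks = 300
genuine_impressions = 10_000
max_cpc = 1.50  # hypothetical maximum bid, in dollars

ctr_before = observed_ctr(clicks, genuine_impressions)

# Assumption: 20,000 unfiltered impressions without clicks are added.
extra_impressions = 20_000
ctr_after = observed_ctr(clicks, genuine_impressions + extra_impressions)

# R = max CPC * CTR, so R shrinks in the same proportion as the CTR.
score_before = max_cpc * ctr_before
score_after = max_cpc * ctr_after

print(f"CTR: {ctr_before:.2%} -> {ctr_after:.2%}")
print(f"R:   {score_before:.4f} -> {score_after:.4f}")
```

With the bid unchanged, tripling the impression count divides both the CTR and R by three, which is why filtering bogus impressions matters so much to the ranking.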

We found that Google is more resilient than Bing to this type of scheme. To be successful, the scheme must involve impressions generated across thousands of IP addresses, typically via some type of Botnet:

• Closed Botnet, spread via virus technology, illegal. Can easily use 100,000 infected IP addresses.
• Open Botnet run by a ring of colluding webmasters, who agree to have a "virus" run on their machines to generate bogus impressions. With 100 webmasters, each having 100 IP addresses attached to their servers (e.g. proxy servers), it is possible to generate traffic from 10,000 IP addresses. However, these IP addresses will be clustered and could trigger fraud detection rules.
• Open Botnet run by users: 10,000 users agree to download a free piece of software that is advertised to eliminate Google ads. Each user generates 1,000 manufactured impressions per day against various keywords controlled by the Botnet operator. This amounts to 300 million bogus impressions per month, enough to eliminate a large number of ads.
• In each case, the Botnet operator also has a list of white-listed domain names (ads): these belong to the paid clients of the Botnet operator, and these domains receive good CTR.
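The volume figures quoted in the list above can be checked with a few lines of arithmetic. The only assumption added here is a 30-day month.

```python
# Open Botnet run by webmasters: figures from the text.
webmasters = 100
ips_per_webmaster = 100
webmaster_ips = webmasters * ips_per_webmaster

# Open Botnet run by users: figures from the text.
users = 10_000
impressions_per_user_per_day = 1_000
days_per_month = 30  # assumption: a 30-day month

monthly_impressions = users * impressions_per_user_per_day * days_per_month

print(f"IP addresses (webmaster ring): {webmaster_ips:,}")
print(f"Bogus impressions per month:   {monthly_impressions:,}")
```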

Who would potentially offer this type of service?

• SEM, ad agencies or SEO companies to better arbitrage search engines and secure lower max CPC for target keywords for their clients.
• Google trying to kill Bing, or the other way around.
• Hackers, unemployed data scientists (money could be offered to hackers who can successfully wipe out tons of ads).
• A new type of arbitrage company that destroys the bids on Bing (say) but not on Google to increase keyword arbitraging power (buy on Bing where it's easier to run the scheme, sell on Google).

In order to succeed, these schemes must generate a random number of impressions per IP per day, with a sound keyword distribution. This is easy to implement. Earlier schemes that used a fixed, static number of clicks per IP per day were caught due to their lack of sophistication.
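To see why a fixed count per IP per day is so easy to catch, consider the following sketch of a detection rule. It is a hypothetical rule written for illustration, not one that Google or Bing is known to use. The function name `flag_static_ips`, the `min_days` threshold and the sample traffic are all assumptions.

```python
def flag_static_ips(daily_counts, min_days=7):
    """Flag IPs whose daily event count never varies.

    daily_counts maps an IP address to a list of per-day counts.
    An IP is flagged when it has at least `min_days` days of history
    and every day shows exactly the same count.
    """
    flagged = []
    for ip, counts in daily_counts.items():
        if len(counts) >= min_days and len(set(counts)) == 1:
            flagged.append(ip)
    return sorted(flagged)


# Made-up sample using documentation-reserved IP ranges.
traffic = {
    "203.0.113.5": [50, 50, 50, 50, 50, 50, 50],   # identical every day
    "203.0.113.9": [12, 47, 3, 88, 20, 61, 35],    # varies from day to day
    "198.51.100.7": [50, 50, 50],                  # too little history
}

suspicious = flag_static_ips(traffic)
print(suspicious)
```

A rule this simple only catches perfectly static traffic. This is consistent with the point made above: the cruder the scheme, the simpler the rule needed to detect it.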

The open Botnet technology is easy to implement: users or webmasters install a program on their machine that runs in the background and sends HTTP requests to a centralized machine (belonging to the Botnet operator) to download the keywords and white-listed domains to use in the scheme. The program then runs bogus impressions, with fake referrers, on these target keywords. It could very easily be written in Perl with the LWP library, then turned into an executable using perl2exe.

The same technology could also be used to fool ad targeting algorithms, e.g. by artificially increasing CTR just after midnight, or increasing CTR from Indian IP addresses. The purpose is still the same: keep the best traffic for yourself, give bad traffic to your competitors, or increase arbitrage efficiency.
