Online advertising: a solution to optimize ad relevancy

When you see Google ads on Google search result pages or elsewhere, the ads displayed in front of your eyes have (or should have) been carefully selected to maximize the chance that you convert and generate ad revenue for Google. The same is true on Facebook, Yahoo, Bing, LinkedIn, and all other ad networks.

If you think that you see irrelevant ads, either they are priced very cheaply, or Google's ad relevancy algorithm is not working well.

Ad scoring algorithms used to be very simple: the score was a function of the max bid paid by the advertiser and the click-through rate (CTR). This led to abuses: an advertiser could generate bogus impressions to dilute a competitor's CTR, or bogus clicks on its own ads to boost its own CTR, or a combination of both, typically using proxies or botnets to hide the scheme, thus gaining an unfair competitive advantage on Google.

Recently, in addition to CTR and max bid, ad networks have added ad relevancy to their ad scoring mix (that is, to the algorithm used to determine which ads you will see, and in which order). In short, ad networks don't want to display ads that will frustrate the user: it's all about improving user experience and reducing churn to boost long-term profits.
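
To make the scoring mix concrete, here is a minimal sketch. The article does not say how ad networks actually combine the three inputs, so the multiplicative form, the `Ad` class, and all the numbers below are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    max_bid: float    # maximum bid per click, in dollars
    ctr: float        # click-through rate, between 0 and 1
    relevancy: float  # hypothetical relevancy score, between 0 and 1

def score(ad):
    # Assumed multiplicative form; the real function is not given in the article.
    return ad.max_bid * ad.ctr * ad.relevancy

ads = [
    Ad("high-bid-irrelevant", max_bid=2.00, ctr=0.05, relevancy=0.1),
    Ad("relevant", max_bid=1.00, ctr=0.04, relevancy=0.9),
]

# Highest score first: this is the order in which the ads would be shown.
ranked = sorted(ads, key=score, reverse=True)
for ad in ranked:
    print(ad.name, round(score(ad), 4))
```

With bid and CTR alone, the first ad would win; once relevancy enters the product, the relevant ad ranks first despite its lower bid.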

How does ad relevancy scoring work?

Here's our solution. There are three components in the mix: 

  • The user visiting a web page hosting the ad in question
  • The web page where the ad is hosted
  • The ad itself, that is, its text or visual content, or other attributes such as size
  • The fourth important component, the landing page, is not considered in this short discussion (good publishers scrape advertiser landing pages to check the match between a text ad and its landing page, and to eliminate bad adware, but that's a subject for another article)

The solution is as follows.

First create three taxonomies:

  • Taxonomy A to categorize returning users based on their interests, member profile, or web searching history
  • Taxonomy B to categorize web pages that are hosting ads, based on content or publisher-provided keyword tags
  • Taxonomy C to categorize ads, based on text ad content or advertiser-provided keywords that describe the ad (such as the bid keyword in PPC campaigns, or the ad title)

The two important taxonomies are B and C, unless the ad is displayed on a very generic web page, in which case A is more important than B. So let's ignore Taxonomy A for now. The goal is to match a category from Taxonomy B with one from Taxonomy C. The two taxonomies might or might not have the same categories, so in general it will be a fuzzy match: for instance, the page hosting the ad is attached to categories Finance / Stock Market in Taxonomy B, while the ad is attached to categories Investing / Finance in Taxonomy C. So you need a system in place to measure distances between categories belonging to two different taxonomies.
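
One possible way to measure such a distance, sketched below, is to describe each category by a set of keywords and compare the sets with the Jaccard similarity. This is an assumption, not the method prescribed here: the keyword sets, the `jaccard` and `best_match` helpers, and the category contents are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections (1.0 = identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical keyword sets attached to categories in each taxonomy.
taxonomy_b = {
    "Finance / Stock Market": {"stocks", "shares", "broker", "dividend", "finance"},
    "Travel": {"flights", "hotels", "cruise"},
}
taxonomy_c = {
    "Investing / Finance": {"stocks", "broker", "portfolio", "finance"},
    "Insurance": {"premium", "policy", "claim"},
}

def best_match(page_category):
    """Return (similarity, ad category) for the closest category in Taxonomy C."""
    page_keywords = taxonomy_b[page_category]
    return max((jaccard(page_keywords, kw), name) for name, kw in taxonomy_c.items())

similarity, ad_category = best_match("Finance / Stock Market")
print(ad_category, similarity)
```

A distance can then be taken as 1 minus the similarity, and an ad is considered relevant to a page when that distance falls below a threshold you tune on your own data.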

How do I build a taxonomy?

There are a lot of vendors and open source solutions available on the market, but if you really want to build your own taxonomies from scratch, here's one way to do it:

  • Scrape the web (the DMOZ directory, with millions of pre-categorized web pages that you can download freely, is a good starting point), extract pieces of text that are found on the same web page, and create a distance to measure proximity between pieces of text
  • Clean and stem your keyword data
  • Leverage your search query log data: two queries found in the same session are closer to each other (with respect to the above distance) than arbitrary queries
  • Create a table with all pairs of "pieces of text" that you have extracted and that are associated (e.g. found on the same web page or in the same user session). A table with 100 million such pairs should be enough.
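
The last step above can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of groups, each holding the pieces of text found together on one page or in one session; `normalize` and `build_pair_table` are hypothetical helper names; and cleaning is reduced to lowercasing and whitespace collapsing, with real stemming left out.

```python
from itertools import combinations

def normalize(text):
    """Lowercase and collapse whitespace; real stemming is omitted in this sketch."""
    return " ".join(text.lower().split())

def build_pair_table(groups):
    """Return one (X, Y) row per pair of distinct pieces of text found together."""
    table = []
    for group in groups:
        # Deduplicate within a group and sort so each pair has one canonical order.
        unique = sorted({normalize(t) for t in group if t.strip()})
        table.extend(combinations(unique, 2))
    return table

# Toy data: two web pages and one user session.
pages = [["Stock Market", "Investing", "finance"], ["investing ", "Finance"]]
sessions = [["car insurance", "home  insurance"]]

table = build_pair_table(pages + sessions)
for row in table:
    print(row)
```

At the scale mentioned above (around 100 million pairs) an in-memory list is no longer practical; the same logic would typically run as a batch job writing to a database or a distributed store.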

Let's say that (X, Y) is such a pair. Compute n1 = number of occurrences of X in your table, n2 = number of occurrences of Y in your table, and n12 = number of occurrences where X and Y are associated (e.g. found on the same web page). A metric that tells you how close X and Y are to each other is R = n12 / SQRT(n1 * n2). R is a similarity: larger values mean closer keywords, and 1 - R can serve as the corresponding dissimilarity. With this metric (used e.g. at http://www.frenchlane.com/kw8.html) you can cluster keywords via hierarchical clustering and eventually build a taxonomy, which is nothing more than an unsupervised clustering of the keyword universe, with labels manually assigned to (say) the top 20 clusters, each representing a category.
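
Here is a small pure-Python sketch of the computation and of one way to cluster with it. The toy table, the choice of average linkage, and the 0.8 stopping threshold are assumptions made for illustration; in practice a library routine for hierarchical clustering would replace the naive loop.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy table: one row per associated pair (X, Y).
table = [
    ("finance", "investing"), ("finance", "investing"),
    ("finance", "stock market"), ("investing", "stock market"),
    ("car insurance", "home insurance"), ("car insurance", "home insurance"),
    ("car insurance", "finance"),
]

occurrences = Counter()  # n1, n2: rows in which each keyword appears
together = Counter()     # n12: rows in which both keywords appear
for x, y in table:
    occurrences[x] += 1
    occurrences[y] += 1
    together[frozenset((x, y))] += 1

def r(x, y):
    """R = n12 / SQRT(n1 * n2); 0.0 when X and Y are never associated."""
    n12 = together[frozenset((x, y))]
    return n12 / sqrt(occurrences[x] * occurrences[y]) if n12 else 0.0

def linkage(a, b):
    """Average of the dissimilarity 1 - R over all cross-cluster pairs."""
    return sum(1 - r(x, y) for x in a for y in b) / (len(a) * len(b))

def cluster(keywords, max_distance=0.8):
    """Agglomerative clustering: merge the closest clusters until too far apart."""
    clusters = [{k} for k in keywords]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters

clusters = cluster(sorted(occurrences))
for c in clusters:
    print(sorted(c))
```

On this toy table the insurance keywords end up in one cluster and the finance keywords in another; labeling each cluster by hand (for example "Insurance" and "Investing") gives the categories of the taxonomy.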

Comment by Sean Flanigan on June 17, 2012 at 2:52pm

Interesting approach when user profile information is not factored in.

Comment by Capri on October 11, 2011 at 6:57pm

This is a good solution for companies such as LinkedIn or Facebook, where traffic is not driven by search. For search engines, you can match a user query to a bid keyword, using an algorithm that extracts potential bid keywords from user queries (these algorithms perform very poorly, in my opinion).

Companies such as Facebook actually found the following solution to improve ad relevancy: just allow the user to flag an ad as useful or annoying or irrelevant. By blending this user feedback with some mathematical ad relevancy system, Facebook could achieve better results, in terms of ROI / advertiser churn.

© 2017 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC