Subscribe to DSC Newsletter

How to Transform your Google Spreadsheet Into an Opinion Mining Tool

This blog was originally featured on blog.aylien.com, a Text Analysis blog with tutorials, Data Visualisations and industry discussions.

Introduction

Our founder, Parsa Ghaffari, gave a talk recently on Natural Language Processing and Sentiment Analysis at the Science Gallery in Dublin. As part of the talk, he put together a nice little example of how you can transform your Google Spreadsheet into a powerful Text Analysis and Data Mining tool.

In this case, he took a simple example of analyzing restaurant reviews from a popular review site but the same could be done for hotels, products, service offerings and so on.

He wanted to show how easy it can be for data geeks and even the less technical marketers among us, to start analyzing text and gathering business insight from the reams of textual data online today.

So what are we going to do?

All within a Google Spreadsheet we’re going to show you how to:

  • Extract reviews for restaurants from TripAdvisor
  • Analyze the sentiment of these reviews
  • Extract Keywords (positive and negative)
  • Correlate the Sentiment with the onsite TripAdvisor Ratings 

Step 1. Extract the Reviews:

Create a new Google Sheet and add the URL’s of the web pages that house the reviews you want to analyze to it.

Open a new sheet within the same document. This is where we’ll extract the reviews from the web page and get them ready for analysis.

There is a really simple formula you can use to extract the reviews automatically:

=IMPORTXML(cell_containing_tripadvisor_link, "//div[@class='below_the_fold']//p[@class='partial_entry']/text()")

The =IMPORTXML() formula takes a link and an XPath query and extracts pieces of information from the link. Can think of the XPath query as an address to where the content lies.

Using this formula you can add all of the reviews extracted from the web page to your target column in your analysis sheet.

We’ve also used the same function as above, to extract the TripAdvisor Ratings from the Web pages. These are star ratings out of 5 that are entered by the reviewer.


=
IMPORTXML(cell_containing_tripadvisor_link, "//div[@id='REVIEWS']//img[contains(@class,'rating_s_fill')]/@alt")

Step 2. Analyze the Sentiment and extract keywords:

To start analyzing whether the reviews are positive or negative, you need to first start up yourText Analysis addon. If it’s your first time using it, check out our tutorial here.

Once you have the addon started, you can start analyzing the Sentiment of the reviews in the first column. To do this we’re going to use an AYLIEN Text Analysis specific function:

=SentimentPolarity(cell_containing_review)

This will analyze the text in the target cell you choose and display the results, whether it’s positive, negative or neutral. To run the same analysis on all the reviews, just simply drag the formula down the column as far as needs be.

With the polarity results, using a pivot table, we created a basic visualization, shown below, which gives a high-level insight into what percentage of customer reviews are positive or negative.

Step 3. Keywords:

We use another function from the Text Analysis addon similar to the Sentiment one to extract keywords from the text of the reviews.

=Keywords(cell_containing_review)

This will extract and display the main keywords from the text of each review. Like the last function drag it down through the column to run the analysis on all the reviews.

The second thing we’re going to do with keywords, which is optional, is to attempt to understand what terms or words, are used most often in positive or negative reviews. This should help us on our way to understanding what the good and bad aspects are for each review. Is it the food, the service or something else, for example.

In order to extract keywords with the highest association to each sentiment (positive or negative), we have concatenated all the extracted keywords for each graph, separated individual keywords using a mix of =JOIN() and SPLIT() functions.

To display the results we then created a pivot table to find out how frequently a particular keyword has appeared in positive or negative reviews. 

Step 4. Correlation:

We have given numerical values to both the sentiment score (Negative -1, Positive 1) and the star ratings (1-5) to make it possible to see if there is any correlation between the two.

To do this, create a new sheet called Correlation and pull in the values from columns D and F and lay them out as shown below.

To run a correlation analysis, choose the cells containing all of your data and in a new cell use the Correlation function below to get the Correlation Coefficient:

=CORREL(A2:A20, B2:B20)

A positive correlation coefficient shows that the two series are moderately correlated, which is also evident when the two series are plotted.

That’s just a couple of examples of how you can transform you Google Spreadsheet into a powerful Data Mining and Text Analysis tool, to extract insights that could be invaluable to a business owner or a brand.

You can sign up to use the AYLIEN Text API at http://developer.aylien.com/signup ;

Views: 5574

Comment

You need to be a member of AnalyticBridge to add comments!

Join AnalyticBridge

Follow Us

On Data Science Central

On DataViz

On Hadoop

© 2017   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC   Powered by

Badges  |  Report an Issue  |  Terms of Service