
Social media usage keeps growing. The data it generates, which was never available before, can be analyzed to potentially extract insights about users, such as their beliefs, values, emotions, opinions and online behavior.

How could such insights be used by marketers, social researchers or even psychologists? What are the pitfalls?

For some examples of Social Media Analytics see here :


Replies to This Discussion

Thomas: I think that you have outlined what the potential gains are. The pitfalls would be over-analyzing the data and assuming that the beliefs and online behavior are reflective of the participant.

-Ralph Winters

From my experience so far there are many pitfalls: sampling issues and spam accounts are some of them. However, clear-cut beliefs and online behavior can be easily extracted because of the immense number of users that you can capture. If I told you that the thing people on Twitter most frequently say they don't want to do is "go to work (or school, or university) tomorrow", wouldn't that strike you as reflective of what people in general think at some point in their lives?

IMHO, specific beliefs and behavior can be captured that are reflective of a significant number of participants, but of course not of each and every person. Those insights enable us to form some hypotheses but, of course, not to draw any conclusions about cause and effect.


Themos: I did enjoy looking at your blog, and I think you are doing some good work. Some more observations. There would be two sampling problems: getting a representative sample from Twitter itself, which I know is difficult, and sampling from the non-Twitter population. I would be interested to know first and foremost who the Twitter population is before concluding anything about them. Be careful not to extrapolate results derived from Twitter onto specific populations. Deriving sentiment is tough enough when interpreting answers to a multiple-choice questionnaire; it is much harder when mining text, since you have to infer the sentiment from the text itself.

I agree that the purpose would be to associate and hypothesize, and not necessarily to draw any conclusions. However, I still do not think there is enough (good) data yet. I look at healthcare claims data, and it will often take 5 years or more of historical data to be able to say something about a normal probability occurrence.

-Ralph Winters

Getting data only from Twitter will leave you with a biased sample. This is undeniable. Let's see an example: you get data on 100K Twitter users stating in their bio that they follow politician X and another sample of 100K Twitter users stating in their bio that they follow politician Y.

By sampling these 200K Twitter users and then running chi-square tests on word co-occurrences, you could form some hypotheses about the differences in attitudes or beliefs between followers of politician X and followers of politician Y (keeping in mind that we are analyzing a specific population segment, namely "Twitter users"). To clarify: such an analysis could only help us form some hypotheses about the population, but only that.
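
To make the chi-square idea above concrete, here is a minimal sketch, not the method actually used in the analysis discussed here. It assumes each user is represented by one text string (for example a bio or a concatenation of tweets) and tests, word by word, whether the share of users mentioning that word differs between the two groups. The function names `chi_square_2x2` and `word_chi_squares` and the whitespace tokenization are illustrative assumptions.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the statistic is undefined; report 0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def word_chi_squares(docs_x, docs_y):
    """Score every word by how strongly its presence is associated
    with group X versus group Y (hypothetical helper)."""
    # Count each word at most once per user (document frequency).
    df_x = Counter(w for doc in docs_x for w in set(doc.lower().split()))
    df_y = Counter(w for doc in docs_y for w in set(doc.lower().split()))
    n_x, n_y = len(docs_x), len(docs_y)

    scores = {}
    for word in set(df_x) | set(df_y):
        a = df_x[word]   # group X users who mention the word
        b = n_x - a      # group X users who do not
        c = df_y[word]   # group Y users who mention the word
        d = n_y - c      # group Y users who do not
        scores[word] = chi_square_2x2(a, b, c, d)
    return scores


if __name__ == "__main__":
    followers_x = ["taxes jobs", "taxes"]
    followers_y = ["jobs schools", "schools"]
    ranked = sorted(word_chi_squares(followers_x, followers_y).items(),
                    key=lambda kv: kv[1], reverse=True)
    for word, score in ranked:
        print(word, round(score, 2))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05 for a single test. Since thousands of words would be tested at once, some correction for multiple comparisons would be needed before treating any word as a lead worth a hypothesis.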

One other potential pitfall is the bias of each "sentiment extraction" software toward capturing specific text patterns better than others. For example, my software (or methodology) could be more accurate at extracting the sentiment of certain kinds of user thoughts, while some other software could be more accurate on other segments of the search space. This bias can only be found if we compare the results of each "sentiment extraction engine" against the ultimate engine: human interpretation.
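
One way to run the comparison against human interpretation described above is sketched below, under the assumption that a sample of texts has been labeled both by a human and by the engine. Per-label recall shows which kinds of text an engine handles better, and Cohen's kappa summarizes agreement beyond chance. The function names and the "pos"/"neg" labels are illustrative, not taken from any particular tool.

```python
from collections import Counter


def per_label_recall(human, engine):
    """For each human-assigned label, the fraction of those items
    on which the engine produced the same label."""
    totals = Counter(human)
    hits = Counter(h for h, e in zip(human, engine) if h == e)
    return {label: hits[label] / totals[label] for label in totals}


def cohen_kappa(human, engine):
    """Cohen's kappa: agreement between two label sequences,
    corrected for the agreement expected by chance."""
    n = len(human)
    if n == 0 or n != len(engine):
        raise ValueError("label sequences must be non-empty and equal in length")
    observed = sum(h == e for h, e in zip(human, engine)) / n
    h_counts, e_counts = Counter(human), Counter(engine)
    expected = sum(h_counts[l] * e_counts[l] for l in h_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    human_labels = ["pos", "pos", "neg", "neg"]
    engine_labels = ["pos", "pos", "neg", "pos"]
    print(per_label_recall(human_labels, engine_labels))
    print(cohen_kappa(human_labels, engine_labels))
```

Running the same check for two engines on the same human-labeled sample would show whether one is stronger on, say, negative texts while the other is stronger on positive ones.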

Other types of analysis that compare groups within Twitter could prove very useful. One potentially interesting application that I am currently looking at is the prediction of viral tweets, in other words predicting which tweets have an increased probability of being liked, and thus spread, among Twitter users. So far the results are very promising, but a great deal of testing and pre-processing is required.

