
Here's my answer. Feel free to add yours:

  1. Identification of spam, low-value content, and users operating multiple accounts. Users need to be segmented and scored.
  2. Statistical bias: people who post on social media are not representative of those who are inactive on it, a self-selection problem.
  3. Need to stem and normalize text, automatically correct typos, identify and categorize text atoms and the relationships between them, and handle foreign languages.
  4. Need to blend gathered data with internal data. This requires fuzzy merging (perhaps not at a very granular level) between internal corporate data and data obtained from social networks.
  5. Potential privacy or liability issues, e.g. if gathered data is used to target people individually through marketing campaigns or fraud investigations, or to penalize users (e.g. refusing a job to a candidate based on mining of their social-network posts).
  6. Getting actionable, ROI-generating insights from the analyses. For fraud detection or improved user targeting, the lift should be easy to measure.
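The text cleanup in point 3 can be sketched with nothing but the standard library. This is a toy illustration, not a production pipeline: the vocabulary and the suffix-stripping rules below are made-up assumptions (a real system would use a proper stemmer and an edit-distance spell checker):

```python
import re
import difflib

# Illustrative vocabulary of "known good" terms (an assumption for this sketch).
VOCABULARY = {"network", "analysis", "social", "media", "mining"}

def normalize(token: str) -> str:
    """Lowercase, strip punctuation, and crudely strip common suffixes."""
    token = re.sub(r"[^a-z0-9]", "", token.lower())
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def correct_typo(token: str) -> str:
    """Snap a token to the closest vocabulary word, if it is close enough."""
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
    return matches[0] if matches else token

tokens = [correct_typo(normalize(t)) for t in "Minning socialy Networks!".split()]
print(tokens)  # -> ['mining', 'social', 'network']
```

The same two steps (normalize, then snap to a vocabulary) generalize to categorizing "text atoms" once the vocabulary is replaced by a real taxonomy.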
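The fuzzy merging in point 4 could look roughly like the following, matching internal records to social profiles at a coarse (display-name) granularity. The record layouts, the example names, and the 0.7 similarity threshold are illustrative assumptions only:

```python
import difflib

# Toy internal CRM records and scraped social profiles (made-up data).
internal = [{"id": 1, "name": "jonathan smith"},
            {"id": 2, "name": "maria garcia"}]
social = [{"handle": "@jon_smith", "display_name": "Jon Smith"},
          {"handle": "@mgarcia", "display_name": "Maria Garcia"}]

def fuzzy_merge(internal, social, threshold=0.7):
    """Pair each internal record with its most similar display name."""
    merged = []
    for rec in internal:
        best, best_score = None, 0.0
        for prof in social:
            score = difflib.SequenceMatcher(
                None, rec["name"], prof["display_name"].lower()).ratio()
            if score > best_score:
                best, best_score = prof, score
        if best is not None and best_score >= threshold:
            merged.append((rec["id"], best["handle"], round(best_score, 2)))
    return merged

print(fuzzy_merge(internal, social))
```

On real data, a dedicated record-linkage tool with blocking keys would replace the quadratic loop, but the threshold-and-best-match structure stays the same.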
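For point 6, the lift of a targeting campaign is just the ratio of response rates between the targeted group and a control group. A back-of-envelope sketch, with made-up counts:

```python
def lift(targeted_responses, targeted_size, control_responses, control_size):
    """Ratio of the targeted group's response rate to the control group's."""
    targeted_rate = targeted_responses / targeted_size
    control_rate = control_responses / control_size
    return targeted_rate / control_rate

# e.g. 300 conversions out of 10,000 targeted users against a 1% control
# baseline: a roughly threefold lift.
print(lift(300, 10_000, 100, 10_000))
```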


Replies to This Discussion

6. Sparsity of data for time- and location-based social media.


Add two more:

Difficulty in developing general taxonomies: there is too much diversity in subject matter, and it is constantly changing.

Challenges for natural language processing (NLP): it is difficult to categorize content based on short responses, or content that is not from the original source.

-Ralph Winters



© 2021 TechTarget, Inc.
