Extract meta concepts through co-occurrences analysis and graph theory

....
So what I did is the following (be aware that this is not the formal implementation of LSA!):
  1. Filtered the words and reduced them to their base form, as usual.
  2. Built the multidimensional sparse matrix of the co-occurrences.
  3. For each instance, calculated how frequently it occurs in the corpus.
  4. For each instance, calculated how frequently it occurs in the document.
  5. Weighted this TF-IDF score, also taking into account the distance between the co-occurring words.
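
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code used for the post: an "instance" is taken to be an unordered word pair found within a sliding window, the distance weight is 1/distance, the IDF is a smoothed logarithm, and the names rank_cooccurrences, keep_top and window are hypothetical.

```python
import math
from collections import defaultdict

def rank_cooccurrences(docs, window=3):
    """Score word pairs with a distance-weighted, TF-IDF-like measure.

    docs: list of token lists, already filtered and reduced to base form.
    Returns a dict {(word_a, word_b): score}, summed over all documents.
    """
    n_docs = len(docs)
    doc_freq = defaultdict(int)  # number of documents containing each pair
    per_doc = []                 # distance-weighted pair counts per document

    for tokens in docs:
        weights = defaultdict(float)
        for i, w1 in enumerate(tokens):
            # look at the next `window` tokens only
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                w2 = tokens[j]
                if w1 == w2:
                    continue
                pair = tuple(sorted((w1, w2)))
                # assumption: closer words contribute more (1 / distance)
                weights[pair] += 1.0 / (j - i)
        per_doc.append(weights)
        for pair in weights:
            doc_freq[pair] += 1

    scores = defaultdict(float)
    for weights in per_doc:
        total = sum(weights.values()) or 1.0
        for pair, w in weights.items():
            tf = w / total  # frequency in the document
            # assumption: smoothed inverse frequency in the corpus
            idf = math.log((1 + n_docs) / (1 + doc_freq[pair])) + 1.0
            scores[pair] += tf * idf
    return dict(scores)

def keep_top(scores, threshold):
    """Discard pairs whose score falls below the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}
```

A sparse dictionary keyed by word pair stands in here for the sparse co-occurrence matrix; a real corpus would probably call for a proper sparse matrix type.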

In this way we can rank all co-occurrences and set a threshold to discard the low-ranked ones.
In the last step I built a graph linking the co-occurrences.
As you can see in the following examples, the graphs are initially pretty complex; to refine the results, I applied a filter based on the number of connected components in the graph.
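
A filter of this kind might look like the sketch below. The exact criterion is not spelled out here, so this is just one plausible reading: find the connected components of the co-occurrence graph and drop those with fewer than min_size nodes. The function names and the min_size parameter are hypothetical.

```python
from collections import defaultdict

def connected_components(edges):
    """Return the connected components of an undirected graph as sets of nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # iterative depth-first search from an unvisited node
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(comp)
    return components

def filter_small_components(edges, min_size=3):
    """Keep only edges that belong to components with at least min_size nodes.

    Assumption: small, isolated components are treated as noise.
    """
    keep = set()
    for comp in connected_components(edges):
        if len(comp) >= min_size:
            keep |= comp
    return [(a, b) for a, b in edges if a in keep]
```

For example, with the edges ("a", "b"), ("b", "c") and ("x", "y") and the default min_size of 3, only the two edges of the a-b-c component survive.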
To read the entire post, visit my blog at:
Results before filtering:
Results after filtering:
