I would like to share some early results from research I'm doing in the field of "graph entropy" applied to text mining problems.

Why is graph entropy so important?
Based on the general concept of entropy, the following assumptions hold:

  • The entropy of a graph should be a functional of the stability of the structure (so that it depicts in some way the distribution of the edges of the graph).
  • Subsets of vertices that are fairly isolated from the rest of the graph are characterized by high stability (low entropy).
  • It's quite easy to use entropy as a measure for graph clustering.
As you can imagine, a smart definition of graph entropy can be helpful in many problems related to text mining.
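To make the properties listed above concrete, here is a minimal sketch of one possible entropy measure for a subset of vertices. It is an assumption chosen for illustration, not necessarily the definition used in this experiment: each vertex contributes the binary entropy of how its neighbors split between "inside the subset" and "outside the subset", and the graph entropy is the sum over all vertices. The function names (build_adjacency, vertex_entropy, graph_entropy) and the toy graph are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def vertex_entropy(adj, v, cluster):
    """Binary entropy of v's neighbors split inside/outside the cluster."""
    neighbors = adj[v]
    if not neighbors:
        return 0.0
    p_in = len(neighbors & cluster) / len(neighbors)
    p_out = 1.0 - p_in
    # Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)


def graph_entropy(adj, cluster):
    """Sum of the vertex entropies over every vertex of the graph."""
    cluster = set(cluster)
    return sum(vertex_entropy(adj, v, cluster) for v in adj)


# Toy graph (assumed for illustration): two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = build_adjacency(edges)

tight = graph_entropy(adj, {"a", "b", "c"})  # one whole triangle
mixed = graph_entropy(adj, {"a", "d"})       # one vertex from each triangle

print(f"tight subset: {tight:.3f}")
print(f"mixed subset: {mixed:.3f}")
```

Under this assumed definition, a subset with no edges leaving it has entropy exactly zero, and the nearly isolated triangle scores lower than the subset that cuts across both triangles. That is the behavior that makes such a measure usable as a clustering objective: grow or shrink a candidate subset as long as the entropy decreases.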
Let's see an application of graph entropy to extract relevant words in a document.
The experiment has been done using the first section of the definition of "nuclear weapons".
Graph Entropy:
  • The method based on graph entropy seems to provide the most accurate results (5 errors versus 9 and 11 for the other methods).
  • The graph entropy depicts better the core of the graph containing the relevant words.
  • When I expanded the number of relevant features, the accuracy of the other two methods tended to worsen quickly.

