Visualising text co-occurrence networks

Article English OPEN
Hirsch, Laurie ; Andrews, Simon (2016)
  • Publisher: Tilburg University

We present a tool for automatically generating a visual summary of unstructured text data retrieved from documents, websites or social media feeds. Unlike tools such as word clouds, it can visualise the structures and topic relationships that occur in a document. These relationships are determined by a unique approach to co-occurrence analysis. The algorithm applies a decaying function to the distance between word pairs found in the original text, so that words regularly occurring close to each other score highly, while words occurring some distance apart still make a small contribution to the overall co-occurrence score. This contrasts with other algorithms, which simply count adjacent words or use a sliding window of fixed size. We show, with examples, how the generated network can be presented in tree or graph format. The tree format allows the user to interact with the visualisation and expand or contract the data to a preferred level of detail. The tool is available as a web application and can be viewed using any modern web browser.
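The distance-decay scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not state which decaying function the tool uses, so the geometric decay below (weight `decay ** (distance - 1)`) is an assumption, and the names `cooccurrence_scores`, `decay` and `strongest_edges` are hypothetical, chosen only for illustration. Every pair of distinct words contributes to its score, with adjacent words contributing 1.0 and more distant pairs contributing progressively less.

```python
from collections import defaultdict


def cooccurrence_scores(tokens, decay=0.5):
    """Sum a distance-decayed weight over every pair of distinct words.

    Assumption: the weight is decay ** (distance - 1), so adjacent words
    contribute 1.0 and each extra step apart multiplies the weight by
    `decay`. The tool's actual decaying function may differ.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    scores = defaultdict(float)
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if tokens[i] == tokens[j]:
                continue  # a word does not co-occur with itself
            # Sort the pair so (a, b) and (b, a) share one undirected edge.
            pair = tuple(sorted((tokens[i], tokens[j])))
            scores[pair] += decay ** (j - i - 1)
    return dict(scores)


def strongest_edges(scores, limit=10):
    """Return the highest-scoring pairs, usable as edges of the network."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    text = "network text graph network tree text graph"
    for (left, right), score in strongest_edges(cooccurrence_scores(text.split()), 5):
        print(f"{left} - {right}: {score:.3f}")
```

Unlike a fixed-size sliding window, this sketch has no cut-off: a pair of words at any distance still adds a small amount to its score. The nested loop is quadratic in the number of tokens, so a practical version would probably stop once the weight falls below a chosen threshold.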
