Character-based Neural Embeddings for Tweet Clustering

Conference object, Preprint OPEN
Vakulenko, Svitlana; Nixon, Lyndon; Lupu, Mihai
  • Related identifiers: doi: 10.5281/zenodo.582565
  • Subject: Computer Science - Computation and Language | Story Detection, Tweet Clustering, Tweet2vec, Vector Space Model, Character-based Embedding | Computer Science - Information Retrieval

In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations related to the vocabulary explosion in the word-based models and allows for the seamless processi...