Publication · Article · 2004

SeLeCT: A lexical cohesion based news story segmentation system

Stokes, N.; Carthy, J.; Smeaton, A.
Open Access
  • Published: 01 Jan 2004
  • Country: Ireland
In this paper we compare the performance of three distinct approaches to lexical cohesion-based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our approach to news story segmentation (the SeLeCT system) is based on an analysis of lexical cohesive strength between textual units using a linguistic technique called lexical chaining. We evaluate the relative performance of SeLeCT with respect to two other cohesion-based segmenters: TextT...
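The abstract describes segmentation driven by the lexical cohesive strength between textual units, measured with lexical chains. The following is a minimal toy sketch of that general idea, not the SeLeCT implementation. It assumes chains formed only by exact word repetition (lexical chaining in the Morris and Hirst tradition can also use thesaural relations such as synonymy), and the function names, the `max_gap` chain-breaking rule, the end-count-times-start-count boundary score and the `min_score` threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger
# lists and usually part-of-speech filtering).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was",
             "for", "at", "by", "with", "it", "its", "this", "that", "as"}


def content_words(unit: str) -> set[str]:
    """Lowercase alphabetic tokens of one textual unit, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", unit.lower()) if w not in STOPWORDS}


def build_chains(units: list[str], max_gap: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) unit indices of repetition-only chains.

    A chain links successive occurrences of the same word; it is broken when
    the word is absent for more than `max_gap` consecutive units.
    (Hypothetical simplification: no synonym or other thesaural links.)
    """
    occurrences: dict[str, list[int]] = defaultdict(list)
    for i, unit in enumerate(units):
        for word in content_words(unit):
            occurrences[word].append(i)  # indices are appended in ascending order

    chains = []
    for positions in occurrences.values():
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev - 1 > max_gap:  # too many units without the word
                chains.append((start, prev))
                start = pos
            prev = pos
        chains.append((start, prev))
    # A word seen in only one unit carries no cohesion across units.
    return [(s, e) for s, e in chains if e > s]


def boundary_scores(units: list[str], max_gap: int = 2) -> list[int]:
    """Score the gap after unit i as (#chains ending at i) * (#chains starting at i+1)."""
    ends = [0] * len(units)
    starts = [0] * len(units)
    for s, e in build_chains(units, max_gap):
        starts[s] += 1
        ends[e] += 1
    return [ends[i] * starts[i + 1] for i in range(len(units) - 1)]


def segment(units: list[str], max_gap: int = 2, min_score: int = 1) -> list[int]:
    """Hypothetical picker: return indices of units that begin a new segment."""
    scores = boundary_scores(units, max_gap)
    return [i + 1 for i, score in enumerate(scores) if score >= min_score]


if __name__ == "__main__":
    units = [
        "The storm hit the coast overnight.",
        "Storm damage closed roads along the coast.",
        "Coast guard crews searched after the storm.",
        "The central bank raised interest rates.",
        "Interest rates rose for bank borrowers.",
        "Analysts expect the bank to hold rates next month.",
    ]
    print(boundary_scores(units))  # [0, 0, 6, 0, 0]
    print(segment(units))          # [3]
```

In the example, the "storm" and "coast" chains end in unit 2 and the "bank", "rates" and "interest" chains start in unit 3, so the only nonzero score falls on that gap and the sketch places a story boundary before unit 3. Multiplying end and start counts, rather than adding them, favours gaps where cohesion both stops and restarts; with a sum, a gap where many chains merely end would also cross the threshold.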
ACM Computing Classification System: Information Systems: Information Storage and Retrieval
free text keywords: Artificial intelligence, Digital video, Algorithms