Local context based linear text segmentation

Authors: Erdem, Hayrettin

Abstract

Understanding the topical structure of text documents is important for effective retrieval and browsing, for automatic summarization, and for tasks that identify, cluster, and track documents by topic. Although documents often display structural organization and contain explicit section markers, some lack such properties, which creates the need for topical text segmentation systems. Examples of such documents are speech transcripts and inherently unstructured texts, such as newspaper columns and blog entries, that discuss several subjects in one discourse. A novel local-context based approach that relies on lexical cohesion is presented for linear text segmentation, the task of dividing text into a linear sequence of coherent segments. As the lexical cohesion indicator, the proposed technique exploits relationships among terms induced from a semantic space called HAL (Hyperspace Analogue to Language), which is built by examining the co-occurrence of terms as a fixed-size window is passed over the text. The proposed algorithm (BTS) discovers topical shifts iteratively: at each iteration it considers a block of sentences, finds the most closely related sentence pairs in that block, and forms a new segment by examining these pairs. The technique is evaluated on error-free speech transcripts of news broadcasts and on documents formed by concatenating different topical regions of text. A new corpus for Turkish is built automatically, with each document formed by concatenating different news articles. For performance comparison, two state-of-the-art methods, TextTiling and C99, are used, and the results show that the proposed approach performs comparably to both. The results are also statistically validated with ANOVA and the Tukey post-hoc test.

Keywords: Text Segmentation, Topic Segmentation, Natural Language Processing, Lexical Cohesion, Semantic Relatedness.
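The HAL construction mentioned in the abstract (a fixed-size window passed over the text, recording term co-occurrence) can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or window size used in the thesis, so the distance-based weight below (closer terms count more, as HAL is commonly described) and the names `build_hal`, `relatedness`, and `window_size` are assumptions for illustration only, not the author's implementation.

```python
from collections import defaultdict


def build_hal(tokens, window_size=5):
    """Build a HAL-style co-occurrence table from a token list.

    hal[target][context] accumulates a weight each time `context`
    appears within `window_size` positions BEFORE `target`.
    Assumed weighting: window_size - distance + 1, so an adjacent
    term gets the highest weight and the farthest term gets 1.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window_size + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[target][tokens[j]] += window_size - distance + 1
    return hal


def relatedness(hal, a, b):
    """Hypothetical symmetric score: weight of b preceding a,
    plus weight of a preceding b. Missing pairs score 0."""
    return hal.get(a, {}).get(b, 0) + hal.get(b, {}).get(a, 0)


# Example usage on a toy sentence with a window of two tokens.
tokens = "the cat sat on the mat".split()
hal = build_hal(tokens, window_size=2)
score = relatedness(hal, "cat", "sat")
```

A term-level score of this kind could then be aggregated over the words of two sentences to rank sentence pairs; how BTS actually performs that aggregation is not specified in the abstract.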

75

Country
Turkey
Keywords

Database Aesthetics, Remix, Hypernarrative, Wunderkammer, QA76.9.T48 E73 2014, Text processing (Computer science), Vaporwave, Computer Engineering and Computer Science and Control
