publication . Conference object . Part of book or chapter of book . Other literature type . 2018

Revealing Historical Events out of Web Archives

Lobbé, Quentin
Open Access English
  • Published: 10 Sep 2018
  • Publisher: HAL CCSD
  • Country: France
Abstract
As the living Web expands, worldwide volumes of Web archives constantly increase, making it difficult to identify relevant archived content. Here we propose an application for detecting historical events in a corpus of Web archives, based on an entity called the Web fragment: a semantic and syntactic subset of a given Web page. A Web fragment is distinctive in that it is indexed by its edition date rather than its archiving date. We apply our framework to an archived Moroccan forum and observe how it reacted to the Arab Spring at the end of 2010.
Subjects
free text keywords: Event detection, Online migrants collectives, Web archives, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-WB]Computer Science [cs]/Web