Powered by OpenAIRE graph

This Research product is the result of merged Research products in OpenAIRE.

The Knesset Meetings Corpus 2004-2005

Authors: Itai, Alon; Wintner, Shuly;

Abstract

The Knesset Meetings Corpus 2004-2005 is made up of two components:

  • Raw texts: 282 files totaling 867,725 lines. These can be downloaded in two formats:
      • As doc files, encoded in windows-1255:
          • kneset16.zip: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
          • kneset17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]
      • As txt files, encoded in utf8:
          • kneset.tar.gz: an archive of all the raw text files, divided into two folders [Github mirror]:
              • 16: contains 164 text files totaling 543,228 lines.
              • 17: contains 118 text files totaling 324,497 lines.
          • knesset_txt_16.tar.gz: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
          • knesset_txt_17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]
  • Tokenized and morphologically tagged texts: tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora. These can be downloaded in two ways:
      • knesset_tagged_16.tar.gz: an archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
      • By cloning this repository: the unarchived versions of these files are in the knesset_tagged folder.
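The two raw-text encodings and the stated file and line counts can be sanity-checked after download. The following is a minimal sketch, not part of the corpus tooling: the function names are hypothetical, and it assumes the doc files are plain text in windows-1255 (not binary Word documents) and that the extracted txt files carry a `.txt` extension.

```python
from pathlib import Path
from typing import Tuple


def transcode_to_utf8(src: Path, dst: Path, src_encoding: str = "windows-1255") -> None:
    """Re-encode one text file from src_encoding to UTF-8.

    Assumption: the source file is plain text, not a binary Word document.
    Bytes are read and written directly so line endings are left untouched.
    """
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode("utf-8"))


def count_files_and_lines(folder: Path, pattern: str = "*.txt") -> Tuple[int, int]:
    """Return (number of matching files, total number of lines) for a folder.

    Assumption: the files matched by `pattern` are UTF-8 encoded.
    """
    files = sorted(folder.glob(pattern))
    total_lines = 0
    for path in files:
        with path.open(encoding="utf-8") as handle:
            total_lines += sum(1 for _ in handle)
    return len(files), total_lines
```

Running `count_files_and_lines` on the extracted 16 and 17 folders should, if the assumptions hold, reproduce the figures in the abstract: 164 + 118 = 282 files and 543,228 + 324,497 = 867,725 lines. Exact line counts may differ slightly depending on how trailing newlines are handled.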

The Open Natural Language Processing in Hebrew (NLPH) initiative is a joint effort by members of DataHack and The Public Knowledge Workshop to promote open tools and resources for Natural Language Processing in Hebrew. As part of the NLPH project, this community collects resources for NLP in Hebrew, including corpora, lexicons, dictionaries, treebanks, embeddings, code, services, applications, papers, course materials and presentations. A full list of these resources is maintained at https://github.com/NLPH/NLPH_Resources. If you have a resource to contribute under an open license, please submit a pull request or contact us at contact@nlph.org.il.

Keywords

Tokenization, Knesset, Hebrew, NLPH, morphologically tagged text, Transcripts, NLP
