Research data. Dataset. 2019

The Knesset Meetings Corpus 2004-2005

Itai, Alon; Wintner, Shuly
  • Access: Open Access
  • Language: Hebrew
  • Published: 10 May 2019
  • Publisher: Zenodo
Abstract
The Knesset Meetings Corpus 2004-2005 is made up of two components:
  • Raw texts: 282 files comprising 867,725 lines in total. These can be downloaded in two formats:
    • As doc files, encoded in windows-1255:
      • kneset16.zip: 164 text files comprising 543,228 lines in total. [MILA host] [GitHub mirror]
      • kneset17.zip: 118 text files comprising 324,497 lines in total. [MILA host] [GitHub mirror]
    • As txt files, encoded in utf8:
      • kneset.tar.gz: an archive of all the raw text files, divided into two folders: [GitHub mirror]
        • 16 - Co...
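Since the zipped raw texts use windows-1255 while the tarball uses UTF-8, a minimal Python sketch of converting a zip archive to UTF-8 may help. It assumes that each member of kneset16.zip / kneset17.zip is plain text despite the doc extension (the abstract calls them "text files"); this is an assumption, not a documented file spec. The helper names iter_zip_texts and convert_zip_to_utf8 and the output layout are hypothetical.

```python
import zipfile
from pathlib import Path


def iter_zip_texts(zip_path, encoding="windows-1255"):
    """Yield (member_name, decoded_text) for each file in a corpus zip.

    Hypothetical helper: assumes every member is plain text in `encoding`.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip folder entries
            raw = zf.read(info)
            yield info.filename, raw.decode(encoding)


def convert_zip_to_utf8(zip_path, out_dir):
    """Write each member as UTF-8 text under out_dir; return the total line count.

    Hypothetical helper: keeps the member's relative path but swaps its
    extension for .txt (an illustrative layout, not the corpus's own).
    """
    out = Path(out_dir)
    total_lines = 0
    for name, text in iter_zip_texts(zip_path):
        target = (out / name).with_suffix(".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        total_lines += len(text.splitlines())
    return total_lines
```

For example, `convert_zip_to_utf8("kneset16.zip", "kneset16_utf8")` would write the decoded files and return a line count. That count may not exactly match the figure in the abstract, because the counting convention behind those totals is not stated.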
Subjects
Free-text keywords: NLP, Hebrew, Knesset, Transcripts, Tokenization, morphologically tagged text, NLPH
Download from (3 versions available):
Zenodo
Dataset . 2019
Provider: Datacite