
This database contains one file for each sitting day of the House of Representatives in the Australian Parliament from 02 March 1998 to 08 September 2022. It is distributed as two ZIP archives, one with the data in CSV form and one with the data in Parquet form. These data were parsed entirely from the XML Hansard transcripts available on the Australian Parliament website. We developed four R scripts to parse and clean all of these XML files, and ran each file through two additional scripts: one to fill in missing speaker details, and one to validate each file using a suite of 7 automated tests. All scripts used to build this database are available at https://github.com/lindsaykatz/hansard-proj. Version 4 contains Hansard data from 1998 to 2022, and all data in this version were parsed using a slightly different approach from the one used in version 1, which better preserves the correct chronological ordering of statements.
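As a rough illustration of how the CSV archive might be consumed, the sketch below reads every CSV member of a ZIP file into memory using only the Python standard library. It is not part of the project's R scripts. The function name `read_sitting_days` is hypothetical, and the code assumes the files are UTF-8 encoded with a header row; neither the archive's file names nor its column names are stated here, so none are hard-coded.

```python
import csv
import io
import zipfile


def read_sitting_days(zip_source):
    """Return {member name: list of row dicts} for every CSV in the archive.

    zip_source may be a path or a binary file-like object.
    Assumption: each CSV is UTF-8 encoded and starts with a header row.
    """
    days = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with archive.open(name) as raw:
                # Decode the binary member as text so csv can parse it.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                days[name] = list(csv.DictReader(text))
    return days
```

Calling `read_sitting_days` with the path to the downloaded CSV archive would return one entry per sitting-day file, keyed by its name inside the archive. For the full date range, loading one file at a time (or using the Parquet archive with a columnar reader) is likely to be gentler on memory than holding every day at once.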
