
This dataset contains health-related news data from the GDELT project with ICD9-CM annotations, covering January 2020 to December 2022. Each CSV file represents one month of data and has the following fields:

- country_code: code of the country where the news originated
- news_datetime: timestamp of news publication
- json_col: a JSON object containing additional metadata from GDELT, including the field "quotes"
- icd9_code: list of the top 3 ICD9-CM codes obtained by zero-shot classification of the field "quotes"
- icd9_annotation: descriptions associated with the ICD9-CM codes in the field icd9_code

Files are named using the YYYY_MM format (e.g., 2020_01.csv for January 2020).
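As a rough illustration of how one monthly file might be read, the sketch below uses only the Python standard library. The function names (month_filename, parse_list_cell, read_month) are hypothetical helpers, not part of the dataset. It also assumes that the list-valued fields icd9_code and icd9_annotation are serialized either as JSON arrays or as Python-style list literals; the description does not state the exact encoding, so check a real file before relying on this.

```python
import ast
import csv
import json


def month_filename(year: int, month: int) -> str:
    """Build a file name following the YYYY_MM.csv naming pattern."""
    return f"{year:04d}_{month:02d}.csv"


def parse_list_cell(cell: str):
    """Parse a list-valued cell.

    ASSUMPTION: the cell holds a JSON array or a Python-style list literal.
    The dataset description does not specify the serialization.
    """
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        # Fall back to a Python literal such as "['486', '487.1']".
        return ast.literal_eval(cell)


def read_month(handle):
    """Yield one dict per news record from an open CSV file handle."""
    for row in csv.DictReader(handle):
        metadata = json.loads(row["json_col"])  # GDELT metadata object
        yield {
            "country_code": row["country_code"],
            "news_datetime": row["news_datetime"],
            "quotes": metadata.get("quotes"),  # text that was classified
            "icd9_code": parse_list_cell(row["icd9_code"]),
            "icd9_annotation": parse_list_cell(row["icd9_annotation"]),
        }
```

To use it, open a file such as month_filename(2020, 1) with newline="" and encoding="utf-8" (the encoding is also an assumption) and iterate over read_month(handle). Reading lazily, one row at a time, keeps memory use low if the monthly files turn out to be large.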
GDELT, medical annotations, ICD9-CM, news, time series
