
The dark genome, defined as the set of understudied genes (Oprea, 2019), has been characterized to some extent by the Illuminating the Druggable Genome (IDG) project (RRID:SCR_016924). One goal of this project is to encourage the study of genes that are likely to be important for precision medicine but are currently understudied. The KOMP project (RRID:SCR_005571) aims to knock out 8500 genes in mice and to deliver these mice and their associated phenotype data to the research community, also to illuminate the dark genome. Pharos (pharos.nih.gov, RRID:SCR_016258) provides a list of gene symbols and a ‘Tdark’ tag for genes that are not sufficiently understood. We apply this convenient definition of darkness as a baseline. The RRID project works with antibody manufacturers, plasmid providers and animal stock centers and contains the largest freely available list of these key resources. We use searches for individual gene names in the RRID.site API as a proxy for the availability of tools. We do not know how strongly the presence of a knockout animal, the presence of data as defined by the IDG project, or the availability of tools like antibodies and plasmids affects the attention researchers pay to specific genes. Thus, to determine whether the availability of scientific tools (including animals, antibodies and plasmids) could substitute for a more manual definition of the dark genome, we explored correlations between the Pharos dark genome definition and the numbers of resources. The output is a survey of key data about the less studied genes that are likely to be interesting targets for the study of disease.
Methods
First, we wanted to enhance our understanding of the dark genome. We downloaded the list of mouse genes and a list of manuscripts associated with mouse genes from NCBI Gene (RRID:SCR_002473).
After compiling the full list of mouse genes, we queried the scicrunch.org Application Programming Interface (API) to determine which antibodies, plasmids and organisms were available to scientists for each gene. Papers were assessed by querying the PubMed database (RRID:SCR_004846). Drugs were assessed using the Drug-Gene Interaction Database (DGIdb) (RRID:SCR_006608): for each mouse gene, we queried the DGIdb API and extracted the number of drugs associated with the gene. We downloaded the Pharos dataset from https://pharos.nih.gov/targets (RRID:SCR_016258) on Tuesday, October 7, 2025. Of the 16,240 genes present in both the Pharos and NCBI mouse gene databases, 2,684 were considered dark according to Pharos. With these data, we created five candidate lists of dark genes, one for each of the five resource types above (antibodies, plasmids, organisms, papers, and drugs). For each type, we established a threshold such that all genes associated with fewer resources of that type than the threshold are considered dark. For example, all genes mentioned in fewer than 17 papers are dark. We compared each of our dark gene lists to the Pharos list using Fisher's Exact Test and picked the threshold that minimized the resulting p-value. When multiple thresholds returned a p-value of 0.0, we picked the threshold for which the number of genes in our dark list was closest to the number of genes in the Pharos dark list. We will present the data in a histogram that highlights gaps in research tools and shows genes with significant tooling as a set that should be amenable to exploration. Areas in which few papers exist but tools are available will be highlighted.
Results
We explored whether the dark genome can be defined as the set of genes that lack any tools, such as plasmids, drugs, antibodies and transgenic animals.
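The threshold selection described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for this work: the gene names and counts are toy data, the function names `fisher_exact_two_sided` and `pick_threshold` are hypothetical, and the two-sided p-value is computed directly from the hypergeometric distribution with the standard library, which may differ from the statistical package actually used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def pick_threshold(counts, reference_dark, thresholds):
    """Return (threshold, p_value) for the threshold with the smallest p-value.

    A gene is called dark when its resource count is strictly below the
    threshold. Ties in p-value are broken by how close the size of the
    candidate dark list is to the size of the reference dark list.
    """
    reference_dark = set(reference_dark) & set(counts)
    best = None
    for t in thresholds:
        candidate = {gene for gene, n in counts.items() if n < t}
        a = len(candidate & reference_dark)   # dark in both lists
        b = len(candidate - reference_dark)   # dark by threshold only
        c = len(reference_dark - candidate)   # dark by reference only
        d = len(counts) - a - b - c           # light in both lists
        p = fisher_exact_two_sided(a, b, c, d)
        key = (p, abs(len(candidate) - len(reference_dark)))
        if best is None or key < best[0]:
            best = (key, t)
    return best[1], best[0][0]


# Toy data (hypothetical): paper counts per gene and a reference dark list.
paper_counts = {"g1": 0, "g2": 1, "g3": 2, "g4": 5,
                "g5": 8, "g6": 20, "g7": 30, "g8": 40}
reference_dark_genes = {"g1", "g2", "g3"}

threshold, p_value = pick_threshold(paper_counts, reference_dark_genes, range(1, 41))
print(threshold, p_value)
```

The same function would be run once per resource type (antibodies, plasmids, organisms, papers, drugs), each with its own counts. When several thresholds give the same p-value and the same list size, this sketch keeps the smallest one, which is an arbitrary choice.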
We posited that the existence of at least one tool that can specifically probe a gene or gene product makes it possible to study that gene, so such a gene is much more likely to be light. As the definition of the dark genome, we used the list defined by the Pharos consortium. Our comparison yielded, for each resource type, the number of resources that most closely corresponded with the Pharos dark genes. For two types, plasmids and drugs, the number of reagents most closely associated with the dark genome was one. For animals and antibodies, a gene needed additional resources before it could be considered light, which may be because those tools are not quite as robust for the study of a gene. Interestingly, the analyses yielded a set of genes that are “most likely to become light” because they have a significant number of associated tools but are still considered part of the dark genome by Pharos.
REFERENCE
Oprea, T. I. (2019). Exploring the dark genome: implications for precision medicine. Mammalian Genome, 30(7), 192-200.
plasmids, reagents, antibodies, dark genome
