
handle: 1822/92817
The emergence of bacterial strains resistant to virtually all currently available antibiotics is alarming. Alternative strategies for effectively combating such multi-drug-resistant strains must therefore be considered. Bacteriophages (phages), the most abundant biological entities on Earth, may play a key role in such a therapeutic revolution. Phage therapy exploits the evolutionary context of phages and bacteria: millions of years of host-parasite coevolution have made phages lethal bacterial predators. Nonetheless, “recruiting” phages that competently combat bacterial infections depends profoundly on a thorough understanding of their biology and safety. Yet, given the tremendous diversity of phage genomes, most phage genes cannot be assigned functions via homology-based techniques, which significantly hinders such fundamental insights. PhageAnnotate, a phage annotation system powered by machine learning (ML), is an attempt to overcome these obstacles. To assemble a robust computational tool, four model architectures were tested: Naïve, Linear, Hierarchical and Hierarchical-X. These architectures, which differ at both the organizational and structural levels, were trained on a collection of 368,436 phage DNA sequences and carry out a straightforward task: given a phage DNA sequence, assign it a label representing a functional role. Naïve and Linear each comprise a single gradient boosting (GB) model, differing solely in the organization and strictness of the functional-role labels.
Hierarchical adds a layer of complexity to the problem at hand: before assigning functional roles to DNA sequences, it coarsely classifies them into one of six umbrella functional classes (i.e., DNA-modification, DNA-replication, lysis, lysogeny-repressor, packaging and structural); only then, depending on the assigned functional class, is a more fine-grained functional role labeling performed. This structure translates into seven ML models: one discerning functional classes and six distinguishing functional roles. Hierarchical-X behaves identically to Hierarchical, and stems from unsurely enlarging the sequence database used for training the ML models. A careful evaluation of these architectures indicated that PhageAnnotate should be embodied by Hierarchical. PhageAnnotate’s predictions are therefore guided by a GB model discerning functional classes, three GB models distinguishing functional roles within the functional classes DNA-modification, DNA-replication and structural, and three support vector machine (SVM) models distinguishing functional roles within the functional classes lysis, lysogeny-repressor and packaging. The F1 scores attained by these models, which serve as proxy measures of their competency, were 87.57%, 82.17%, 83.38%, 84.77%, 97.30%, 83.72% and 98.14%, respectively. A thorough assessment of PhageAnnotate, and its subsequent comparison with current, well-established phage annotation tools, showed the usefulness of the system, which produced functional annotations that clearly stand out relative to those of its direct competitors.
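The two-stage routing of the Hierarchical architecture can be illustrated with a minimal Python sketch. It is not PhageAnnotate's code: the `HierarchicalAnnotator` class, the toy models and the role names are hypothetical stand-ins, and the abstract does not specify how sequences are turned into features. In the real system the seven callables would be the trained GB and SVM models.

```python
from typing import Callable, Dict, Tuple

# Assumption: a "model" is any callable mapping a DNA sequence to a label.
Model = Callable[[str], str]

# The six umbrella functional classes named in the abstract.
FUNCTIONAL_CLASSES = (
    "DNA-modification", "DNA-replication", "lysis",
    "lysogeny-repressor", "packaging", "structural",
)


class HierarchicalAnnotator:
    """Two-stage annotator: functional class first, then functional role."""

    def __init__(self, class_model: Model, role_models: Dict[str, Model]):
        # One role model is required per functional class (1 + 6 = 7 models).
        missing = set(FUNCTIONAL_CLASSES) - set(role_models)
        if missing:
            raise ValueError(f"no role model for: {sorted(missing)}")
        self.class_model = class_model
        self.role_models = role_models

    def annotate(self, sequence: str) -> Tuple[str, str]:
        # Stage 1: coarse prediction of the umbrella functional class.
        functional_class = self.class_model(sequence)
        # Stage 2: route the sequence to the role model of that class.
        role = self.role_models[functional_class](sequence)
        return functional_class, role


def toy_class_model(sequence: str) -> str:
    # Hypothetical rule for illustration only: class chosen from length.
    return FUNCTIONAL_CLASSES[len(sequence) % len(FUNCTIONAL_CLASSES)]


def make_toy_role_model(functional_class: str) -> Model:
    # Hypothetical rule for illustration only: role chosen from G count.
    def role_model(sequence: str) -> str:
        return f"{functional_class}/role-{sequence.count('G') % 2}"
    return role_model


annotator = HierarchicalAnnotator(
    toy_class_model,
    {c: make_toy_role_model(c) for c in FUNCTIONAL_CLASSES},
)
print(annotator.annotate("ATGGCG"))
```

One consequence of this design is that an error in the first stage cannot be corrected by the second: a sequence routed to the wrong functional class can only receive a role from that class.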
Genome annotation, Machine learning, Bacteriophages
