
In this project, we analysed the defensome of Acinetobacter baumannii with the aim of profiling the defense systems associated with particular prophage profiles and predicting which systems are more effective, and against which specific phages, by associating prophages with defense systems both positively and negatively using machine learning techniques.

DOI (bioRxiv): https://doi.org/10.1101/2024.10.26.620419

Python scripts

Package versions:
numpy 1.26.4
pandas 2.2.2

binary_matrix.py
Generate a binary matrix of defense systems using genomes without prophages, as input for the UpSet plot.

coocurr_matrix.py
Generate a matrix of defense system co-occurrence, as input for Fig. 2A.

defsys_pres_ann.py
Create a presence-absence matrix of defense systems.

freq_phages_bymlst.py
Get the most frequent prophages (10% of genomes) per MLST (provided in a list).

matrix_mlst_phages_freq.py
Generate two matrices, of absolute and relative frequency respectively, of prophages by frequent MLST group.

pres_aus_matrix_cl.py
Create a presence-absence matrix of prophages.

matrix_preaus_ml.py
Add two columns to the presence-absence matrix of prophages: one with the defense systems of each genome and another with the MLST group to which it belongs.

cdhit_heamtap.py
Read CD-HIT output files and build a variant matrix with the most prevalent clusters.

triangle_to_square.py
Read an upper triangular matrix (EMBOSS format) and convert it into a square matrix.

merge_dist.py
Merge both distance values (phylogenetic distance from the tree built in IQ-TREE and Kimura distance from the sequence alignment) for the same genome, and determine the MLST relationship between those genomes.

Phylogeny

Use assembly_seq.pl and uniq_sl.pl to build the initial multifasta containing only the core genes, as input for MAFFT. The resulting MSA is processed with ClipKIT to eliminate gaps and keep the most informative regions. The processed MSA is then used as input to IQ-TREE to generate the tree.

Circos

Circos plots were generated from files produced by prepareForCircos2.pl. This script uses "defsys_presaus_ann.tsv", "logical_viruses.tsv" and a list of the genomes in each MLST group to create the input file for the figure. These files are also provided.

README.txt
A more detailed version of the protocol used to generate the results and figures in the paper.
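
Example (illustrative only)

The presence-absence and co-occurrence matrices described above can be sketched with pandas as follows. This is a minimal sketch and not the code in defsys_pres_ann.py or coocurr_matrix.py: the long-format input table, its column names ("genome", "system"), the genome IDs and the function names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format input: one row per (genome, defense system) hit.
# Genome IDs, system labels and column names are illustrative assumptions,
# not taken from the repository files.
hits = pd.DataFrame(
    {
        "genome": ["G1", "G1", "G2", "G2", "G3"],
        "system": ["RM", "CRISPR-Cas", "RM", "Gabija", "RM"],
    }
)


def presence_absence(long_table: pd.DataFrame) -> pd.DataFrame:
    """Return a genomes x systems matrix of 0/1 values."""
    counts = pd.crosstab(long_table["genome"], long_table["system"])
    # Collapse repeated hits of the same system in a genome to a single 1.
    return (counts > 0).astype(int)


def cooccurrence(binary: pd.DataFrame) -> pd.DataFrame:
    """Return a systems x systems matrix counting genomes that carry both systems."""
    # (systems x genomes) . (genomes x systems) -> systems x systems
    return binary.T.dot(binary)


pa = presence_absence(hits)
co = cooccurrence(pa)
print(pa)
print(co)
```

In this sketch the diagonal of the co-occurrence matrix holds the number of genomes carrying each system, and each off-diagonal cell holds the number of genomes carrying both systems of the pair.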
