Powered by OpenAIRE graph
ZENODO
Software . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Software . 2025
License: CC BY
Data sources: Datacite

Code and data for "Optimization of regulatory DNA with active learning"

Authors: Shen, Yuxin; Kudla, Grzegorz; Oyarzún, Diego

Abstract

Code and data for the paper "Optimization of regulatory DNA with active learning" by Shen, Kudla and Oyarzún.

- `data.zip`: all NK landscapes in csv format.
- `code.zip`: Python code for reproducing the results of the paper.

1. Code overview

The code has two subfolders, one for the NK landscapes and one for the promoter landscape, plus one environment file.

- `AL.yml`: the environment for all the code.

AL on NK landscape

2. NK genotype-phenotype landscapes (Figure 1)

- `nk_landscape.ipynb`: Generates the NK0-NK3 landscapes and saves them as csv files to serve as ground-truth landscapes. The NK model is derived from a previous NK simulation in [1], available at https://github.com/acmater/NK_Benchmarking/blob/master/utils/nk_utils/NK_landscape.py.
- `nk_local_landscape.ipynb`: Generates the NK1-NK3 local landscapes.
- `nk_tsne.ipynb`: Plots 2D t-SNE embeddings of the genotype space, with sequences labelled by phenotype (Figure 1C).
- `nk_mlp.ipynb`: Trains MLP models on the four NK landscapes (Figure 1D).

3. AL on NK genotype-phenotype landscapes (Figure 2)

- `AL_NK_pipeline.ipynb`: The active learning pipeline on the NK landscapes. Different conditions, such as AL with random sampling and ALDE, can be set in the pipeline.
- `NK_benchmarking_ho.ipynb`: One-shot model performance on the NK landscapes, with hyperparameter optimization, for comparison with AL performance. Three optimization methods for one-shot modelling are implemented: random screening (RS), strong-selection weak-mutation (SSWM) and gradient descent (GD).

AL on promoter landscape

4. AL on promoter landscape (Figure 3)

- `Glu_model.py`, `Ura_model.py`: Code for using the pre-trained promoter landscape models. The promoter landscape is derived from the transformer model trained on a large-scale characterization of promoter expression in [2], available at https://github.com/1edv/evolution/.
- `AL_loop.py`: The main script for the active learning pipeline on the promoter landscape.
- `AL_sampling_methods.py`: The selection methods for the active learning pipeline on the promoter landscape.
- `AL_selection.py`: The UCB function for the active learning pipeline on the promoter landscape, adapted from [3].
- `promoter_benchmarking_ho.ipynb`: One-shot model performance on the promoter landscape, with hyperparameter optimization, for comparison with AL performance. Three optimization methods for one-shot modelling are implemented: random screening (RS), strong-selection weak-mutation (SSWM) and gradient descent (GD).

5. Biological sampling and motif information (Figure 4)

- `motif_analysis.ipynb`: Performs motif analysis on the batches sampled by AL (Figure 4C).
- `AL_PFM.py`: Combines the motif information calculation with the UCB function.

References

[1] Sandhu et al., "Investigating the determinants of performance in machine learning for protein fitness prediction," Protein Science (2025).
[2] Vaishnav et al., "The evolution, evolvability and engineering of gene regulatory DNA," Nature (2022).
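For readers unfamiliar with the NK model behind `nk_landscape.ipynb`, here is a minimal sketch of the classic Kauffman NK construction. It is not the code from the repository or from [1]. The sketch assumes a binary alphabet by default, randomly chosen interaction partners for each site and uniform random contribution tables. The helper name `make_nk_landscape` is hypothetical. The NK0-NK3 naming suggests K = 0 to 3, but that reading is also an assumption.

```python
import itertools

import numpy as np


def make_nk_landscape(n, k, alphabet_size=2, seed=0):
    """Return a dict mapping every genotype (tuple of ints) to its NK fitness.

    Hypothetical helper (classic Kauffman NK model, not the repository code):
    each site i interacts with k other randomly chosen sites, and its
    contribution is looked up in a random table indexed by the states of
    site i and its k partners. Fitness is the mean of the n contributions.
    """
    rng = np.random.default_rng(seed)
    # For each site, choose k distinct interaction partners (excluding itself).
    neighbours = [
        rng.choice([j for j in range(n) if j != i], size=k, replace=False)
        for i in range(n)
    ]
    # One contribution table per site, with one entry per joint state of (i, partners).
    tables = [rng.random((alphabet_size,) * (k + 1)) for _ in range(n)]

    landscape = {}
    # Enumerate the full genotype space (alphabet_size ** n sequences).
    for genotype in itertools.product(range(alphabet_size), repeat=n):
        contributions = [
            tables[i][(genotype[i], *(genotype[j] for j in neighbours[i]))]
            for i in range(n)
        ]
        landscape[genotype] = float(np.mean(contributions))
    return landscape
```

Setting `k=0` gives a purely additive landscape. Increasing `k` adds epistasis and makes the landscape more rugged. Because the sketch enumerates all `alphabet_size ** n` genotypes, it only suits small `n`.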
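`AL_selection.py` is described only as a UCB function adapted from [3]. As an illustration, here is a minimal sketch of upper-confidence-bound batch selection. It assumes that uncertainty is estimated from the spread of an ensemble of models. The helper `ucb_select` and the `beta` trade-off parameter are hypothetical, and the repository's formulation may differ.

```python
import numpy as np


def ucb_select(ensemble_preds, batch_size, beta=2.0):
    """Return indices of the batch_size candidates with the highest UCB score.

    ensemble_preds: array-like of shape (n_models, n_candidates) holding each
    ensemble member's predicted phenotype for every candidate sequence.
    Hypothetical helper: the ensemble mean is the exploitation term and the
    ensemble standard deviation is the exploration (uncertainty) term.
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    mean = preds.mean(axis=0)
    std = preds.std(axis=0)
    scores = mean + beta * std
    # argsort is ascending, so take the last batch_size entries, best first.
    return np.argsort(scores)[-batch_size:][::-1]
```

Consider two ensemble members predicting `[[1.0, 0.0, 0.5], [1.0, 2.0, 0.5]]`. The mean is `[1.0, 1.0, 0.5]` and the standard deviation is `[0.0, 1.0, 0.0]`, so with `beta=2.0` the candidate at index 1 ranks first. A larger `beta` favours exploring uncertain candidates, and `beta=0` reduces to greedy selection on the mean.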
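The one-shot benchmarks optimize sequences against a trained model, and SSWM is one of the three listed methods. The sketch below shows a minimal strong-selection weak-mutation adaptive walk. It relies on a simplifying assumption: in the strong-selection limit, a proposed single-point mutant fixes only if it is predicted to be strictly fitter. The `predict` callable and the `sswm_walk` name are hypothetical. The notebooks may instead use explicit fixation probabilities or different stopping rules.

```python
import random


def sswm_walk(predict, start, n_steps=1000, alphabet="ACGT", seed=0):
    """Adaptive walk under a strong-selection weak-mutation simplification.

    Hypothetical helper: at each step one random point mutant of the current
    sequence is proposed (weak mutation: one mutation at a time). The mutant
    is accepted only if `predict` scores it strictly higher (strong selection:
    deleterious and neutral mutants never fix).
    """
    rng = random.Random(seed)
    current, best = start, predict(start)
    for _ in range(n_steps):
        pos = rng.randrange(len(current))
        # Choose a base different from the current one at this position.
        new_base = rng.choice([b for b in alphabet if b != current[pos]])
        mutant = current[:pos] + new_base + current[pos + 1:]
        score = predict(mutant)
        if score > best:
            current, best = mutant, score
    return current, best
```

Try a toy predictor such as `lambda s: s.count("G")`. The walk then climbs monotonically from `"AAAAAA"` toward `"GGGGGG"`, because neutral and deleterious mutants are always rejected.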
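The description of `AL_PFM.py` does not say how motif information enters the UCB score. As a hedged illustration of one standard ingredient, the sketch below first converts a position frequency matrix (PFM) into a log2-odds position weight matrix, using a pseudocount and a uniform background. It then reports the best-scoring window in a sequence. Several details are assumptions: the helper names, the pseudocount of 0.5 and the forward-strand-only scan. Adding such a score to a UCB value with some weight is one possible combination, not necessarily the one used in the repository.

```python
import numpy as np

BASES = "ACGT"  # assumed row order of the PFM


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert a 4 x L PFM of counts (rows in ACGT order) to log2-odds scores."""
    counts = np.asarray(pfm, dtype=float) + pseudocount
    # Normalize each column (motif position) to base probabilities.
    probs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(probs / background)


def best_motif_score(seq, pwm):
    """Highest PWM score over all windows of seq (forward strand only).

    seq must contain only A, C, G, T and be at least as long as the motif.
    """
    width = pwm.shape[1]
    idx = [BASES.index(b) for b in seq]
    scores = [
        sum(pwm[idx[start + j], j] for j in range(width))
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)
```

Take a TATA-like PFM with 10 counts per consensus base. Each consensus position then contributes `log2(3.5)` (about 1.81 bits), so an exact `TATA` match scores about 7.23.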
