The dataset contains the following files:

- adenylate.zip
- antitrypsin.zip
- tamapin.zip
- analysis_notebooks.zip

Each of the first three archives refers to one of the three proteins. For each number of CG sites N, each of these compressed folders contains the following files:

- random mappings (random_mappings_${N}.txt)
- random mapping entropies (random_smaps_${N}.txt) [fig1]
- optimal mappings (lowest_mappings_${N}.txt) [fig3, fig4, figS2]
- optimal mapping entropies (lowest_smaps_${N}.txt) [fig1]
- pdb files with conservation probabilities in the beta factor column (${N}_probs.pdb) [fig4, figS2]
- SASA values (${protein_name}_SASA_residues.xvg)
- transition mapping entropies (${protein_name}_transition_smaps.txt) [fig2]
- additional transition mapping entropies (${protein_name}_transition_smaps*) [figS3]

The file analysis_notebooks.zip contains the Python 3 notebooks used to perform all the analyses presented in the paper:

- paper_analysis_adenylate.ipynb
- paper_analysis_antitrypsin.ipynb
- paper_analysis_tamapin.ipynb

Packages required to run these Python 3 notebooks:

- numpy
- pandas
- matplotlib
- seaborn
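
The naming scheme described above can be checked against a downloaded archive before opening the notebooks. The following is a minimal sketch, not part of the dataset: the helper names expected_files and missing_files are hypothetical, and it assumes that ${protein_name} matches the archive name (for example "tamapin") and that files may sit inside a subfolder of the archive, so only base names are compared. It uses only the Python standard library and does not read the file contents, whose internal format is not described here.

```python
import zipfile
from pathlib import PurePosixPath


def expected_files(protein_name, n_sites):
    """Build the file names listed in this README for one protein and one N.

    Hypothetical helper: the wildcard entry (${protein_name}_transition_smaps*)
    is not expanded, because its concrete names are not given in the README.
    """
    per_n = [
        f"random_mappings_{n_sites}.txt",
        f"random_smaps_{n_sites}.txt",
        f"lowest_mappings_{n_sites}.txt",
        f"lowest_smaps_{n_sites}.txt",
        f"{n_sites}_probs.pdb",
    ]
    per_protein = [
        f"{protein_name}_SASA_residues.xvg",
        f"{protein_name}_transition_smaps.txt",
    ]
    return per_n + per_protein


def missing_files(archive_path, protein_name, n_sites):
    """Return the expected file names that are not found in the zip archive.

    Assumption: directory prefixes inside the archive are ignored, so a file
    stored as "some_folder/random_smaps_4.txt" counts as present.
    """
    with zipfile.ZipFile(archive_path) as archive:
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return [name for name in expected_files(protein_name, n_sites)
            if name not in present]
```

For instance, calling missing_files("tamapin.zip", "tamapin", N) with a value of N available in the dataset should return an empty list if the archive follows the scheme exactly; any names it returns point to a different protein label or folder layout than assumed here.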