BojarLab/glycowork: v1.1.0

Change Log

glycan_data
- Updated sugarbase database and all models

glycan_data.stats
- Newly added module to glycowork
- Moved all the statistics functions from motif.processing into this module: cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering
- Added fast_two_sum, two_sum, expansion_sum, hlm, update_cf_for_m_n, jtkdist, jtkinit, jtkstat, and jtkx helper functions for the JTK test
- Added get_BF to calculate Jeffreys' approximate Bayes factor based on sample size and p-value
- Added get_alphaN to calculate sample size-appropriate significance cut-offs informed by Bayesian statistics
- Added pi0_tst and TST_grouped_benjamini_hochberg to perform a Two-Stage adaptive Benjamini-Hochberg procedure based on groups (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175141/ or https://www.biorxiv.org/content/10.1101/2024.01.13.575531v1)
- Added test_inter_vs_intra_group to estimate intra- versus inter-group correlation with a mixed-effects model for groupings of glycans based on domain expertise

motif.regex
- Newly added module to glycowork
- Added the get_match function and associated functions to implement a regular expression system for glycans. This allows for powerful queries to detect and extract motifs of arbitrary complexity.
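For readers unfamiliar with the multiple testing correction mentioned in the stats notes, the classic (ungrouped, single-stage) Benjamini-Hochberg adjustment can be sketched in plain Python as below. This is only an illustration of the basic step-up procedure: the function name `benjamini_hochberg` is hypothetical, it is not glycowork's TST_grouped_benjamini_hochberg, and the grouped two-stage variant additionally estimates the proportion of true null hypotheses per group, which is not shown here.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper for illustration only; not part of glycowork.
    """
    m = len(pvals)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvals[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A hypothesis is then called significant if its adjusted p-value lies below the chosen threshold.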
motif.processing
- Moved cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering into glycan_data.stats to re-focus processing on processing glycan sequences
- Extended canonicalize_composition to cases like '5_4_2_1', '5421', and '(Hex)2 (HexNAc)2 (Deoxyhexose)1 (NeuAc)2 + (Man)3(GlcNAc)2'
- GlycoCT and WURCS handling for universal input now encompasses more monosaccharides and more modifications
- Expanded oxford_to_iupac to handle more complex sequences, including sulfation, LacdiNAc, hybrid structures, extended Neu5Ac, complex fucosylation, and more custom linkage specifications
- enforce_class can now deal with free glycans regardless of whether they end in '-ol' or not

motif.annotate
- annotate_dataset and downstream functions now accept a new keyword in "feature_set", called "custom". If "custom" is added to "feature_set", a list of custom motifs can and must be added via the "custom_motifs" keyword argument. "custom" can be mixed and matched with all other keywords in "feature_set"
- annotate_dataset now also accepts glyco-regular expressions via the "custom" keyword in "feature_set". These expressions need to be added within the "custom_motifs" keyword argument and have to start with an "r", such as "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc". Normal motifs and glyco-regular expressions can be freely mixed within "custom_motifs"
- Added group_glycans_core, group_glycans_sia_fuc, and group_glycans_N_glycan_type to group glycans by core structure (for O-glycans), Sia/Fuc/FucSia/Rest, or complex/hybrid/high-man/rest (for N-glycans)
- Fixed a bug in get_k_saccharides, in which redundant columns were not always correctly removed

motif.analysis
- Added get_jtk to analyze circadian expression of glycans in temporal glycomics datasets using the Jonckheere–Terpstra–Kendall (JTK) algorithm, with the typical interface for motifs, imputation, etc., analogous to differential expression.
- get_differential_expression, get_glycanova, and get_jtk now use get_alphaN to calculate a sample size-appropriate significance cut-off (see https://journals.sagepub.com/doi/10.1177/14761270231214429) and add a 'significant' column to the output to display whether the corrected p-values lie below this threshold
- Added the "zscores" keyword argument to get_pvals_motifs; set "zscores" to False to perform a z-score transformation if the input data are not yet z-score transformed
- For statistical calculations, get_pvals_motifs will now weigh the motif occurrences by z-score magnitude, rather than only using a cut-off for enrichment calculations
- Added effect size calculations (as Cohen's d) to get_pvals_motifs, which are included in the output
- Changed get_pvals_motifs such that now both enrichments and depletions are tested (with depletions resulting in negative effect sizes)
- Added select_grouping to find out which grouping of glycans has the highest intra- versus inter-group correlation, as estimated by glycan_data.stats.test_inter_vs_intra_group
- When "motifs = False" and "grouped_BH = True", get_differential_expression now tries to use the Two-Stage adaptive Benjamini-Hochberg procedure based on groups for multiple testing correction, if meaningful groups can be found in the glycans [note this makes everything at least one order of magnitude slower, though most datasets should still finish in a few seconds]

motif.draw
- In GlycoDraw, the "highlight_motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)
- Added plot_glycans_excel to allow for the automated insertion of GlycoDraw SNFG pictures into an Excel file containing glycan sequences

motif.graph
- categorical_node_match_wildcard now uses string ID for matching, instead of integer ID, which means that even two graphs generated with two different libs can now be successfully compared via compare_glycans or subgraph_isomorphism
- compare_glycans and subgraph_isomorphism (and all functions using these functions) now support negation, by prepending "!". For instance, "!Fuc(a1-?)Gal(b1-4)GlcNAc" will match subsequences that have a monosaccharide that is NOT Fuc before the Gal. It is highly recommended to generate your own lib via get_lib if you use negation, as monosaccharides such as !Fuc are not within lib and will cause indexing errors.
- Added "?1-?" as another ultimate wildcard (promoting it from a strong narrow wildcard)
- Fixed some cases where "Monosaccharide" was not treated as an ultimate wildcard in graph operations
- Fixed an issue in graph_to_string in which glycans of size 1 (e.g., "GalNAc") sometimes were missing their first character

network
- Updated pre-calculated biosynthetic networks for milk oligosaccharides

network.biosynthesis
- Refactored find_diff to make networks compatible with the automated, dynamic wildcards (i.e., "?" behaves as it should and does not necessarily cause over-branching of the network)
- In highlight_network, the "motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)

ml.model_training
- In training_setup, upgraded the loss functions for all classification problems to PolyLoss with label smoothing (see https://arxiv.org/abs/2204.12511 for details).
- In training_setup, the number of classes (for multiclass or multilabel classification) can now be specified via the new "num_classes" keyword argument
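To make the loss function change in training_setup more concrete, here is a minimal, dependency-free sketch of the Poly-1 form of PolyLoss combined with label smoothing for a single sample. It is an illustration of the general idea from the linked paper, not glycowork's implementation: the function names `softmax` and `poly1_cross_entropy` and the default values for `epsilon` and `label_smoothing` are assumptions chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def poly1_cross_entropy(logits, target, epsilon=1.0, label_smoothing=0.1):
    """Poly-1 loss: cross-entropy plus epsilon * (1 - p_t), for one sample.

    Hypothetical helper; defaults are illustrative, not glycowork's settings.
    """
    num_classes = len(logits)
    probs = softmax(logits)
    # Smoothed target: most mass on the true class, the rest spread uniformly
    uniform = label_smoothing / num_classes
    smoothed = [
        uniform + (1.0 - label_smoothing) * (1.0 if k == target else 0.0)
        for k in range(num_classes)
    ]
    # Standard cross-entropy against the smoothed target distribution
    cross_entropy = -sum(q * math.log(p) for q, p in zip(smoothed, probs))
    # Probability assigned to the (smoothed) target, used by the Poly-1 term
    p_t = sum(q * p for q, p in zip(smoothed, probs))
    return cross_entropy + epsilon * (1.0 - p_t)


if __name__ == "__main__":
    print(poly1_cross_entropy([2.0, 0.5, -1.0], target=0))
```

With `epsilon=0` and `label_smoothing=0`, the function reduces to ordinary cross-entropy, which makes it easy to see what the extra polynomial term contributes.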
