
Change Log

glycan_data
- Updated sugarbase database and all models

stats
- Newly added module to glycowork
- Moved all the statistics functions from motif.processing into this module: cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering
- Added fast_two_sum, two_sum, expansion_sum, hlm, update_cf_for_m_n, jtkdist, jtkinit, jtkstat, and jtkx helper functions for the JTK test
- Added get_BF to calculate Jeffreys' approximate Bayes factor based on sample size and p-value
- Added get_alphaN to calculate sample size-appropriate significance cut-offs informed by Bayesian statistics
- Added pi0_tst and TST_grouped_benjamini_hochberg to perform a Two-Stage adaptive Benjamini-Hochberg procedure based on groups (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3175141/ or https://www.biorxiv.org/content/10.1101/2024.01.13.575531v1)
- Added test_inter_vs_intra_group to estimate intra- versus inter-group correlation with a mixed-effects model for groupings of glycans based on domain expertise

motif

regex
- Newly added module to glycowork
- Added the get_match function and associated functions to implement a regular expression system for glycans. This allows for powerful queries to detect and extract motifs of arbitrary complexity.
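
For orientation, the grouped Two-Stage procedure mentioned under stats builds on the classic Benjamini-Hochberg step-up correction. The following is a minimal sketch of the plain, ungrouped adjustment in pure Python. The name `bh_adjust` is hypothetical, and this is not the implementation of TST_grouped_benjamini_hochberg, which, going by the linked references, additionally works with group-wise estimates of the proportion of true null hypotheses.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone and never exceed 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(pvals)
```

The adjusted values can then be compared against a significance cut-off of choice, such as one returned by get_alphaN.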

processing
- Moved cohen_d, mahalanobis_distance, mahalanobis_variance, variance_stabilization, MissForest, impute_and_normalize, and variance_based_filtering into glycan_data.stats to refocus processing on processing glycan sequences
- Extended canonicalize_composition to cases like '5_4_2_1', '5421', and '(Hex)2 (HexNAc)2 (Deoxyhexose)1 (NeuAc)2 + (Man)3(GlcNAc)2'
- GlycoCT and WURCS handling for universal input now encompasses more monosaccharides and more modifications
- Expanded oxford_to_iupac to handle more complex sequences, including sulfation, LacdiNAc, hybrid structures, extended Neu5Ac, complex fucosylation, and more custom linkage specifications
- enforce_class can now deal with free glycans regardless of whether they end in '-ol' or not

annotate
- annotate_dataset and downstream functions now accept a new keyword in "feature_set", called "custom". If "custom" is added to "feature_set", a list of custom motifs can and must be added via the "custom_motifs" keyword argument. "custom" can be mixed and matched with all other keywords in "feature_set"
- annotate_dataset now also accepts glyco-regular expressions via the "custom" keyword in "feature_set". These expressions need to be added within the "custom_motifs" keyword argument and have to start with an "r", such as "rHex-HexNAc-([Hex|Fuc]){1,2}-HexNAc". Normal motifs and glyco-regular expressions can be freely mixed within "custom_motifs"
- Added group_glycans_core, group_glycans_sia_fuc, and group_glycans_N_glycan_type to group glycans by core structure (for O-glycans), Sia/Fuc/FucSia/Rest, or complex/hybrid/high-man/rest (for N-glycans)
- Fixed a bug in get_k_saccharides, in which redundant columns were not always correctly removed

analysis
- Added get_jtk to analyze circadian expression of glycans in temporal glycomics datasets using the Jonckheere–Terpstra–Kendall (JTK) algorithm, with the typical interface for motifs, imputation, etc., analogous to differential expression.
- get_differential_expression, get_glycanova, and get_jtk now use get_alphaN to calculate a sample size-appropriate significance cut-off (see https://journals.sagepub.com/doi/10.1177/14761270231214429) and add a 'significant' column to the output to display whether the corrected p-values lie below this threshold
- Added the "zscores" keyword argument to get_pvals_motifs; set "zscores" to False to apply a z-score transformation when the input data are not yet z-score transformed
- For statistical calculations, get_pvals_motifs will now weigh the motif occurrences by z-score magnitude, rather than only using a cut-off for enrichment calculations
- Added effect size calculations to get_pvals_motifs, reported in the output as Cohen's d
- Changed get_pvals_motifs such that both enrichments and depletions are now tested (with depletions resulting in negative effect sizes)
- Added select_grouping to find out which grouping of glycans has the highest intra- versus inter-group correlation, as estimated by glycan_data.stats.test_inter_vs_intra_group
- When "motifs = False" and "grouped_BH = True", get_differential_expression now tries to use the Two-Stage adaptive Benjamini-Hochberg procedure based on groups for multiple testing correction, if meaningful groups can be found in the glycans [note this makes everything at least one order of magnitude slower, though most datasets should still finish in a few seconds]

draw
- In GlycoDraw, the "highlight_motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)
- Added plot_glycans_excel to allow for the automated insertion of GlycoDraw SNFG pictures into an Excel file containing glycan sequences

graph
- categorical_node_match_wildcard now uses string ID for matching, instead of integer ID, which means even two graphs, generated with two different libs, can now be successfully compared via
  compare_glycans or subgraph_isomorphism
- compare_glycans or subgraph_isomorphism (and all functions using these functions) now support negation, by prepending "!". For instance, "!Fuc(a1-?)Gal(b1-4)GlcNAc" will match subsequences that have a monosaccharide that is NOT Fuc before the Gal. It is highly recommended to generate your own lib via get_lib if you use negation, as monosaccharides such as !Fuc are not within lib and will cause indexing errors.
- Added "?1-?" as another ultimate wildcard (promoting it from a strong narrow wildcard)
- Fixed some cases where "Monosaccharide" was not treated as an ultimate wildcard in graph operations
- Fixed an issue in graph_to_string in which glycans of size 1 (e.g., "GalNAc") sometimes were missing their first character

network
- Updated pre-calculated biosynthetic networks for milk oligosaccharides

biosynthesis
- Refactored find_diff to make networks compatible with the automated, dynamic wildcards (i.e., ? behave as they should and don't necessarily cause over-branching of the network)
- In highlight_network, the "motif" keyword argument can now use glyco-regular expressions in addition to regular motifs (just add a single 'r' before your glyco-regular expression to indicate that it is indeed a regular expression)

ml

model_training
- In training_setup, upgraded the loss functions for all classification problems to PolyLoss with label smoothing (see https://arxiv.org/abs/2204.12511 for details).
- In training_setup, the number of classes (for multiclass or multilabel classification) can now be specified via the new "num_classes" keyword argument
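
To illustrate the negation and wildcard semantics described under graph, here is a minimal sketch of matching a single pattern token against a single monosaccharide name. `token_matches` is a hypothetical helper written for this example only. It is not glycowork's matcher, which operates on glycan graphs, and it ignores linkages entirely.

```python
def token_matches(pattern: str, mono: str) -> bool:
    """Check one pattern token against one monosaccharide name."""
    # "Monosaccharide" is treated as a wildcard that matches anything
    if pattern == "Monosaccharide":
        return True
    # A leading "!" negates the token: anything except that name matches
    if pattern.startswith("!"):
        return mono != pattern[1:]
    # Otherwise require an exact name match
    return pattern == mono
```

Under these assumptions, `token_matches("!Fuc", "Gal")` is true and `token_matches("!Fuc", "Fuc")` is false, which mirrors the "!Fuc(a1-?)Gal(b1-4)GlcNAc" example in the changelog entry.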
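
As a rough picture of what a PolyLoss-style objective with label smoothing computes, the sketch below implements the Poly-1 form from the linked paper (cross-entropy plus epsilon times one minus the probability assigned to the target) for a single sample, in pure Python. The function name `poly1_loss`, the default values of `epsilon` and `smoothing`, and the choice of the Poly-1 variant are assumptions made for illustration; the changelog does not state which settings training_setup uses.

```python
import math


def poly1_loss(logits, target, epsilon=1.0, smoothing=0.1):
    """Poly-1 cross-entropy for one sample, with label smoothing."""
    k = len(logits)
    # Numerically stable log-softmax
    mx = max(logits)
    log_total = math.log(sum(math.exp(z - mx) for z in logits))
    log_probs = [z - mx - log_total for z in logits]
    probs = [math.exp(lp) for lp in log_probs]
    # Smoothed target: (1 - smoothing) * one-hot + smoothing / k on every class
    soft = [(1.0 - smoothing) * (1.0 if j == target else 0.0) + smoothing / k
            for j in range(k)]
    # Cross-entropy against the smoothed target
    ce = -sum(t * lp for t, lp in zip(soft, log_probs))
    # Probability mass the model assigns to the smoothed target
    pt = sum(t * p for t, p in zip(soft, probs))
    return ce + epsilon * (1.0 - pt)
```

Setting `epsilon=0.0` recovers ordinary label-smoothed cross-entropy, which makes the extra term easy to inspect in isolation.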
