xmubiocode/MMPCS: MMPCS

MMPCS: Multi-view Molecular Pretraining Based on Consistency Information and Specific Information

Chenyang Xie (1, +), Yingying Song (1, +), Song He (2, *), Xiaochen Bo (2, *), Zhongnan Zhang (1, *)

The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. Although pretraining on multiple molecular views, such as SMILES strings, two-dimensional (2D) graphs, and three-dimensional (3D) conformations, has advanced the field, effectively integrating these views to produce superior representations remains a challenge. To bridge this gap, we propose MMPCS, a novel multi-view molecular pretraining method that explicitly factorizes representations into consistency information and specific information. Our approach uses the Graph Isomorphism Network (GIN) to encode 2D molecular topological graphs and the RoBERTa model to encode SMILES sequences. Each resulting molecular embedding is decomposed into a consistency component shared across views and a view-specific component. An autoencoder then aligns the consistency information across the two views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise, task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance on both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS to facilitate drug repurposing.
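The abstract describes the factorize-then-align pipeline only at a high level, so here is a minimal, dependency-free sketch of the idea under our own assumptions; it is not the MMPCS authors' implementation. Random linear projections stand in for learned heads on top of the GIN and RoBERTa embeddings, nothing is trained (only a forward pass and a loss are computed), and the names `ViewFactorizer`, `ConsistencyAutoencoder`, and `downstream_features` are hypothetical. The alignment loss (cross-reconstruction plus latent-code agreement) and the fusion rule (averaged consistency part concatenated with both view-specific parts) are illustrative choices, not details taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small Gaussian weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class ViewFactorizer:
    """Hypothetical per-view head: splits one embedding into (consistency, specific)."""

    def __init__(self, in_dim: int, c_dim: int, s_dim: int, rng: random.Random):
        self.to_consistency = random_matrix(c_dim, in_dim, rng)
        self.to_specific = random_matrix(s_dim, in_dim, rng)

    def __call__(self, h: Vector) -> Tuple[Vector, Vector]:
        return matvec(self.to_consistency, h), matvec(self.to_specific, h)


class ConsistencyAutoencoder:
    """Hypothetical autoencoder shared by both views' consistency parts."""

    def __init__(self, c_dim: int, z_dim: int, rng: random.Random):
        self.encoder = random_matrix(z_dim, c_dim, rng)
        self.decoder = random_matrix(c_dim, z_dim, rng)

    def alignment_loss(self, c_graph: Vector, c_smiles: Vector) -> float:
        z_graph = matvec(self.encoder, c_graph)
        z_smiles = matvec(self.encoder, c_smiles)
        # Cross-reconstruction: decode each view's consistency part from the OTHER view's code.
        cross = mse(matvec(self.decoder, z_smiles), c_graph) + mse(
            matvec(self.decoder, z_graph), c_smiles
        )
        # Agreement term: the two latent codes should coincide.
        return cross + mse(z_graph, z_smiles)


def downstream_features(
    c_graph: Vector, s_graph: Vector, c_smiles: Vector, s_smiles: Vector
) -> Vector:
    """Assumed fusion: averaged consistency part concatenated with both specific parts."""
    shared = [(a + b) / 2 for a, b in zip(c_graph, c_smiles)]
    return shared + s_graph + s_smiles


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for a GIN graph embedding (dim 8) and a RoBERTa SMILES embedding (dim 12).
    h_graph = [rng.gauss(0.0, 1.0) for _ in range(8)]
    h_smiles = [rng.gauss(0.0, 1.0) for _ in range(12)]

    graph_head = ViewFactorizer(in_dim=8, c_dim=4, s_dim=3, rng=rng)
    smiles_head = ViewFactorizer(in_dim=12, c_dim=4, s_dim=3, rng=rng)
    autoencoder = ConsistencyAutoencoder(c_dim=4, z_dim=2, rng=rng)

    c_g, s_g = graph_head(h_graph)
    c_s, s_s = smiles_head(h_smiles)
    loss = autoencoder.alignment_loss(c_g, c_s)
    features = downstream_features(c_g, s_g, c_s, s_s)
    print(f"alignment loss: {loss:.4f}, feature length: {len(features)}")
```

With these assumed sizes the downstream feature vector has length 10 (4 shared consistency values plus 3 graph-specific and 3 SMILES-specific values). Keeping the two views' heads separate lets their input dimensions differ, while projecting both into the same consistency space is what makes the alignment loss well defined.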



Related research communities: Corona Virus Disease; Cancer Research