Name: Fuzzy ARTMAP Prediction of Biological Activities for Potential HIV-1 Protease Inhibitors Using a Small Molecular Data Set
Keywords: Models, Statistical; Chemical Phenomena; Databases, Factual; Models, Genetic; Computer Sciences; Data Science; Fuzzy neural networks; Computational Biology; Quantitative Structure-Activity Relationship; Reproducibility of Results

Publication: Article, 01 Jan 2011. Publisher: Institute of Electrical and Electronics Engineers (IEEE). Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 8, pages 80-93 (ISSN: 1545-5963).

Authors: Andonie, Răzvan; Fabry-Asztalos, Levente; Abdul-Wahid, Christopher B.; Abdul-Wahid, Sarah; Barker, Grant I.; Magill, Lukas C.

doi: 10.1109/tcbb.2009.50

pmid: 21071799

Abstract

Obtaining satisfactory results with neural networks depends on the availability of large data samples, and small training sets generally reduce performance. Most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets. We propose two computational intelligence prediction techniques that are suitable for small training sets, at the expense of some computational overhead. Both techniques are based on the FAMR model, a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two proposed algorithms are: 1) The GA-FAMR algorithm, which is new and consists of two stages: a) in the first stage, a genetic algorithm (GA) optimizes the relevances assigned to the training data, which improves the generalization capability of the FAMR; b) in the second stage, the optimized relevances are used to train the FAMR. 2) The Ordered FAMR, which is derived from a known algorithm: instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN introduced in [4], [5]. We conclude that, when inferring from small training sets, both techniques are efficient in terms of generalization capability and execution time, and that the computational overhead they introduce is compensated by better accuracy. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.
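
The two-stage idea behind GA-FAMR described in the abstract (first optimize per-sample relevances with a genetic algorithm, then train with the optimized relevances) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not implement Fuzzy ARTMAP: the learner here is a toy relevance-weighted kernel average chosen only as a stand-in, and the function names, GA settings, and data are all hypothetical.

```python
import math
import random

def predict(x, train, relevances):
    """Relevance-weighted kernel average: a toy stand-in for a trained FAMR."""
    num = den = 0.0
    for (xi, yi), q in zip(train, relevances):
        w = q * math.exp(-(x - xi) ** 2)  # relevance scales each sample's influence
        num += w * yi
        den += w
    return num / den if den > 0 else 0.0

def validation_error(relevances, train, valid):
    """Mean squared error on held-out pairs; lower is better."""
    return sum((predict(x, train, relevances) - y) ** 2 for x, y in valid) / len(valid)

def optimize_relevances(train, valid, pop_size=20, generations=30, seed=0):
    """Stage 1 (assumed form): an elitist GA over relevance vectors in [0, 1]."""
    rng = random.Random(seed)
    n = len(train)
    # Include the all-ones vector so the result is never worse than uniform relevances.
    population = [[1.0] * n]
    population += [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda ind: validation_error(ind, train, valid))
        parents = population[: pop_size // 2]  # truncation selection keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)               # Gaussian mutation of one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        population = parents + children
    return min(population, key=lambda ind: validation_error(ind, train, valid))

# Made-up data: y is roughly x, with one deliberately noisy training sample.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.1, 9.0), (3.0, 3.0), (4.0, 4.0)]
valid = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5), (3.5, 3.5)]

# Stage 1: search for relevances. Stage 2: use them when "training" and predicting.
best = optimize_relevances(train, valid)
uniform = [1.0] * len(train)
print("uniform error:  ", validation_error(uniform, train, valid))
print("optimized error:", validation_error(best, train, valid))
```

Because the uniform relevance vector is part of the initial population and the best individual is always retained, the optimized relevances can only match or improve the validation error of the unweighted baseline in this toy setting. A real study would measure generalization on data not used to tune the relevances.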

Related Organizations

Central Washington University
United States
Purdue University Fort Wayne
United States
Transylvania University of Brașov
Romania
Purdue University System
United States

Keywords

Models, Statistical; Chemical Phenomena; Databases, Factual; Models, Genetic; Computer Sciences; Data Science; Fuzzy neural networks; Computational Biology; Quantitative Structure-Activity Relationship; Reproducibility of Results; Data Mining; HIV Protease Inhibitors; evolutionary computing and genetic algorithms; computational chemistry; Chemistry; Fuzzy Logic; Drug Discovery; Linear Models; Neural Networks, Computer; Algorithms

Impact by BIP!

selected citations: 12
popularity: Average
influence: Average
impulse: Top 10%