
Long short-term memory recurrent neural networks (LSTM-RNNs) have emerged as a powerful approach for capturing long-range temporal dependencies in sequences of arbitrary length. This paper models a large set of Android permissions, particularly those in the Normal, Dangerous, Signature and Signature Or System categories, across a large number of Android application package (APK) files from the Android malware classification challenge of the Cyber Security Data Mining Competition (CDMC 2016). The sequences of Android permissions are transformed into features by a recurrent LSTM layer with bag-of-words embedding, and the extracted features are fed into a dense layer and an activation layer with a non-linear activation function, such as sigmoid, for classification. To find the optimal parameters and network structure, we ran various experiments with different network parameters and network structures. All experiments are run for up to 1000 epochs with a learning rate in the range [0.01, 0.5]. In 5-fold cross-validation, all LSTM network configurations compared favorably with the recurrent neural network (RNN) on the classification task. Most importantly, the LSTM achieved its highest accuracy of 0.897 on the real-world Android malware test data set provided by CDMC 2016. This is primarily because the LSTM houses a complex memory processing unit that helps it learn temporal behaviors quickly from sparse representations of Android permission sequences. Thus, we claim that an LSTM network is more appropriate than an RNN for permission-based Android malware classification.
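The pipeline described above (permission sequence, embedding, LSTM layer, dense layer with sigmoid) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class `TinyLSTMClassifier`, the helper names, the layer sizes and the sample permission lists are all assumptions made for illustration, and the weights are random and untrained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import math
import random

def build_vocab(apps):
    """Map each distinct permission string to an integer id (0 is reserved for padding)."""
    vocab = {}
    for perms in apps:
        for p in perms:
            vocab.setdefault(p, len(vocab) + 1)
    return vocab

def encode(perms, vocab, max_len):
    """Turn a permission list into a fixed-length id sequence, left-padded with 0."""
    ids = [vocab[p] for p in perms if p in vocab][:max_len]
    return [0] * (max_len - len(ids)) + ids

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class TinyLSTMClassifier:
    """Hypothetical embedding -> LSTM -> dense(sigmoid) forward pass, untrained."""

    def __init__(self, vocab_size, embed_dim=4, hidden=3, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.embed = rand_matrix(vocab_size + 1, embed_dim, rng)
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(hidden, embed_dim + hidden, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden for k in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def predict_proba(self, ids):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell (memory) state
        for t in ids:
            if t == 0:
                continue  # skip padding positions
            z = self.embed[t] + h  # concatenate embedded input with previous hidden state
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Dense layer with sigmoid activation on the final hidden state.
        return sigmoid(sum(w * x for w, x in zip(self.w_out, h)) + self.b_out)

# Illustrative permission lists for two hypothetical apps.
apps = [
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    ["android.permission.INTERNET", "android.permission.CAMERA",
     "android.permission.READ_CONTACTS"],
]
vocab = build_vocab(apps)
encoded = [encode(p, vocab, max_len=4) for p in apps]
model = TinyLSTMClassifier(vocab_size=len(vocab))
probs = [model.predict_proba(s) for s in encoded]
```

In practice the weights would be learned with a deep learning framework rather than set at random; the sketch only shows how a variable-length permission list becomes a single malware probability.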
