On Normalized Compression Distance and Large Malware

Article, Preprint English OPEN
Borbely, Rebecca Schuller

Normalized Compression Distance (NCD) is a popular tool that uses compression algorithms to cluster and classify data in a wide range of applications. Existing discussions of NCD's theoretical merit rely on certain theoretical properties of compression algorithms. Howev...
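
As a rough illustration of the quantity the abstract refers to, the sketch below computes NCD using its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The choice of zlib at level 9 as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for this example, not details taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (level 9).

    The compressor and level are illustrative assumptions; any real-world
    compressor could stand in for C(.) in the NCD definition.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two non-empty byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where xy is the concatenation of x and y.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With a compressor that can exploit the redundancy between its two inputs, values near 0 suggest the inputs share most of their information and values near 1 suggest they share little. Because the result depends entirely on the compressor used, it should be read as an approximation rather than a true metric.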