Analysis of Malware behavior: Type classification using machine learning

Radu-Stefan Pirscoveanu, Steven Strandlund Hansen, Thor Mark Tampus Larsen, Matija Stevanovic, Jens Myrup Pedersen, Alexandre Czech

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

28 Citations (Scopus)

Abstract

Malicious software has become a major threat to modern society, not only because of the increased complexity of the malware itself but also because of the exponential growth in new malware each day. This study tackles the problem of analyzing and classifying a large volume of malware in a scalable and automated manner. We developed a distributed malware testing environment by extending Cuckoo Sandbox, and used it to test an extensive number of malware samples and trace their behavioral data. The extracted data was used to develop a novel type classification approach based on supervised machine learning. The proposed approach employs a novel combination of features and achieves a high classification rate, with a weighted average AUC value of 0.98 using a Random Forests classifier. The approach has been extensively tested on a total of 42,000 malware samples. Based on these results, we believe the developed system can be used to pre-filter novel from known malware in a future malware analysis system.
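The abstract reports a weighted average AUC but this record does not say how the average is formed. As a minimal sketch, assuming the common convention of computing a one-vs-rest AUC per malware type and weighting each by that type's share of the samples, the metric could be computed as below. The class names, scores, and function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win
    (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_average_auc(y_true, class_scores):
    """Average the one-vs-rest AUC of each class, weighted by class support.

    y_true       -- list of true class names, one per sample
    class_scores -- dict mapping class name to a list of per-sample scores
                    (e.g. a classifier's predicted probability for that class)
    """
    support = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for cls, count in support.items():
        one_vs_rest = [y == cls for y in y_true]
        result += (count / total) * binary_auc(one_vs_rest, class_scores[cls])
    return result


# Hypothetical toy data: six samples, three illustrative malware types.
y_true = ["trojan", "trojan", "trojan", "adware", "adware", "worm"]
class_scores = {
    "trojan": [0.8, 0.6, 0.3, 0.4, 0.1, 0.2],
    "adware": [0.1, 0.3, 0.5, 0.5, 0.7, 0.2],
    "worm":   [0.1, 0.1, 0.2, 0.1, 0.2, 0.6],
}

if __name__ == "__main__":
    print(round(weighted_average_auc(y_true, class_scores), 4))
```

Weighting by support means that frequent malware types dominate the reported figure, so a high weighted average can coexist with weaker performance on rare types.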
Original language: English
Title of host publication: International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015
Number of pages: 7
Publisher: IEEE
Publication date: Aug 2015
ISBN (Print): 9781467367974
DOIs: https://doi.org/10.1109/CyberSA.2015.7166115
Publication status: Published - Aug 2015
Event: International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015 - London, United Kingdom
Duration: 8 Jun 2015 to 9 Jun 2015

Conference

Conference: International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015
Country: United Kingdom
City: London
Period: 08/06/2015 to 09/06/2015
Series: International Conference on Cyber Situational Awareness, Data Analytics and Assessment Proceedings. (cyberSA)

Fingerprint

Learning systems
Malware
Computer systems
Classifiers
Testing

Keywords

  • Malware
  • Type-Classification
  • Dynamic Analysis
  • Scalability
  • Cuckoo Sandbox
  • Random Forests
  • API call
  • Feature Selection
  • Supervised Machine Learning

Cite this

Pirscoveanu, R-S., Hansen, S. S., Larsen, T. M. T., Stevanovic, M., Pedersen, J. M., & Czech, A. (2015). Analysis of Malware behavior: Type classification using machine learning. In International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015 IEEE. International Conference on Cyber Situational Awareness, Data Analytics and Assessment Proceedings. (cyberSA) https://doi.org/10.1109/CyberSA.2015.7166115
@inproceedings{3d856ddfa4364a7b850664b5526cef70,
title = "Analysis of Malware behavior: Type classification using machine learning",
abstract = "Malicious software has become a major threat to modern society, not only due to the increased complexity of the malware itself but also due to the exponential increase of new malware each day. This study tackles the problem of analyzing and classifying a high amount of malware in a scalable and automatized manner. We have developed a distributed malware testing environment by extending Cuckoo Sandbox that was used to test an extensive number of malware samples and trace their behavioral data. The extracted data was used for the development of a novel type classification approach based on supervised machine learning. The proposed classification approach employs a novel combination of features that achieves a high classification rate with a weighted average AUC value of 0.98 using Random Forests classifier. The approach has been extensively tested on a total of 42,000 malware samples. Based on the above results it is believed that the developed system can be used to pre-filter novel from known malware in a future malware analysis system.",
keywords = "Malware, Type-Classification, Dynamic Analysis, Scalability, Cuckoo Sandbox, Random Forests, API call, Feature Selection, Supervised Machine Learning",
author = "Radu-Stefan Pirscoveanu and Hansen, {Steven Strandlund} and Larsen, {Thor Mark Tampus} and Matija Stevanovic and Pedersen, {Jens Myrup} and Alexandre Czech",
year = "2015",
month = "8",
doi = "10.1109/CyberSA.2015.7166115",
language = "English",
isbn = "9781467367974",
booktitle = "International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015",
publisher = "IEEE",
address = "United States",
}


TY - GEN
T1 - Analysis of Malware behavior
T2 - Type classification using machine learning
AU - Pirscoveanu, Radu-Stefan
AU - Hansen, Steven Strandlund
AU - Larsen, Thor Mark Tampus
AU - Stevanovic, Matija
AU - Pedersen, Jens Myrup
AU - Czech, Alexandre
PY - 2015/8
Y1 - 2015/8
N2 - Malicious software has become a major threat to modern society, not only due to the increased complexity of the malware itself but also due to the exponential increase of new malware each day. This study tackles the problem of analyzing and classifying a high amount of malware in a scalable and automatized manner. We have developed a distributed malware testing environment by extending Cuckoo Sandbox that was used to test an extensive number of malware samples and trace their behavioral data. The extracted data was used for the development of a novel type classification approach based on supervised machine learning. The proposed classification approach employs a novel combination of features that achieves a high classification rate with a weighted average AUC value of 0.98 using Random Forests classifier. The approach has been extensively tested on a total of 42,000 malware samples. Based on the above results it is believed that the developed system can be used to pre-filter novel from known malware in a future malware analysis system.
KW - Malware
KW - Type-Classification
KW - Dynamic Analysis
KW - Scalability
KW - Cuckoo Sandbox
KW - Random Forests
KW - API call
KW - Feature Selection
KW - Supervised Machine Learning
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-84963755556&origin=inward&txGid=0
U2 - 10.1109/CyberSA.2015.7166115
DO - 10.1109/CyberSA.2015.7166115
M3 - Article in proceeding
SN - 9781467367974
BT - International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015
PB - IEEE
ER -
