Detecting new obfuscated malware variants: a lightweight and interpretable machine learning approach

Madamidola, Oladipo; Ngobigha, Felix; Ez-zizi, Adnane

Madamidola, Oladipo, Ngobigha, Felix and Ez-zizi, Adnane (2024) Detecting new obfuscated malware variants: a lightweight and interpretable machine learning approach. Intelligent Systems with Applications, 25. ISSN 2667-3053

Detecting new obfuscated malware variants---.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://www.sciencedirect.com/science/article/pii/...

Abstract

Machine learning has been successfully applied in developing malware detection systems, with a primary focus on accuracy, and increasing attention to reducing computational overhead and improving model interpretability. However, an important question remains underexplored: How well can machine learning-based models detect entirely new forms of malware not present in the training data? In this study, we present a machine learning-based system for detecting obfuscated malware that is not only highly accurate, lightweight and interpretable, but also capable of successfully adapting to new types of malware attacks. Our system is capable of detecting 15 malware subtypes despite being exclusively trained on one malware subtype, namely the Transponder from the Spyware family. This system was built after training 15 distinct random forest-based models, each on a different malware subtype from the CICMalMem-2022 dataset. These models were evaluated against the entire range of malware subtypes, including all unseen malware subtypes. To maintain the system’s streamlined nature, training was confined to the top five most important features, which also enhanced interpretability. The Transponder-focused model exhibited high accuracy, exceeding 99.8%, with an average processing speed of 5.7 µs per file. We also illustrate how the Shapley additive explanations technique can facilitate the interpretation of the model predictions. Our research contributes to advancing malware detection methodologies, pioneering the feasibility of detecting obfuscated malware by exclusively training a model on a single or a few carefully selected malware subtypes and applying it to detect unseen subtypes.
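
The workflow summarised in the abstract (train a random forest on a single malware subtype, keep only the five most important features, then test on a subtype never seen in training) can be illustrated with a minimal sketch. This is not the authors' code and does not use the CICMalMem-2022 dataset: the data are synthetic, and the feature count, the `SIGNAL` indices, the shift values and the helper names `make_samples` and `run_sketch` are all assumptions made for illustration. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_FEATURES = 20             # hypothetical number of memory-dump features
SIGNAL = [0, 1, 2, 3, 4]    # hypothetical features that separate malware from benign


def make_samples(rng, n, shift):
    """Draw n synthetic samples; shift > 0 marks them as malware."""
    X = rng.normal(size=(n, N_FEATURES))
    X[:, SIGNAL] += shift
    y = np.full(n, int(shift > 0))
    return X, y


def run_sketch(seed=0):
    rng = np.random.default_rng(seed)
    benign_train = make_samples(rng, 1000, 0.0)
    seen_subtype = make_samples(rng, 1000, 3.0)    # the one subtype used for training
    benign_test = make_samples(rng, 500, 0.0)
    unseen_subtype = make_samples(rng, 500, 2.5)   # never shown during training

    X_train = np.vstack([benign_train[0], seen_subtype[0]])
    y_train = np.concatenate([benign_train[1], seen_subtype[1]])

    # Step 1: rank features with a forest fitted on all features.
    ranker = RandomForestClassifier(n_estimators=100, random_state=seed)
    ranker.fit(X_train, y_train)
    top5 = np.argsort(ranker.feature_importances_)[::-1][:5]

    # Step 2: retrain a smaller forest on the five top-ranked features only.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train[:, top5], y_train)

    # Step 3: score on benign samples plus the subtype held out of training.
    X_test = np.vstack([benign_test[0], unseen_subtype[0]])
    y_test = np.concatenate([benign_test[1], unseen_subtype[1]])
    accuracy = model.score(X_test[:, top5], y_test)
    return sorted(int(i) for i in top5), accuracy


top5, accuracy = run_sketch()
print("top-5 feature indices:", top5)
print("accuracy on benign + unseen subtype:", round(accuracy, 4))
```

In this toy setting the unseen subtype shifts the same features as the training subtype, which is the condition under which a single-subtype model can be expected to generalise. Whether that holds for real malware subtypes is an empirical question that the paper itself addresses.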

Item Type:	Article
Uncontrolled Keywords:	cyber security, obfuscated malware, detection of unknown malware, machine learning, explainable machine learning
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General)
Divisions:	Faculty of Arts, Business & Applied Social Science > School of Technology, Business & Arts
Depositing User:	Felix Ngobigha
Date Deposited:	06 Jan 2025 11:27
Last Modified:	29 Jan 2025 10:43
URI:	https://oars.uos.ac.uk/id/eprint/4539

Open Access Repository Suffolk (OARS)
