UNIFYING STATIC AND DYNAMIC MALWARE ANALYSIS

Project Summary: 

Learning to Execute Programs – The two main methods used in malware analysis today are static and dynamic analysis. In static analysis, we analyse a program’s code without running it, looking at the sequence of assembly instructions, the system calls, the resources requested, and properties of the file itself such as its entropy and PE section sizes. In dynamic analysis, we run a program in a virtual machine and record the events produced, such as the sequence of system calls, the sequence of disk reads and writes, and the sequence of instructions executed.
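As a small illustration of one static file property mentioned above, the following sketch computes the Shannon entropy of a byte string in bits per byte. It is only an assumed, minimal example and not the project's method: the function name byte_entropy is hypothetical, and a real analysis would typically compute this per PE section rather than over arbitrary bytes.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    Hypothetical helper for illustration only.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value present
    # H = sum over byte values of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Example inputs: constant bytes carry no information, while a buffer
# containing every byte value exactly once reaches the maximum entropy.
constant = b"\x00" * 1024
uniform = bytes(range(256))
low = byte_entropy(constant)
high = byte_entropy(uniform)
```

High entropy is often associated with packed or encrypted content, which is one reason it is commonly used as a static feature; on its own it says nothing about what the program does at run time.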

In this project, we propose to unify static and dynamic analysis by using deep learning to predict a program’s dynamic run-time behaviour without executing it on a virtual machine.

The project will consist of two parts:

 

  • In the first part, we will train a deep network to examine the static executable and predict fine-grained, low-level information such as the sequence of system events or the order in which instructions will be executed.
  • In the second part, we will predict the program’s semantic meaning, that is, the actions the program intends to carry out. This kind of high-level information is useful for malware analysts, who must decide whether a given executable is likely to be harmful.

 

The central theme of this project is to train a deep neural network to learn to execute programs. The neural network will be trained on a large dataset of programs to predict their dynamic behaviour given only their static code. We would like the deep network to learn a semantic representation of programs based on their underlying behaviour and meaning, independent of their implementation. Such a semantic space may also be useful in the case of obfuscated programs that attempt to hide their behaviour from malware analysts. Due to the noisy and inaccurate labels available for existing malware datasets, it will be necessary to use semi-supervised or unsupervised learning to carry out this task.

Objectives:

  • Develop a method to predict a program’s dynamic run-time behaviour, and hence infer its high-level intentions, by analysing the static executable
  • Develop a method to represent programs in a semantic space, capable of learning from real-world data that may be unlabelled or have noisy and inaccurate labels
  • Use the semantic program space to develop a tool to help malware analysts
  • Publish in high quality academic conferences

Contact Details:

Supervisor Name: Niall McLaughlin                                               

Tel: +44 (0)28 9097 1830

QUB Address: ECIT, Queen’s Road, Belfast, BT3 9DT                         

Email: n.mclaughlin@qub.ac.uk