Linked list detection through memory access pattern analysis
DEI PhD Student
DEI - 3B Room
November 15th, 2011
One of the most interesting challenges in reverse engineering is the reconstruction of the data structures used by programs. The most promising techniques proposed so far work dynamically, by inspecting the memory that the program under examination accesses at runtime. These approaches identify simple data types such as integers, floats, and arrays by using a set of heuristics. Unfortunately, they cannot detect more complex data structures and, because they are based on rules and heuristics, they are difficult to generalize.
We present a technique that identifies dynamic linked lists and can be generalized because it is based on standard machine learning algorithms. We implemented our technique in a tool that works in three steps. First, it detects the opcode execution flow and the memory accesses at runtime and extracts relevant features (e.g., spline parameters and memory addresses).
Second, we train an SVM classifier on the extracted features. Third, using this classifier, we can tell whether the memory cells accessed at runtime are part of a list node. We validated our technique with our implementation, using a testbed application that we developed to generate node structures and random high-level list operations. The classification results, obtained from experiments with 8 different node structures, show the effectiveness of the technique.
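The three steps (feature extraction, training, classification) can be illustrated with a minimal Python sketch. It is not the tool presented in this talk: the synthetic traces, the two features (a pointer-chasing ratio and a stride-regularity score, standing in for the spline parameters and memory addresses mentioned above), the trace-level rather than cell-level labels, and the small linear SVM trained by subgradient descent are all assumptions made for illustration.

```python
import random
from collections import Counter


def list_trace(rng, n=20):
    """Synthetic linked-list traversal as (address read, value loaded) pairs."""
    nodes = rng.sample(range(0x10000, 0x80000, 16), n + 1)
    # Reading the `next` field of node i loads the address of node i + 1.
    return [(nodes[i], nodes[i + 1]) for i in range(n)]


def array_trace(rng, n=20):
    """Synthetic array scan: fixed 8-byte stride, small non-pointer values."""
    base = rng.randrange(0x10000, 0x80000, 16)
    return [(base + 8 * i, rng.randrange(256)) for i in range(n)]


def features(trace):
    """Step 1 (assumed features): pointer-chasing ratio and stride regularity."""
    steps = len(trace) - 1
    # Fraction of accesses whose address is the value loaded by the previous one.
    chase = sum(trace[i][0] == trace[i - 1][1] for i in range(1, len(trace))) / steps
    # Fraction of consecutive address differences equal to the most common one.
    strides = [trace[i][0] - trace[i - 1][0] for i in range(1, len(trace))]
    regular = Counter(strides).most_common(1)[0][1] / steps
    return [chase, regular]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Step 2: linear SVM (hinge loss + L2 penalty), batch subgradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                grad_w = [g - yi * v / n for g, v in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Step 3: +1 means "looks like a list traversal", -1 means it does not."""
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


def build_training_set(rng, traces_per_class=20):
    """Hypothetical stand-in for the testbed: labelled synthetic traces."""
    X, y = [], []
    for _ in range(traces_per_class):
        X.append(features(list_trace(rng)))
        y.append(1)
        X.append(features(array_trace(rng)))
        y.append(-1)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    model = train_linear_svm(*build_training_set(rng))
    print("list trace  ->", predict(model, features(list_trace(rng))))
    print("array trace ->", predict(model, features(array_trace(rng))))
```

In this toy setting the two classes are trivially separable, so the classifier adds little over a threshold on the pointer-chasing ratio. The point of a learned classifier, as argued above, is that it can be retrained on new features or node layouts instead of rewriting rules by hand.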
Our technique is useful in domains beyond reverse engineering itself. For instance, security researchers often need to understand what malicious binaries do and what data types they work on. In this domain, static analysis is inapplicable because binaries are often obfuscated; our tool can be used instead because it works at runtime, while the binary code is accessing memory.
Artificial intelligence, robotics and computer vision