Paper
Workshop and Seminars
Scalable, Accurate, Robust Binary Analysis with Transfer Learning

Scalable, Accurate, Robust Binary Analysis with Transfer Learning

Binary program analysis is a fundamental building block for a broad spectrum of security tasks. Essentially, binary analysis encapsulates a diverse set of tasks that aim to understand and analyze the behaviors/semantics of binary programs. Existing approaches often tackle each analysis task independently and heavily employ ad-hoc task-specific brittle heuristics. While recent ML-based approaches have shown some early promise, they too tend to learn spurious features and overfit to specific tasks without understanding the underlying program semantics.


In this talk, I will describe some of our recent projects that use transfer learning on both binary code and execution traces to learn binary program semantics and transfer the learned knowledge for different binary analysis tasks. Our key observation is that by designing pretraining tasks that can learn
binary code semantics, we can drastically boost the performance of binary analysis tasks. Our pretraining tasks are fully self-supervised -- they do not need expensive labeling effort and therefore can easily generalize across different architectures, operating systems, compilers, optimizations, and obfuscations. Extensive experiments show that our approach drastically improves the performance of popular tasks like binary disassembly, matching semantically similar binary functions, and recovering types from binary.