Leveraging Semantic Signatures for Bug Search in Binary Programs

Jannik Pewny, Felix Schuster, Lukas Bernhard, Christian Rossow, Thorsten Holz

An­nual Com­pu­ter Se­cu­ri­ty Ap­p­li­ca­ti­ons Con­fe­rence (ACSAC), New Or­leans, USA, De­cem­ber 2014


Software vulnerabilities still constitute a high security risk and there is an ongoing race to patch known bugs. However, especially in closed-source software, there is no straight-forward way (in contrast to source code analysis) to find buggy code parts, even if the bug was publicly disclosed. To tackle this problem, we propose a method called Tree Edit Distance based Equational Matching (TEDEM) to automatically identify binary code regions that are “similar” to code regions containing a reference bug. We aim to find bugs both in the same binary as the reference bug and in completely unrelated binaries (even compiled for different operating systems). Our method even works on proprietary software systems, which lack source code and symbols.

The analysis task is split into two phases. In a preprocessing phase, we condense the semantics of a given binary executable by symbolic simplification to make our approach robust against syntactic changes across different binaries. Second, we use tree edit distances as a basic block-centric metric for code similarity. This allows us to find instances of the same bug in different binaries and even spotting its variants (a concept called vulnerability extrapolation). To demonstrate the practical feasibility of the proposed method, we implemented a prototype of TEDEM that can find real-world security bugs across binaries and even across OS boundaries, such as in MS Word and the popular messengers Pidgin (Linux) and Adium (Mac OS).


Tags: analysis, binary