Below are some ideas for projects in DSLAB, but we always welcome (and often prefer) students’ own ideas. Please contact George Candea for further details.

(Last updated in September 2016)

PhD Semester Projects

Python Introspection API for S2E

S2E enables developers to explore in parallel multiple potential behaviors of their target program, and it was used successfully in many projects including penetration testing, reverse engineering, driver testing, and multi-path performance profiling. What’s common across all these tools is that they inspect the guest OS’s and applications’ state. Alas, right now S2E does not have a great interface for deep introspection and/or tweaking of guest system state. This project is about providing a high-level introspection API for S2E (e.g., along the lines of the insight- vmi tool) that can be accessed from Python. Such an API would significantly simplify the use of S2E for “x-raying” applications and systems, in both a programmatic and interactive fashion.

Prerequisites: fluent C/C++, low-level systems programming, kernel development

CTF Assistant

In computer security, Capture-the-Flag (CTF) competitions require participants to conduct and react to the types of attacks that occur in the real world. This requires reverse engineering, network sniffing, protocol analysis, sysadmin skills, programming, and sometimes cryptanalysis. S2E can be a very powerful assistant for such competitions. This project is about providing an API on top of S2E that is accessible in a REPL and that enables CTF competitors to quickly answer typical questions about programs and OS kernels: Can there be a buffer overflow? What happens in a use-after-free scenario? Etc.

Prerequisites: fluent C/C++, low-level systems programming, CTF experience

X-Raying Applications with S2E

S2E is a VM-based platform, i.e, it requires setting up a VM with an entire software stack to analyze even a single program. While useful for testing applications directly in their native environment, it’s expensive. This project aims to enable “x-raying” applications in isolation. Besides running entire VMs, QEMU can also emulate isolated user-mode applications. (For example, it is possible to take a Linux binary compiled for the ARM architecture and run it on an x86 Linux host environment. Behind the scenes, QEMU automatically translates ARM to x86 and forwards all syscalls performed by the ARM binary to the x86 Linux host.) The goal of this project is to add symbolic execution support to QEMU’s user-mode emulation, in order to enable symbolic execution of unmodified binaries without the overhead of running the entire software stack. The main challenge of this project is how to invoke the system calls in a consistent manner without clobbering all execution states. Recall that in VM-based symbolic execution, one can fork the entire system state, which automatically guarantees that changes done to one state do not affect the others. Since the host environment is outside the control of user-mode S2E, forking is not an option. Students in this project will be solving important research problems, tinker with modern virtualization and OS memory handling, as well as gain experience with a large system (S2E components total over 1 MLOC).

Prerequisites: fluent C/C++, advanced debugging skills, kernel development, CPU architecture

Parallel Symbolic Execution with S2E

S2E is essentially a platform for multi-path in vivo analysis of binary software. In order to make S2E run faster, we wish to make it into a parallel program that can leverage all CPU cores at its disposal. This project is about parallelizing S2E in a manner similar to Cloud9.

Prerequisites: fluent C/C++, low-level systems programming, distributed systems, virtualization

Direct Analysis of LLVM Binaries in S2E

S2E analyzes x86 binaries by translating them to LLVM bitcode on-the-fly and then running the resulting LLVM bitcode in the KLEE symbolic execution engine. The translation process is heavyweight and generates LLVM bitcode that is much larger than what a native LLVM compiler would generate from source code (e.g., llvm-gcc); this slows down symbolic execution. This project is about enabling S2E to run applications compiled natively to LLVM bitcode. The challenge is to create the illusion of executing a native application for the guest OS, and of communicating with a library (that you would write) to the LLVM application. This project will make you fluent in LLVM and x86 assembly, and you will acquire deep knowledge of binary translators, system emulators, and symbolic execution engines.

Prerequisites: fluent C/C++, low-level systems programming, assembly and/or LLVM

Symbolic Fuzzing in S2E

S2E employs symbolic execution, a technique for automatically discovering possible execution paths through a program and identifying the input that trigger target conditions, like bugs. Symbolic execution is elegant and effective at finding corner cases, but it’s slow. The greatest “competitor” to symbolic execution is fuzzing, the approach of trying random inputs on a program; this is naive and non-systematic, but is fast and therefore can cover a lot of ground quickly. This project is about combining symbolic execution with fuzzing in S2E, in order to get the best of both worlds.

Prerequisites: fluent C/C++, experience with symbolic execution and fuzzing a plus

Embedded/IoT Software-Hardware Co-Verification

Although hardware devices (wireless routers, webcams, automobiles, etc.) typically work in tandem with their own device drivers and/or user-space software, the two components are often developed and tested in isolation. This allows for incompatibilities to creep in that lead to security vulnerabilities, crashes, and other misbehaviors. Even though automated testing and verification methods for both hardware and software have progressed significantly in recent years, integration testing is a mostly manual effort. The goal of this project is to extend S2E to symbolically execute hardware models described in a hardware description language such as SystemVerilog or SystemC. The resulting system will be able to systematically test a device driver in combination with a model of its hardware (hardware-software co-verification). Compared to the state of the art, such a system will have higher chances of correctly identifying unwanted behavior, while at the same time reducing the cost of integration testing.

Prerequisites: advanced C/C++, low-level systems design, hardware and logic design

Projects for MS/BS-level Semester Projects/Theses/Internships

The projects below can be adapted to the scope of a Bachelor/Master semester project or a Master Thesis project. If you would like to do your semester project as a team, we can discuss the scope and distribution of the work.

Translating x86 Binaries to LLVM

LLVM is an open source compiler infrastructure designed to perform efficient optimizations throughout the lifetime of the program (e.g., compile time, link time, run time). We use LLVM for a number of tasks such as debugging, reverse engineering, and symbolic execution. For all these tools, we currently use QEMU to translate x86 instructions to LLVM, on the fly, as they execute on the CPU. In this project, you will port this translator to an offline conversion tool that takes an x86 binary and produces LLVM bitcode. You will then work on optimizing the resulting LLVM bitcode, so that it can run efficiently in automated testing tools that work with LLVM, such as S2E and KLEE.

Prerequisites: fluent C/C++, low-level systems programming, (de)compilers, computer architecture

SoftFPU-based Simulation of Floating-Point Operations

S2E computes inputs that lead to conditions of interest by using an SMT constraint solver; as a result, it is limited to handling variables of the types that the SMT solver can handle. Most solvers can handle integer arithmetic only. However, floating point arithmetic is widely used in scientific programs, games, etc. This project is about working around this limitation by emulating the floating point unit (FPU) operations in software. Instead of executing a regular floating point instruction, your solution will invoke an emulated implementation that only operates on integers (i.e., on the coefficient and the exponent of the floating point representation). The emulated FPU together with the target program can then be executed symbolically, using a classic integer solver. There are several software FPU implementations available—you will have to choose an appropriate one and integrate it in S2E.

Prerequisites: fluent C/C++; assembly and/or LLVM a plus

Graphical User Interface for S2E

S2E is a platform that provides a variety of system analyses, some of which are intuitive and others that are not. Currently, S2E provides solely a command-line interface, and it has little interaction with the user during execution. This leaves it to the user’s imagination as to what exactly the analysis is doing, and so S2E users have a hard time monitoring and controlling the evolution of their analyses. The purpose of this project is to integrate S2E with existing development platforms, such as gdb and Eclipse. This project will bring an entirely new dimension to software debugging in Eclipse: Imagine performing queries on the debugged program, such as: “find all test scenarios for my program such that x < 15”, or “insert a breakpoint at each possible segmentation fault in my program”. Students who choose this project will work on state-of-the-art research problems and gain experience with a large system (S2E components total over 1 MLOC).

Prerequisites: fluent C/C++ and Java programming, advanced debugging skills

Docker-izing S2E

S2E is a platform with a wide range of uses, including penetration testing, reverse engineering, driver testing, and multi-path performance profiling. S2E is composed of many different parts: a core symbolic execution engine based on KLEE, an x86 CPU emulator augmented with symbolic execution support (QEMU), utilities for manipulating virtual disks, etc. At the moment, setting up and operating S2E is complex: one needs to manually install all system dependencies, build S2E, then manually create the VMs and invoke QEMU with the right flags. In this project, you will encapsulate the S2E workflow, from building to running a VM, behind a simple interface accessible through a single “front-end” command line utility. You will use the Docker container technology to create an isolated environment for S2E to run, with all dependencies pre-installed. The front-end utility will automatically pull the latest S2E code and build it inside the container. It will also offer basic commands for creating and bootstrapping new VMs, and launching them using the correct S2E workflow.

Prerequisites: operating systems, experience with Docker containers

Multi-Path Fault Injection

Programs contain error recovery code to deal with unexpected situations. For example, Java programs have try/catch blocks to handle errors that might occur when connecting to a server, C++ programs check the return value of system calls to deal with out-of-memory errors, and so on. Testing such error recovery code is challenging because errors usually happen rarely, and many of them are hard to trigger in a test environment. To address this, we developed LFI , a tool suite for library-level fault injection. LFI intercepts library calls in binary programs and injects errors according to user-specified fault scenarios, which allowed it to find many bugs in programs such as MySQL, Bind, Git, Pidgin, etc. The problem with LFI is that it is still a single-path analysis tool, and so cannot efficiently find corner case situations in target code, and it does not generate a concrete set of inputs to easily reproduce some types of bugs. The aim of this project is to implement LFI as an S2E plugin in order to solve the challenges faced by the single-path LFI.

Prerequisites: fluent C/C++, low-level systems programming

Certified Least Prefix Match Table

IP routing, a classic network function is based on LPM table. In order to enable Vigor approach, we need a certified implementation of this datastructure.

In this project, you will develop a formal specification of an LPM table, implement it in C, and prove that it correctly implements the specification. You can use a proof-assistant (such as CoQ) and have your code transformed with VST; or use an automated verifier (such as VeriFast); or propose your own tool/method.

Prerequisites: familiarity with a C-compatible program verifier, such as VeriFast, Frama-C/{Jessie,WP}, VST