Current Projects‎ > ‎

MS and Semester Projects

Available Projects

All projects can be adapted to the scope of a Bachelor/Master semester project or a Master Thesis project. If you would like to do your semester project as a team, please let us know, and we can discuss the scope and distribution of the work.

Graphical User Interface for S2E

S2E is a platform for a full system software analysis that enables developers to explore all potential behavior of their program, by automatically identifying and exercising every possible execution path. The S2E platform was successfully applied by multiple research groups for a variety of tasks ranging from automatic device driver reverse engineering (RevEng) and testing, to multi-path performance profiling. S2E is currently configured prior to execution and has little interaction with its users during execution. S2E users have a hard time monitoring and controlling the evolution of their tests. The purpose of this project is to integrate S2E with existing development platforms, such as gdb and Eclipse. This project will bring an entirely new dimension to software debugging in Eclipse. Imagine performing queries on the debugged program, such as: “find all test scenarios for my program such that x < 15”, or “insert a breakpoint at each possible segmentation fault in my program”. Students who choose this project will gain insight in state-of-the art research (S2E obtained the best paper award at a top-tier conference in 2011), as well as experience in large system development (S2E is based on the QEMU virtual machine).

Prerequisites: C/C++ and Java programming, advanced debugging concepts.

Contact: Vitaly Chipounov, Volodymyr Kuznetsov

Web Dashboard for Managing Cloud9 Resources

Cloud9 is an automated software testing tool that parallelizes its analysis on many shared-nothing machines, thus tapping into the vast amount of computing resources available in today's clusters and cloud environments. While the testing process itself is automated, the work of setting up, monitoring, and extracting the results from the cluster is still a manual process, involving many scripts and low-level system administration.

The goal of this project is to write a unified web interface for controlling the launch of testing jobs (e.g., create testing experiments, set up priorities), the allocation of cluster nodes to jobs (add/remove nodes), and monitoring the performance of the cluster and its workload (show live updates of generated test cases, bugs found, etc.). The web application is aimed at cluster administrators or power users who use Cloud9 inside their organization.

Prerequisites: Web technologies, distributed application development.

Contact: Stefan Bucur

Linux Instrumentation Toolkit for S2E

S2E is a platform for a full system software analysis that enables developers to explore all potential behavior of their program, by automatically identifying and exercising every possible execution path. The S2E platform was successfully applied by multiple research groups for a variety of tasks ranging from automatic device driver reverse engineering (RevEng) and testing (DDT), to multi-path performance profiling. These tools work by looking at the guest OS’s kernel and driver state in order to analyze their behavior, detect bugs, etc. Unfortunately, using S2E to perform any analysis that requires deep introspection or modification of system state is hard as the introspection API provided by S2E is very basic and low-level. The aim of this project is to provide a high-level introspection API for Linux targets in S2E (potentially based on the insight-vmi tool). Such API would significantly simplify the use of S2E for Linux software analysis and enable new applications, such as testing of multi-threaded software, performing security analysis, etc.

Prerequisites: C/C++ programming, low-level system programming, OS architecture

Contacts: Volodymyr KuznetsovVitaly Chipounov

Automated Hardware-Software Co-Verification

Although hardware devices typically work in tandem with their own device drivers, the two components are often developed and tested in isolation, leaving integration testing for the very late stages of development. This allows for incompatibilities to creep in, and cause malfunctions, or even whole-system crashes. Even though automated testing and verification methods for both hard- and software have progressed significantly in recent years, integration testing is a mostly manual effort. The goal of this project is to extend our full-system analysis platform S2E and our device driver testing tool DDT, allowing them to symbolically execute hardware models described in a hardware description language such as SystemVerilog or SystemC. The resulting system will be able to systematically test a device driver in combination with a model of its hardware (hardware-software co-verification). Compared to the state of the art, such a system will have higher chances of correctly identifying unwanted behavior, while at the same time reducing the cost of integration testing.

Prerequisites: C/C++ programming, low-level systems design, hardware and logic design

Contacts: Volodymyr Kuznetsov, Johannes Kinder

Web Interface for Device Driver Testing

Device drivers are the plague of modern software. Consumers have no idea whether the driver that comes with the latest gadget they bought is trustworthy or not. Worse, there is no independent service that would allow them to thoroughly check a driver's reliability. Your goal will be to create such a service by interfacing with DDT, our automated testing tool for binary closed-source device drivers. On the website, consumers should be able to upload their drivers and get a test report summarizing all the bugs found as well as the steps to reproduce them. The challenges include automating the upload process, generating understandable test results, scaling to many clients, and coming up with metrics for grading a driver's reliability.

Prerequisites: Web technologies.

Contacts: Vitaly Chipounov, Volodymyr Kuznetsov.

Memory Leak Detection in Java or C#

Even though modern languages like Java or C# provide runtime environments on which dynamically allocated memory gets freed automatically as needed, thanks to garbage collection techniques, real-life applications developed for these environments can still suffer from memory leaks, which are more difficult to detect than in languages with direct memory management (like C). The goal of this project is to come up with techniques (supported by a prototype tool) for detecting and potentially avoiding memory leaks in such applications, during their normal execution by end users.

The work for this project will be conducted at ELCA in the Quartier de l'Innovation. Note that there may be several different techniques that can help achieve the project goals, or there may be only techniques that contribute to project goals without necessarily achieving them. The candidate for this project shall be able to think outside the box and propose smart alternatives to progress toward a solution.

Pre-requisites: Java, JVM memory model background, static analysis techniques; C may be useful.

Contact: Jean-François Poilprêt

Debugging Based on Execution Synthesis

Debugging is one of the hardest and most time-consuming activities in developing system software. Bug reports rarely provide much more than a coredump of the failed program, and the developer is required to turn into a detective to determine how the program arrived to the failure point. ESD is a tool for automating most of this detective work. Without requiring any program modifications or execution tracing, ESD finds a feasible execution of the program that exhibits the reported bug. 

The aim of this project is to extend ESD with the ability to efficiently synthesize long execution paths that reproduce complex concurrency bugs such as data races and atomicity violations. To achieve this, the project will combine dynamic program analysis with coredump analysis. 

Prerequisites: C/C++ programming, algorithms, basic operating systems and compilers knowledge.

Contact: Cristian Zamfir

Static Analysis of Self Modifying Code

Self modifying code is a popular method of malware authors to evade detection. It can be found in metamorphic viruses and shellcodes, but also in commercial protection engines. To date, no tool can deal with self modifying code without actually executing it (in an emulator, sandbox, ...). The goal of this project will be to extend our existing static analyzer Jakstab to reason about self modifying code. Jakstab explores the state space of binary programs by abstract interpretation, disassembling required instructions on the fly. Challenges will be to make Jakstab disassemble instructions directly from the abstract state space and to identify and support other possible requirements, such as a model of processor pipelines.

Prerequisites: Java programming, x86 assembly, basic program analysis

Contact: Johannes Kinder

Bug Analytics

How can we leverage vast amounts of bug data to improve our software systems? The objective of this project is to (semi)automatically research the data available in bug-tracking systems (bugzilla, etc.) and gather insights about how this information can be better used to improve reliability. One objective is to obtain a taxonomy of software bugs and their impact.  Another objective is to reproduce some of these bugs and evaluate DSLab's existing tools against them. Finally, the ultimate objective is to propose better ways to collect bug information and identify ways to more deeply understand this data in an automated way, using "bug analytics". This project will give you a deep understanding of which software bugs are most relevant in the real world, and you will get to hack on our existing tools and platforms.

An advanced version of this project would also look at the difference between "real-world code" vs. "lab code" and how these differences influence the effectiveness of automated testing tools and constraint solvers. The goal is to obtain insights to build better testing tools.

Prerequisites: C/C++ programming, scripting, basic operating systems knowledge

Contact: Baris Kasikci

Porting WaRR from Chrome (Desktop) to Chrome (Android)

WaRR is a high-fidelity record-and-replay tool for the interactions between users and web sites. WaRR records all the actions that users perform on a web page using the WaRR Recorder, and then replays the recorded interactions using the WaRR Replayer. The WaRR Recorder is a modified version of the Chromium web browser and adds logging functionalities at the Chrome layer where events on web pages are dispatched and processed. The WaRR Replayer uses an enhanced, developer-specific version of Chromium and Selenium/WebDriver.

Your job will be to implement WaRR for the Android version of Chromium. This will entail: 1) implementing the WaRR Recorder and the WaRR Replayer for the Android version of Chromium with low runtime overhead, so that user experience is not affected, and 2) ensuring that the interaction traces recorded with a version of WaRR can be replayed using the other version. That is, if one uses the desktop version of WaRR to record an interaction trace, then one can use the Android version of WaRR to replay the trace.

Note: it might turn out that there exist no differences between the desktop version and Android version of Chromium when it comes to recording and replaying user interactions. In this case, your job will be to update WaRR to use the latest version of Chromium.

Prerequisites: C++ and Java programming

Contact: Silviu Andrica

Ongoing Projects

Injecting Human Errors into JUnit Test Suites

Tests suites are used to determine whether a software system is fit for release. Therefore, erroneous test suites cause premature delivery of systems or prevent high-quality systems from being shipped. Your task is to inject human-realistic errors into test suites and assess their quality based on the the outcome of running that suite. For example, an error is to misspell, in a test case, the address where a test-client should connect to a server. If the test case, after being modified, still succeeds, this denotes an error in that test case.

Prerequisites: Java

Contact: Silviu Andrica

Parallel Symbolic Execution with S2E

S2E is our framework for multi-path in vivo analysis of binary software systems. Using S2E, we built DDT, a device driver tester that found many bugs in Microsoft-certified drivers, RevNIC, a reverse engineering tool that ports various closed-source NIC drivers to different platforms, as well as a performance analysis utility. S2E systematically tests large binary programs by selective symbolic execution: Relevant parts of a program are executed with symbolic instead of concrete values, allowing coverage unachievable by random testing.

However, S2E is currently still single-threaded. Your goal will be to parallelize S2E and to integrate it with Cloud9, our existing parallel symbolic execution engine that scales to large clusters. Your work will allow to unleash the full power of parallel symbolic execution, finding bugs in commercial software and ultimately making it more reliable.

Prerequisites: C/C++, parallel programming.

Contacts: Vitaly Chipounov

Running LLVM Binaries with S2E

Our symbolic execution framework S2E executes x86 binaries by translating them to LLVM bitcode on the fly, and running the resulting LLVM bitcode in the KLEE symbolic execution engine. However, the translation process is heavyweight and generates LLVM bitcode that is several times larger than what would a native LLVM compiler generate from source code (e.g., llvm-gcc). As a result, symbolic execution is unnecessarily slowed down. 

Your task will be to enable S2E to run applications compiled natively to LLVM bitcode, to speed up symbolic execution. The challenge is to create the illusion of executing a native application for the guest OS, and of communicating with a simple library to the LLVM application. This project will make you fluent in LLVM and x86 assembly, and you will acquire deep knowledge of binary translators, system emulators, and symbolic execution engines.

Prerequisites: C/C++, enjoying low level systems details, assembly and/or LLVM a plus.

Contact: Vitaly Chipounov

Past Projects

Turning X86 Binaries into LLVM

LLVM is an open source compiler infrastructure designed to perform efficient optimizations throughout the lifetime of the program (e.g., compile time, link time, runtime). We use LLVM for a number of tasks such as debugging, reverse engineering, and symbolic execution. For all these tools, we currently use QEMU to translate x86 instructions to LLVM, on the fly, as they execute on the CPU. Your task will be to port this translator to an offline conversion tool that takes an x86 binary and produces LLVM bitcode. You will then work on optimizing the resulting LLVM bitcode, so that it can run efficiently in automated testing tools that work with LLVM, such as S2E and KLEE.

Extend the POSIX Runtime for Cloud9

Thorough automatic software testing for large programs (think Eclipse, Office, Windows, etc.) is unfeasible using the currently available testing tools. We're addressing this issue with Cloud9, a testing tool that parallelizes symbolic execution on many nodes in a cloud environment. However, symbolic execution requires a "model" of the outside world with which the program interacts (OS kernel, I/O, etc.). This is an important aspect, since the completeness and the accuracy of the models determine the types of software that we can actually test with Cloud9. Thus, this project aims at extending the currently available UNIX / POSIX APIs in Cloud9, with features such as signals, symbolic file hierarchies or IPC. You will use the basic APIs offered by the engine to implement the UNIX / POSIX interfaces, based on the specifications found in the manpages, and using your operating systems knowledge.

An LLVM Target for QEMU

The goal of this project is to integrate our reverse engineering platform into the latest version of QEMU. Throughout the process, you will have a chance to learn and practice some of the most extreme forms of systems programming, and you will gain a thorough and precise understanding of how modern processors and hardware stacks work. We are using the QEMU virtual machine for automatically reverse engineering binary closed-source Windows device drivers. Our tool allowed us to effortlessly port several Windows network device drivers to many platforms, including embedded devices. We have implemented a translator that turns x86 code into LLVM representation in order to allow an easy analysis of the driver's code. However, the version of QEMU we use is rather old and relies on specific GCC versions, preventing many optimizations and interesting uses of the tool, like the ability of running it on many platforms and using modern hardware. This project will resolve these problems by adding an LLVM target to the latest versions of QEMU.

Less Intrusive and More Efficient Deadlock Immunity

The goal of the project is to reduce the intrusiveness of Dimmunix; we propose two complementary approaches. The first approach replaces wherever possible the existing deadlock avoidance mechanism employed by Dimmunix with a less intrusive one that consists of swapping lock operations previously involved in deadlocks. This deadlock avoidance mechanism is less intrusive, because it cannot cause new deadlocks, while the existing avoidance technique can. The second approach attempts to minimize the amount of instrumentation, as follows: First, Dimmunix statically finds critical sections that are deadlock-free with 100% certainty, e.g., critical sections that do not contain any other lock operations (i.e., are not nested). Then, Dimmunix instruments (with code for deadlock avoidance and detection) only the remaining critical sections. Our intuition is that most of the critical sections in real applications are not nested; therefore, we believe that this approach will greatly reduce Dimmunix's intrusiveness. We also believe that these two approaches will improve Dimmunix's efficiency considerably.

Self-Debugging Concurrency Bug Reports

One of the main challenges in debugging concurrency bugs, such as deadlocks and data races, is that bug reports rarely provide any meaningful information about the non-deterministic events and the thread interleavings that cause the bug to occur. Therefore, developers cannot productively use current bug reports to understand or reproduce a bug that occurred at the user's site. This project attacks a challenging research topic: designing next-generation bug reports that automatically collect (at the user's site, with negligible overhead) the necessary information to debug deadlocks and data races. Preferably, these bug reports can be fed into an automated system that helps developers even further.

Automatically Fixing Concurrency Bottlenecks

Concurrency performance bugs are difficult to fix for two reasons: First, profiling concurrent executions without interfering with the program's execution is a challenging research problem. Second, fixing such bugs requires expert developers who can precisely reason about the concurrency bottlenecks. This project aims to find a way to automatically provide fixes that eliminate concurrency bottlenecks identified in open source concurrent software by SynchPerf, our low-overhead synchronization profiler.