EPFL IC Systems Seminar

Diagnosing and Fixing Concurrency Bugs


In the current multi-core era, concurrency bugs are a growing threat to software reliability. Many advanced techniques have been proposed for finding concurrency bugs. However, finding bugs is only a start: software reliability does not improve until the bugs are actually fixed. Fixing concurrency bugs is not trivial, and developers are largely left on their own to fix a large and growing number of them under considerable pressure. Despite all the effort at developer sites, many concurrency bugs still slip into production runs and manifest as failures at user sites. Developers need to understand the cause of each failure so that the same failure can be prevented from happening again.

In this talk, I will discuss my work on providing tool support for both bug fixing and failure diagnosis. I will first focus on automated concurrency-bug fixing, which builds on the observation that concurrency bugs can be fixed by removing bad interleavings. I will present a prototype system, CFix, which combines bug-detection, synchronization-enforcement, and testing techniques to automate the whole process of concurrency-bug fixing. Then I will discuss my work on diagnosing production-run failures in multi-threaded software and describe the tools I built, which incur low run-time overhead and are effective at diagnosing failures caused by concurrency bugs.
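
To make the idea of removing bad interleavings concrete, here is a minimal Python sketch of a check-then-act atomicity violation and a lock-based fix. It is an illustration only and does not represent CFix's actual analysis or output; the Account class, its methods, and run_withdrawals are hypothetical names chosen for this example.

```python
import threading


class Account:
    """Toy shared state. All names here are hypothetical illustrations."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

    def withdraw_racy(self, amount):
        # Check-then-act with no synchronization: another thread may
        # withdraw between the check and the update (a bad interleaving),
        # which can drive the balance below zero.
        if self.balance >= amount:
            self.balance -= amount
            return True
        return False

    def withdraw_fixed(self, amount):
        # Holding the lock makes the check and the update one atomic
        # region, so the overdrawing interleaving can no longer occur.
        with self.lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_withdrawals(account, n_threads, amount):
    """Run n_threads concurrent calls to withdraw_fixed; return outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = account.withdraw_fixed(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the fixed method, the outcome no longer depends on thread scheduling: starting from a balance of 100, eight concurrent withdrawals of 30 always yield exactly three successes and a final balance of 10. The racy variant gives no such guarantee, which is why the fix is best understood as ruling out interleavings, not as changing the program's logic.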


Guoliang Jin is a Ph.D. candidate in the Department of Computer Sciences at the University of Wisconsin–Madison. His research areas are software reliability and software systems, with a focus on understanding, detecting, diagnosing, and fixing concurrency bugs and performance bugs. His work on automated concurrency-bug fixing received a SIGPLAN CACM nomination with the comment “this is one of the first papers to attack the problem of automated bug fixing.”