George Candea and Armando Fox
Proc. 9th Workshop on Hot Topics in Operating Systems (HotOS), Lihue, Hawaii, May 2003
Crash-only programs crash safely and recover quickly. There is only one way to stop such software – by crashing it – and only one way to bring it up – by initiating recovery. Crash-only systems are built from crash-only components, and the use of transparent component-level retries hides intra-system component crashes from end users. In this paper we advocate a crash-only design for Internet systems, showing that it can lead to more reliable code, easier failure prevention, and faster, more effective recovery. We present ideas on how to build such crash-only Internet services, taking successful techniques to their logical extreme.
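The two ideas in the abstract, "starting is recovery" and "transparent component-level retries", can be illustrated with a small sketch. This is not code from the paper. The names `CrashOnlyStore`, `ComponentCrashed`, `call_with_retry`, and the `crash_before_commit` flag are all hypothetical, and an exception stands in for a real process crash. The sketch assumes the retried operation is idempotent (setting a key to a value), so repeating it after a crash is safe.

```python
import json
import os
import tempfile


class ComponentCrashed(Exception):
    """Stands in for a process crash in this single-process sketch (hypothetical)."""


class CrashOnlyStore:
    """Hypothetical crash-only component: the only startup path is recovery."""

    def __init__(self, path):
        self.path = path
        # There is no separate "clean start": every start recovers from disk.
        self.data = self._recover()

    def _recover(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # nothing was ever committed

    def put(self, key, value, crash_before_commit=False):
        updated = {**self.data, key: value}
        # Write the new state to a temporary file, then rename it over the
        # old one. A crash before the rename leaves the old state intact.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(updated, f)
            f.flush()
            os.fsync(f.fileno())
        if crash_before_commit:
            # Simulated crash; the stray temp file is simply left behind.
            raise ComponentCrashed("simulated crash before commit")
        os.replace(tmp, self.path)  # atomic commit point
        self.data = updated


def call_with_retry(path, operation, attempts=3):
    """Retry an idempotent operation, restarting the component each time."""
    for attempt in range(attempts):
        store = CrashOnlyStore(path)  # restart == recovery
        try:
            return operation(store, attempt)
        except ComponentCrashed:
            continue  # discard the crashed instance and try again
    raise ComponentCrashed(f"gave up after {attempts} attempts")


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        CrashOnlyStore(path).put("user", "alice")
        # The first attempt crashes before committing; the retry succeeds,
        # so the caller never observes the failure.
        call_with_retry(
            path,
            lambda s, n: s.put("user", "bob", crash_before_commit=(n == 0)),
        )
        return CrashOnlyStore(path).data
```

Calling `demo()` returns `{"user": "bob"}`: the caller sees only the successful result, while the crashed attempt leaves the previously committed state untouched. A real system would also need to handle non-idempotent operations and clean up leftover temporary files during recovery, which this sketch deliberately omits.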