George Candea, Aaron Brown, Armando Fox, David Patterson
IEEE Computer, Vol. 37, No. 11, November 2004
Building systems to recover fast may be more productive than aiming for systems that never fail. Because recovery is not immune to failure either, the authors advocate multiple lines of defense in managing failures.