George Candea, Emre Kiciman, Steve Zhang, Pedram Keyani, Armando Fox
Proc. 5th Intl. Workshop on Active Middleware Services (AMS), Seattle, WA, June 2003
This paper demonstrates that the dependability of generic, evolving J2EE applications can be enhanced through a combination of a few recovery-oriented techniques. Our goal is to reduce downtime by automatically and efficiently recovering from a broad class of transient software failures without having to modify applications. We describe here the integration of three new techniques into JBoss, an open-source J2EE application server. The resulting system is JAGR – JBoss with Application-Generic Recovery – a self-recovering execution platform.
JAGR combines application-generic failure-path inference (AFPI), path-based failure detection, and microreboots. AFPI uses controlled fault injection and observation to infer paths that faults follow through a J2EE application. Path-based failure detection uses tagging of client requests and statistical analysis to identify anomalous component behavior. Microreboots are fast reboots we perform at the sub-application level to recover components from transient failures; by selectively rebooting only those components that are necessary to repair the failure, we reduce recovery time. These techniques are designed to be autonomous and application-generic, making them well-suited to the rapidly changing software of Internet services.
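As an illustration only, the following Python sketch shows how the three techniques could fit together. It is not JAGR's implementation: the function names, the component names, the simple failure-rate threshold (standing in for the statistical analysis), and the policy of rebooting a suspect plus every component its faults are known to reach are all assumptions made for this example.

```python
from collections import defaultdict


def infer_failure_map(injection_runs):
    """AFPI-style step (hypothetical): record which components each fault reached.

    injection_runs: iterable of (injected_component, observed_failed_components).
    """
    failure_map = defaultdict(set)
    for injected, observed in injection_runs:
        failure_map[injected].update(c for c in observed if c != injected)
    return dict(failure_map)


def suspect_components(request_paths, min_requests=2, threshold=0.5):
    """Detection step (hypothetical): flag components on unusually many failed paths.

    request_paths: iterable of (components_on_path, request_failed), one entry
    per tagged client request. A fixed failure-rate threshold is an assumed
    stand-in for a real statistical test.
    """
    seen = defaultdict(int)
    failed = defaultdict(int)
    for components, request_failed in request_paths:
        for c in set(components):
            seen[c] += 1
            if request_failed:
                failed[c] += 1
    return {c for c in seen
            if seen[c] >= min_requests and failed[c] / seen[c] > threshold}


def microreboot_set(suspects, failure_map):
    """Recovery step (assumed policy): each suspect plus everything its faults reach."""
    to_reboot = set()
    stack = list(suspects)
    while stack:
        c = stack.pop()
        if c not in to_reboot:
            to_reboot.add(c)
            stack.extend(failure_map.get(c, ()))
    return to_reboot


# Hypothetical component names, used only for illustration.
runs = [
    ("CartEJB", {"CartEJB", "CheckoutServlet"}),
    ("CatalogEJB", {"CatalogEJB"}),
]
paths = [
    (["CheckoutServlet", "CartEJB"], True),
    (["CheckoutServlet", "CartEJB"], True),
    (["CheckoutServlet", "CatalogEJB"], False),
    (["CheckoutServlet", "CatalogEJB"], False),
    (["CatalogServlet", "CatalogEJB"], False),
]

fmap = infer_failure_map(runs)
suspects = suspect_components(paths)
reboot = microreboot_set(suspects, fmap)
```

In this toy run only the suspect and the component its faults propagate to are selected for reboot; the catalog components are left running, which is the sense in which selective rebooting can shorten recovery compared with restarting the whole application.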