Possible failures when you are already using mod_cluster and database redundancy
How can an application in a JBoss cluster with mod_cluster and database redundancy still fail? Up to this point, we have done a considerable amount of work to make our application more fault tolerant and better able to continue running, or at least recover gracefully, after a disaster. We have added redundancy by clustering the web, JBoss EAP, and database layers, and we have session replication. We added physical tolerance by making sure these services run not just on separate virtual instances but on separate physical devices (which of course may be virtualized). Our applications in JBoss EAP are discovered dynamically, and mod_cluster lets us add application nodes to service, or remove them from it, dynamically. We even have our application persisting its cache so it can restart from a known state.

With all this done, what else could possibly fail? Plenty. Remember, disaster recovery is largely a business decision: a balance between cost, the probability of an event, and its effect on the business. What if your network becomes saturated? What if the firewalls between your web tier and your app tier become unstable? What if your application is compromised? What if you simply run out of disk? What if your datacenter loses power? There is no end to the items that can be considered.

The key is to start at a micro level (e.g. an application server is unavailable) and address potential failure points up to a macro level (the datacenter is unavailable). As we ascend from micro to macro, the costs tend to rise and the probability of failure tends to decrease. Use this strategy of micro-to-macro assessment, coupled with cost and business risk, to determine whether your disaster recovery strategy is enough. No two companies are the same; no two approaches are the same.
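
One way to make the balance of cost, probability, and business effect concrete is to list failure scenarios from micro to macro and compare a rough expected yearly loss against the yearly cost of protecting against each one. The sketch below is only an illustration: the `FailureScenario` class, the `assess` function, and every number in it are hypothetical placeholders, and the rule "mitigate when expected loss exceeds mitigation cost" is a deliberate simplification (it assumes, for instance, that the mitigation removes the risk entirely).

```python
from dataclasses import dataclass


@dataclass
class FailureScenario:
    """One potential failure point; all fields are rough estimates."""
    name: str
    annual_probability: float  # estimated chance of the event in a year (0..1)
    business_impact: float     # estimated loss if the event happens
    mitigation_cost: float     # estimated yearly cost of protecting against it

    def expected_annual_loss(self) -> float:
        # Probability of the event times its effect on the business.
        return self.annual_probability * self.business_impact

    def worth_mitigating(self) -> bool:
        # Simplified rule: mitigate when the expected loss exceeds the cost.
        return self.expected_annual_loss() > self.mitigation_cost


# Ordered from micro to macro. Every figure is a made-up placeholder;
# substitute your own estimates.
scenarios = [
    FailureScenario("application server unavailable", 0.5, 40_000, 5_000),
    FailureScenario("out of disk", 0.25, 20_000, 1_000),
    FailureScenario("network saturated", 0.1, 100_000, 15_000),
    FailureScenario("datacenter loses power", 0.01, 2_000_000, 250_000),
]


def assess(items):
    """Return (name, expected annual loss, worth mitigating) per scenario."""
    return [(s.name, s.expected_annual_loss(), s.worth_mitigating()) for s in items]


if __name__ == "__main__":
    for name, loss, mitigate in assess(scenarios):
        decision = "mitigate" if mitigate else "accept or revisit"
        print(f"{name}: expected loss {loss:,.0f} per year -> {decision}")
```

With these placeholder figures, the cheaper micro-level protections pay for themselves while the macro-level ones do not. A real assessment would also weigh business risk that a single number cannot capture, which is why no two companies will reach the same answer.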