Cloud computing aims to allow easy access to shared computing resources while freeing your organization from the capital outlays of owning its own IT infrastructure. But how do you know your cloud infrastructure will be resilient in the real world against rapidly evolving cyber attacks, a growing volume of users, and high-stress application load? You must ensure that your vital cloud-based services can handle your unique business application traffic, peak user load, and security attacks. For the health of your business, you must harden resiliency by finding the optimal balance of performance, security, and stability across the cloud services that are crucial to your operations.
Understanding how to find that balance requires precise insight into the performance of every element of your infrastructure under your own mix of conditions. In trying to gather that insight, some cloud vendors have relied on legacy tools deployed across hundreds of servers in cloud environments. The net result: an amalgamation of tools and workarounds that is hugely expensive yet brittle, and that cannot scale to address the task at hand.
There are many vendors offering products designed to measure your cloud's ability to run under catastrophic circumstances. But here's how IBM Developer describes the general process:
"The metrics around resiliency all relate to keeping your cloud running under adverse conditions. Denial-of-service attacks, runaway processes, and failed hardware resources are examples of security, isolation, and resiliency concerns, respectively. A cloud should be able to react quickly to issues in the 'hard' or 'soft' aspects of the environment by moving workloads to working areas of the cloud and quickly failing over to another virtual environment. A robust enterprise cloud should also support disaster recovery features, allowing your cloud to be linked to another cloud in an active/passive or active/active setup.

One can imagine a performance experiment to measure resiliency being similar to the elasticity tests. However, instead of the cloud reacting to a breach in SLA, the cloud must now react to a system failure. For example, unplug the 'blade' running the Apache DayTrader workload to simulate a hardware error. Measure how long it takes the cloud to react to this breach and return the response time back to 2 seconds or better.

Similarly, your cloud must support isolation such that if one tenant's virtual system is 'running amok,' another tenant will not be disturbed. To test this scenario, we create a runaway process that continually allocates memory or disk space. While this is happening, we measure the performance of a second tenant to see if we notice any ill effects from its neighboring tenant. Also watch the system vital signs to confirm that the runaway tenant is 'capped.'

Your cloud must still perform while under a denial-of-service attack. Hence, another test involves setting up a denial-of-service attack by opening up port 80 and sending bogus HTTP traffic. The cloud should employ an application firewall that filters port 80, looks for ill-formed HTTP requests, and denies them access to the cloud's network."
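The failover experiment described above boils down to one measurement: after injecting a failure, how long until response time returns to the SLA target (2 seconds or better)? As a minimal sketch of that measurement harness, the Python below polls a probe function (a stand-in for timing real requests against your workload) and reports the recovery time. The `measure_recovery` function and the simulated latency samples are illustrative assumptions, not part of any vendor's tooling.

```python
import time

def measure_recovery(probe, slo_seconds=2.0, timeout=300.0, interval=0.0):
    """Repeatedly call `probe` (which returns an observed response time in
    seconds) until the SLO is met again. Returns the elapsed recovery time
    in seconds, or None if the SLO is not restored within `timeout`."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if probe() <= slo_seconds:
            return time.monotonic() - start
        if interval:
            time.sleep(interval)  # pace the probes against a live system
    return None

# Hypothetical latencies observed while the cloud fails over after the
# 'blade' is unplugged: degraded at first, recovering on the 4th sample.
observed = iter([8.4, 6.1, 3.2, 1.7])
recovery = measure_recovery(lambda: next(observed))
print(recovery is not None)  # SLO restored within the timeout
```

In a real test the probe would issue a request to the DayTrader workload and time it; the same harness also fits the isolation test, with the probe measuring the second tenant's latency while the runaway neighbor thrashes.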