The dialogue gets very interesting when you ask them how they collect those metrics, report on them and respond to them. Here are the shortfalls of taking this approach:
- Up-time: this is very rarely measured from the end user's standpoint. So you immediately put IT on the defensive when you state that the system was available on the network while the end user was not able to execute business on the system.
- This reported metric only gives credit for how quickly IT personnel were able to find and fix the outages. Outages are typically caused by poor release practices or change management, which are IT functions anyway.
This metric marries component availability to end-user availability. You can accomplish this by monitoring a system's network and server components for availability along with the end users' behavior. When an outage occurs at the component level, yet the service stays up for the end user thanks to your superior availability design, you have avoided a service outage.
Availability metrics should then be broken into the following six categories:
- Network (Link status, utilization, drop/error rates)
- Server (OS stats, CPU, HD, Mem)
- Application (DB, J2EE, .NET, etc.)
- Business Logic (Code interfaces, Connectors, ETL, etc.)
- Business Process (Transactions, order counts, etc.)
- End-User (real-time screen to screen, refresh, errors, etc.)
Your next management report will then show something like this:
Email Services - Service Outage Avoidance: 25%
What this metric means is that we had an impact at a component level of 25%, but due to proper design and management we avoided having a business impact.
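To make the number concrete, here is a minimal sketch of how such a figure could be computed from monitoring data. It assumes one interpretation of the metric: the share of monitoring intervals in which at least one component was down while end users could still do their work. The `Sample` record, its field names and the sample counts are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring interval (hypothetical record layout)."""
    components_ok: bool  # True if every monitored component was up
    end_user_ok: bool    # True if end users could complete their work


def outage_avoidance_pct(samples):
    """Percent of intervals with a component fault but no end-user impact."""
    if not samples:
        return 0.0
    avoided = sum(1 for s in samples if not s.components_ok and s.end_user_ok)
    return 100.0 * avoided / len(samples)


# Made-up data: 70 healthy intervals, 25 component faults that were
# absorbed by the design, and 5 real service outages.
samples = (
    [Sample(True, True)] * 70
    + [Sample(False, True)] * 25
    + [Sample(False, False)] * 5
)

pct = outage_avoidance_pct(samples)
print(f"Email Services - Service Outage Avoidance: {pct:.0f}%")
```

Intervals where both the component and the end user were affected are deliberately not counted as avoided; those are real service outages and belong in the ordinary up-time figure.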
In other words: "You know how we weren't sure whether it was worth building in all that fail-over and redundancy? Well, here is how valuable that decision to spend was."
If you can put an up-time value against this, you can calculate the ROI. For example:
Up-time value of email for 1 month = $1 million
Cost of redundancy = $1 million
1-year ROI = 300%
(4 months * $1M = $4M return; $4M return - $1M investment = $3M net; $3M net / $1M investment = 300%)
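The ROI arithmetic above can be sketched as a small function. This is only an illustration: the function and parameter names are made up, and the 4-month figure is taken from the worked example as given.

```python
def roi_pct(monthly_uptime_value, months_avoided, investment):
    """Return ROI in percent: (return - investment) / investment."""
    gained = monthly_uptime_value * months_avoided  # value of the avoided outage
    return 100.0 * (gained - investment) / investment


# Figures from the example: $1M per month, 4 months, $1M spent on redundancy.
print(roi_pct(1_000_000, 4, 1_000_000))  # 300.0
```

Plugging in your own up-time value and redundancy cost gives the figure to put next to the Service Outage Avoidance percentage in the management report.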
In my next blog - Don't underestimate the infrastructure