Tuesday, September 16, 2008

Service Outage Avoidance - The mother of all metrics

In my role at Vigilant as both a consultant and an executive I have had the opportunity to interview hundreds of operational IT managers and directors. In most cases the number one metric they were managed by was "Availability" or "System up-time".

What turns into a very interesting dialogue is when you talk to them about how they collect those metrics, report those metrics and respond to those metrics. Here are the shortfalls of taking this approach:
  • Up-time - this is very rarely measured from the end user's standpoint. So you immediately put IT on the defensive when you state that the system was available on the network, yet the end user was not able to execute business on the system.
  • This reported metric only gives credit for how quickly IT personnel were able to find and fix outages. Outages are typically caused by poor release practices or change management, which are IT functions anyway.
A new approach to consider is how I measured my operation as an IT director, and what Vigilant consultants call "Service Outage Avoidance" (not to be called SOA, or real confusion sets in).
This metric marries component availability to end-user availability. You can accomplish this by monitoring a system's network and server components for availability along with the end user's behavior. When an outage occurs at the component level, yet the service stays up for the end user thanks to your superior availability design, you have avoided a service outage.
Availability metrics should then be broken into the following six categories:
  • Network (link status, utilization, drop/error rates)
  • Server (OS stats, CPU, HD, memory)
  • Application (DB, J2EE, .NET, etc.)
  • Business Logic (code interfaces, connectors, ETL, etc.)
  • Business Process (transactions, order counts, etc.)
  • End-User (real-time screen to screen, refresh, errors, etc.)
The "Service Outage Avoidance" metric shows the percentage of time a component was down while the service remained available to the end user (e.g. 4 months of aggregate SAN downtime on the email system during 12 months of end-user availability).
Your next management report then will show something like this:
Email Services - Service Outage Avoidance: 33%
What this metric means is that we had an impact at the component level for roughly 33% of the period (4 of 12 months), but due to proper design and management we avoided having a business impact.
In other words, "You know how we weren't sure if it was worth it to build in all that fail-over and redundancy. Well here is how valuable that decision to spend was."
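Here is a rough sketch of how the number could be computed. The function name and its inputs are my own illustration, not a Vigilant tool, and it assumes any end-user downtime falls inside the component downtime window. Using the SAN example figures (4 months of component downtime, no end-user downtime, over 12 months), it comes out to about 33%.

```python
def service_outage_avoidance(component_downtime, end_user_downtime, period):
    """Fraction of the period in which a component was down but end users were not.

    All three arguments use the same time unit (months here).
    Assumption: every end-user outage overlaps a component outage.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= end_user_downtime <= component_downtime <= period:
        raise ValueError("expected 0 <= end_user_downtime <= component_downtime <= period")
    # Component downtime that never reached the end user
    avoided = component_downtime - end_user_downtime
    return avoided / period


# SAN example: 4 months of aggregate downtime, email stayed up all 12 months
soa = service_outage_avoidance(component_downtime=4, end_user_downtime=0, period=12)
print(f"Email Services - Service Outage Avoidance: {soa:.0%}")
```

If the end users had also lost email for one of those four months, the same call with end_user_downtime=1 would report 25% instead.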

If you can put a dollar value on up-time and set it against this metric, you can calculate the ROI.
e.g. Up-time value of email for 1 month = $1 million
Cost of redundancy = $1M
1-year ROI = 300%
(4 months * $1M = $4M return; $4M return - $1M investment = $3M net return; $3M net return / $1M investment = 300%)
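The same arithmetic as a small sketch, in case you want to plug in your own numbers. The function and parameter names are hypothetical; the figures are the ones from the email example.

```python
def redundancy_roi(avoided_downtime_months, monthly_uptime_value, redundancy_cost):
    """ROI of a redundancy investment, returned as a fraction (3.0 means 300%)."""
    if redundancy_cost <= 0:
        raise ValueError("redundancy_cost must be positive")
    # Business value preserved because the outage never reached the end user
    gross_return = avoided_downtime_months * monthly_uptime_value
    net_return = gross_return - redundancy_cost
    return net_return / redundancy_cost


# 4 months of avoided downtime, $1M per month of email up-time, $1M spent on redundancy
roi = redundancy_roi(4, 1_000_000, 1_000_000)
print(f"1-year ROI: {roi:.0%}")  # prints "1-year ROI: 300%"
```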
                                                                       

In my next blog - Don't underestimate the infrastructure

Tuesday, August 5, 2008

Customers - Are they always right?

So you finally get a chance to look at your customer surveys. Disappointingly, after all the coaching, training, and process work around your service desk, your customers are still complaining.

"What's the deal?" you say to yourself. "I thought if we put this Service Management process stuff in place customers would be happy, at least that is what Matt told me. Last time I listen to that idiot."

While not taking my advice is many times a good thing, it's not that the processes did not work; it's that the implementation did not take into account the most important aspect of Service Management. It's Customer Service Management, not Process Service Management. That means that you have to take into consideration the customer's unique circumstances.

So while customers are not always "right", you can make them feel like they are getting the service they are paying for by listening to them and acknowledging their intelligence and frustration.

Too many groups are putting in Incident and Request models that are purely focused on the workflow, not the communication. The customer's request or incident needs to be resolved, yes, but customers also need to feel like they are getting attention individually. Let me share an example with you.

Recently I purchased a VHS to DVD copier. After I had successfully made my third or fourth copy, the device stopped reading the VHS tapes. The picture was snowy on the screen, but the audio was clear. I called tech support, whereupon the technician, following his troubleshooting steps, told me that I needed to plug in the "Yellow" jack from the back of the device to my television.

I explained that I was using the Component cables, which were Red, Blue and Green, and that my TV did not have a "Yellow" jack to plug into. The technician insisted that the device could not work unless the "Yellow" jack was plugged in. Mind you, I had told him several times that I had successfully made copies, and that nothing had changed with my physical connections. Needless to say, it was a horrible experience, and I ended up returning a product that is probably fine because someone did not listen to the customer.

In a different call, to my Internet provider Comcast, I had completely the opposite outcome. Now I know it's been a while since I've gotten my hands dirty with technology, but I still feel pretty comfortable troubleshooting network issues. After troubleshooting why my connection was not working, I plugged my laptop directly into the cable modem and realized that the problem was with their cable modem. So I called Comcast tech support and explained my situation and the steps I had taken. The Comcast rep told me, "Can I put you on hold one moment? You clearly have taken some steps to isolate this, and let me see if I can pick up where you left off." Literally within 2 minutes the router was up and running and my Internet was back up. He apologized for the inconvenience and then explained to me that they would add my device to their monitoring solution so that they would be notified should this happen again.

Now that is Customer Service.

Next on my hit list of topics: The mother of all Metrics "Service Outage Avoidance" - how this one set of metrics can be the key to your next raise.

Tuesday, July 8, 2008

Service Catalogs are the key to demonstrating Value

What is a Service Catalog? Simply put, it is a system or document that allows people to preview the services they can obtain from you and the expectations they can have of getting those services (time, cost, quality, etc.).
Do we need a Service Catalog? Do you need a resume to get a job? No, but if you want the right job, and want to get paid fairly for the abilities you can bring, and want to set the right expectation, then you will want to have a clearly articulated resume.
Same thing with the IT Service Catalog. If you want the business to appreciate the value IT brings to the organization, and you want to ensure that staff, suppliers, and costs are adequately budgeted for, then you must present your capabilities to the business. The Service Catalog is where you will publish and present what IT will do, and thus what it will not do. At face value, the business will not necessarily want IT to have this. If you do not currently have an IT Service Catalog, then the business can ask you for whatever it wants, and IT has to scramble to either justify why it can't be done or figure it out. If there is no cost allocation in place for IT resources, then in the eyes of the business stakeholder IT is a free resource, and we all know the value of free: zero. Free has no value.

Thus, to really drive the value of IT services, IT must put in place a definitive "what we do, how we do it, and how much it costs" communication platform. More advanced organizations are using this information to build an on-line IT ordering site where people can order account setups, email boxes, new laptops, PDAs and Blackberries, and other enablement services. These sites will typically hang off the Service Desk platform so that people can order services without having to interact with a service request person. This can lead to tremendous cost savings, and it also leaves the business in more control. As a result, many organizations are finding the business more willing to fund the Service Catalog under the umbrella of Self-Service optimization and cost efficiency.
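To make the "what we do, how we do it, and how much it costs" idea concrete, here is a minimal sketch of what a catalog entry behind such an ordering site might hold. Everything in it is hypothetical: the field names, the services, the delivery times, and the prices are placeholders for illustration, not figures from any real catalog or Service Desk product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str            # what we do
    description: str     # how we do it
    delivery_days: int   # the time expectation the customer can hold us to
    cost_usd: float      # how much it costs (placeholder values below)


# Hypothetical catalog; anything not listed is something IT does not offer
CATALOG = {
    entry.name: entry
    for entry in [
        CatalogEntry("Account setup", "Network login created by the service desk", 1, 50.0),
        CatalogEntry("Email box", "Mailbox provisioned on the email system", 1, 25.0),
        CatalogEntry("New laptop", "Standard laptop imaged and delivered", 5, 1500.0),
    ]
}


def quote(service_name, quantity=1):
    """Return the total cost of an order, or raise KeyError for an unlisted service."""
    entry = CATALOG.get(service_name)
    if entry is None:
        raise KeyError(f"{service_name!r} is not in the Service Catalog")
    return entry.cost_usd * quantity


print(quote("New laptop", quantity=2))
```

The point of the lookup failing for unlisted services is the same as in the text: publishing what IT will do also states what it will not do.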

Next blog: "Is the customer always right?" I'll share some tech support stories to show the difference between a customer-focused support person and a person who answers the phone and follows a script.