Tuesday, June 17, 2008
For example, if a person at a fast food restaurant were measured on the quality of their hamburgers, that is a good thought, but what does it mean? It's not as if they can change the type of beef or bread or other things like that. So rather than just saying "improve the quality," the manager needs to put in measures that the employee can affect: time on the shelf less than 10 minutes, bread no older than 5 days, etc.
Those are factors that the employee can watch and adjust, ultimately improving the SLA. Which brings me to the second factor: it has to be measurable. If you cannot measure it, you cannot manage it. If you cannot time how long the hamburger has been on the shelf or track the age of the bread, you cannot use them as an indication of quality.
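To make the "measurable" point concrete, here is a minimal Python sketch that scores a sample of hamburgers against the two thresholds named above. The `Burger` class, its field names, and the sample data are hypothetical; the idea is simply that each factor the employee controls becomes a yes/no check you can count.

```python
from dataclasses import dataclass

# Thresholds from the example: factors the employee can watch and adjust.
MAX_SHELF_MINUTES = 10   # time on the shelf must be less than 10 minutes
MAX_BREAD_AGE_DAYS = 5   # bread no older than 5 days


@dataclass
class Burger:
    # Hypothetical record of one measured hamburger.
    shelf_minutes: float
    bread_age_days: int


def meets_standard(burger: Burger) -> bool:
    """True if this burger passes both controllable quality checks."""
    return (burger.shelf_minutes < MAX_SHELF_MINUTES
            and burger.bread_age_days <= MAX_BREAD_AGE_DAYS)


def compliance_rate(burgers: list[Burger]) -> float:
    """Fraction of measured burgers that met the standard."""
    if not burgers:
        raise ValueError("nothing was measured")
    return sum(meets_standard(b) for b in burgers) / len(burgers)


sample = [Burger(4, 2), Burger(12, 1), Burger(7, 6), Burger(9.5, 5)]
print(f"{compliance_rate(sample):.0%} of burgers met the standard")
```

Because each check is a number the employee can observe, a low rate points at something they can actually fix (pull burgers sooner, rotate bread), unlike a bare instruction to "improve quality."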
If you look at the standard SLAs in place that are not on the nightmare side of the house, they are things like 99.999% uptime. If you asked most IT folks what that meant, you would get different answers. Some would say the server is up 99.999% of the time over the course of a year. Others might say 99.999% means that application services are available to all users, with no more than about 5 minutes of total outage over the course of a year.
The first is very measurable, but not of high value. The second is extremely valuable, but difficult to measure.
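The arithmetic behind "five nines" is worth spelling out, because it shows how small the budget is. The Python sketch below assumes a 365-day year (ignoring leap years and planned maintenance windows); it converts an availability target into allowed outage minutes and computes measured availability from a log of outage durations. The outage log is hypothetical: under the server interpretation it could come from a simple up/down monitor, while under the user-facing interpretation it has to record every minute any user could not get service, which is exactly why that version is harder to measure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum outage minutes an availability target permits over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return period_minutes * (1.0 - availability)


def measured_availability(outage_minutes: list[float],
                          period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Fraction of the period the service was up, given logged outage durations."""
    return (period_minutes - sum(outage_minutes)) / period_minutes


for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} uptime allows "
          f"{allowed_downtime_minutes(target):.2f} minutes of outage per year")

# Hypothetical outage log for one year (minutes per incident).
outages = [2.0, 3.0]
print(f"measured availability: {measured_availability(outages):.5%}")
```

At 99.999%, the budget works out to roughly 5.26 minutes a year.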
So when establishing your agreements on the level of service required, it is important to determine what you can do and what the business needs, then negotiate the middle ground. The more the business needs, the more IT will need to deliver and the higher the cost. Over-promising on an SLA that the IT department cannot hit does not help anyone, so it is crucial for IT to establish what its capabilities look like. My next post will cover what a Service Catalog is and why it is needed for true SLA management.
Monday, June 9, 2008
I was too clumsy to be a mechanic, so my dad fired me and forced me into computers. However, I didn’t forget what I had learned about troubleshooting.
First, troubleshooting is not something you are born with. It is a skill that is developed through 3 common factors:
1) What you know
2) What you don’t know
3) What you are learning
When you piece these 3 factors together, you create the framework for discovery. Adding a negative and a positive approach then leads you down the path that good troubleshooters simply call the process of elimination.
Do you know what is working? Do you know what is not working?
What don’t you know is working? What don’t you know is failing?
What have I proved with this step? What have I disproved with this step?
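The questions above amount to bookkeeping: every step moves a suspect out of the "don't know" pile and into either "known working" or "known failing." Here is a minimal Python sketch of that bookkeeping. The component names and observations are hypothetical; in practice each observation would come from an actual test you run.

```python
def eliminate(suspects: set[str],
              observations: list[tuple[str, bool]]) -> tuple[set[str], set[str], set[str]]:
    """Process of elimination over a set of suspect components.

    suspects:     components that might be at fault (what you don't know yet).
    observations: one (component, works) pair per troubleshooting step
                  (what you are learning).
    Returns (known_working, known_failing, still_unknown).
    """
    known_working: set[str] = set()
    known_failing: set[str] = set()
    unknown = set(suspects)
    for component, works in observations:
        # Each step proves or disproves one component, so it leaves the unknown pile.
        unknown.discard(component)
        if works:
            known_working.add(component)
        else:
            known_failing.add(component)
    return known_working, known_failing, unknown


# Hypothetical example: a workstation that cannot reach the network.
working, failing, unknown = eliminate(
    {"cable", "switch port", "network card", "driver"},
    [("cable", True), ("switch port", True), ("driver", False)],
)
print("working:", sorted(working))
print("failing:", sorted(failing))
print("still unknown:", sorted(unknown))
```

Whatever remains in the unknown set is where the next step should go; anything already proved working no longer needs attention.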
So when it comes to troubleshooting complex systems, the same principle applies. You just need to analyze them in layers. Here are the layers that VIGILANT has documented as the logical points to eliminate.
Infrastructure: Hardware, Networking, Operating Systems
Application: 3rd party application services
System Interfaces: Connectivity between dependent systems
Business logic: Business rules that cause transactions to operate differently
Business Process: The way the end-user is executing the transaction
Business Service: Dependency on data or other elements for success
For really complex issues, take each of these tiers and apply the 3 principles of discovery to it, and you will find the problem is not as much of a black hole as you thought it was.
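One way to apply this is to walk the tiers in the order listed, from infrastructure up, and stop at the first tier whose check fails. The Python sketch below assumes you can write a pass/fail check for each tier; the checks themselves are hypothetical placeholders, and the bottom-up order is an assumption, not something the list prescribes. Real problems can span tiers, so treat the first failure as the place to dig, not a final verdict.

```python
from typing import Callable, Optional

# VIGILANT's tiers, in the order listed above (bottom to top).
TIERS = [
    "Infrastructure",
    "Application",
    "System Interfaces",
    "Business logic",
    "Business Process",
    "Business Service",
]


def first_failing_tier(
    checks: dict[str, Callable[[], bool]],
) -> tuple[Optional[str], list[str]]:
    """Run each tier's check in order.

    Returns (failing_tier, proven_tiers): the first tier whose check returned
    False (or None if all passed) and the tiers already proven working.
    Raises KeyError if a tier has no check, since an untested tier is not eliminated.
    """
    proven: list[str] = []
    for tier in TIERS:
        if not checks[tier]():
            return tier, proven
        proven.append(tier)
    return None, proven


# Hypothetical checks: everything passes except the business rules.
checks = {tier: (lambda: True) for tier in TIERS}
checks["Business logic"] = lambda: False

failing, proven = first_failing_tier(checks)
print("proven working:", proven)
print("investigate:", failing)
```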
Tuesday, June 3, 2008
How can you keep this from happening? A better test plan is the place to start.
- Review the types of activities that the users will be performing. We call these transactions.
- Review the location and number of users, taking network speeds into consideration.
- Review the number of transactions that will be performed.
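The three review items can be combined into a rough sizing check before any load test is built. This Python sketch assumes an average payload size per transaction and a per-user hourly rate for each transaction, all hypothetical numbers; it estimates the average bandwidth each location will generate and compares it with that location's link speed. Averages understate peaks, so this only flags obvious problems; it is not a substitute for simulating the real network conditions.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    kb_per_txn: float  # assumed average data moved per transaction, in kilobytes


@dataclass
class Site:
    name: str
    users: int
    link_kbps: float  # network speed at this location, in kilobits per second
    txns_per_user_hour: dict[str, float]  # transaction name -> count per user per hour


def site_load_kbps(site: Site, catalog: dict[str, Transaction]) -> float:
    """Average bandwidth the site's users generate, in kilobits per second."""
    kb_per_hour = sum(
        site.users * rate * catalog[name].kb_per_txn
        for name, rate in site.txns_per_user_hour.items()
    )
    return kb_per_hour * 8 / 3600  # kilobytes per hour -> kilobits per second


catalog = {
    "order_entry": Transaction("order_entry", 50),
    "inquiry": Transaction("inquiry", 20),
}
rates = {"order_entry": 30, "inquiry": 60}
sites = [
    Site("Headquarters", users=200, link_kbps=10_000, txns_per_user_hour=rates),
    Site("Branch office", users=20, link_kbps=256, txns_per_user_hour=rates),
]

for site in sites:
    load = site_load_kbps(site, catalog)
    print(f"{site.name}: ~{load:.0f} kbps average on a {site.link_kbps:.0f} kbps link "
          f"({load / site.link_kbps:.0%} utilization)")
```

Even in this made-up example, the branch office uses a far larger share of its link than headquarters, which is the kind of difference a test that only counts users and transactions would miss.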
Many IT performance testers simply look at user count and business transactions. Failing to account for network conditions and the volume of transactions will produce an inadequate simulation.
The better the simulation, the more valuable its prediction of how the system will operate when it goes live.