In complex sourcing transactions, all sides are quick on the draw with calculators at the ready when it comes to determining formulas for charges or caps on liability. However, another point of negotiation that similarly requires a sharp pencil is the calculation of “availability” for service levels – especially in a large environment. Drafters and business teams must pay attention to the relationship between a proposed SLA percentage and the definition that drives its calculation across a population of multiple systems or devices.
For example, a service provider may agree that a particular system or device will have 99.9% availability. Said another way, that system will be operational 99.9% of the time, with 0.1% permitted downtime before SLA credits or any other negotiated remedies apply. Over a month, this means that the system will be available to the customer the entire month, less approximately 44 minutes of downtime (assuming for purposes of these calculations that a month has 43,800 minutes). That sounds reasonable and may make business sense for the customer – so long as this calculation applies to one system or device.
However, in a large multiple-device environment, those “three nines” of comfort may be diluted by the availability definition itself. Availability definitions commonly start the calculation by multiplying the total number of minutes in the measurement period by the number of devices being measured. Where there are 10 devices, the calculation starts with a total pool of approximately 438,000 minutes, and 99.9% availability could leave a customer facing up to 438 minutes (7.3 hours) of downtime over the course of a month.
Taking the example further, consider an environment of 100 devices: the pool grows to approximately 4.38 million minutes, so 99.9% availability could give the service provider 73 hours of allowable downtime while still being in full compliance with the service levels! In extreme examples of global deals with thousands of devices, it’s easy to see how an entire country or region could go down for entire days or weeks – all the while, SLA dashboards are showing green, with no credits or commitments to resume service.
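The arithmetic above can be checked with a short sketch. It assumes, as the examples do, a 43,800-minute month and a definition that multiplies the pool of available minutes by the number of measured devices. The function name `allowed_downtime_minutes` is illustrative only and does not come from any contract or tool.

```python
MINUTES_PER_MONTH = 43_800  # the article's working assumption for one month


def allowed_downtime_minutes(sla_percent: float, devices: int = 1) -> float:
    """Downtime permitted before the SLA is missed, when the pool of
    available minutes is multiplied by the number of measured devices."""
    pool = MINUTES_PER_MONTH * devices
    return pool * (100 - sla_percent) / 100


# Same 99.9% commitment, measured over growing device populations.
for devices in (1, 10, 100):
    minutes = allowed_downtime_minutes(99.9, devices)
    print(f"{devices:>4} devices: {minutes:,.1f} minutes ({minutes / 60:,.1f} hours)")
```

Run as written, this should reproduce the figures in the text: roughly 44 minutes for one device, 438 minutes for 10 devices, and 73 hours for 100 devices, all under the same 99.9% commitment.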
Making matters worse, the “outage” definition commonly doesn’t get the same bountiful treatment. Outages typically count the minutes when the total system is down, without re-multiplying for all devices. In extreme cases, outages only count if they affect the whole system, so partial outages fail to register at all, even against the minutes those “down” devices contributed to the million-minute month.
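To see how much the outage definition matters, the following sketch compares two ways of counting downtime against the same multiplied pool. The scenario is hypothetical (100 devices, five of them down for three days, plus one 30-minute outage of the whole system), and the function and parameter names are assumptions made for illustration.

```python
MINUTES_PER_MONTH = 43_800  # the article's working assumption for one month


def measured_availability(outages, devices, whole_system_only):
    """Availability (%) against a pool multiplied by the device count.

    outages: list of (devices_down, minutes) pairs.
    whole_system_only: if True, an outage counts only when every device
    is down, and its minutes are not re-multiplied by the device count.
    """
    pool = MINUTES_PER_MONTH * devices
    down = 0
    for devices_down, minutes in outages:
        if whole_system_only:
            if devices_down == devices:
                down += minutes  # counted once, not per device
        else:
            down += devices_down * minutes  # counted per affected device
    return 100 * (pool - down) / pool


# Hypothetical month: 5 devices down for 3 days, then a 30-minute total outage.
outages = [(5, 3 * 24 * 60), (100, 30)]
narrow = measured_availability(outages, devices=100, whole_system_only=True)
per_device = measured_availability(outages, devices=100, whole_system_only=False)
print(f"whole-system-only definition: {narrow:.4f}%")
print(f"per-device definition:        {per_device:.4f}%")
```

Under these assumptions the narrow definition stays comfortably above 99.9%, while counting downtime per affected device falls below it, even though the underlying events are identical.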
To avoid giving a service provider this kind of unintended cushion on performance standards, customers should closely review the language of the underlying availability calculation to prevent this sort of availability devaluation.
Examples of proactive ways to address this problem include the following approaches:
- Accept the language that permits a multiplier resulting in a larger pool of available minutes, but insist on a higher percentage rate – go beyond the “three nines” and negotiate for five or six “nines”;
- Calculate the available minutes based on regions or areas – availability would be determined by each region or area (e.g., by country), which would reduce the pool of available minutes; or
- Draft a compound service level that does not permit the multiplication of the pool of available minutes, along with a definition of “unavailable” that goes beyond the typical “number of minutes the entire system is down” and instead identifies elements or segments of the system that render the system “unavailable” if down.
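For the first approach, a rough way to size the higher percentage is to work backward from the downtime the customer is actually willing to tolerate. The sketch below assumes a 43,800-minute month and a multiplied pool. The helper name `required_sla_percent` and the 43.8-minute budget (what 99.9% allows for a single device) are illustrative assumptions, not terms from any agreement.

```python
MINUTES_PER_MONTH = 43_800  # the article's working assumption for one month


def required_sla_percent(downtime_budget_minutes: float, devices: int) -> float:
    """Percentage needed so that a pool multiplied by the device count
    permits no more than the stated total downtime budget."""
    pool = MINUTES_PER_MONTH * devices
    return 100 * (1 - downtime_budget_minutes / pool)


# Keep the single-device budget of about 43.8 minutes as the pool grows.
for devices in (1, 100, 1000):
    print(f"{devices:>5} devices: {required_sla_percent(43.8, devices):.4f}%")
```

On these assumptions, holding total permitted downtime to about 44 minutes takes roughly five nines across 100 devices and six nines across 1,000, which is consistent with the suggestion to negotiate beyond “three nines” when the pool is multiplied.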
Ultimately, availability methodology is not a one-size-fits-all provision – drafters and business teams must carefully consider the number of systems or devices in scope and their impact upon the availability percentage in order to provide a customer with meaningful and undiluted SLAs.