Most companies have incident management processes in place to address everyday small-to-medium disruptions. These processes are typically based on proven customer service methodologies and/or standard IT service management practices, like ITIL. While generally effective for handling large volumes of low-impact incidents and service requests, these processes fall short when it comes to managing major incidents, which are a different category altogether. Major incidents require a unique and separate approach.
Impact and frequency
A standard incident usually affects only a few users, so longer response and resolution times are acceptable and help keep operational costs low. Major incidents, on the other hand, have significant repercussions for the business as a whole. Though thankfully rare, they can disrupt entire business units when they occur. In these situations, the financial impact of the incident far outweighs the cost of its resolution, making response speed and quality the key factors for success.
Skills and roles involved
In general, service desk personnel with limited training and technical expertise handle most incidents. Complex issues are escalated to second- or third-tier support teams with more specialized knowledge. However, the goal remains to resolve issues using the least technically skilled (and least costly) resources available. Major incidents call for a different strategy altogether. Here, the focus should be on engaging the individuals who can resolve the disruption the fastest, thereby limiting prolonged business impact. Typically, these resources are highly skilled (and correspondingly high-cost) subject matter experts.
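The contrast between the two staffing strategies can be illustrated with a minimal Python sketch. It assumes every listed responder is able to resolve the incident, and the `Responder` type, the `pick_responder` function, and all cost and time figures are hypothetical values chosen for illustration rather than data from any real service desk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responder:
    name: str
    hourly_cost: float           # hypothetical cost figure
    est_hours_to_resolve: float  # hypothetical estimate for this incident


def pick_responder(responders, is_major):
    """Pick the cheapest responder for a standard incident,
    and the fastest responder for a major incident."""
    if not responders:
        raise ValueError("no responders available")
    if is_major:
        # Major incident: speed first, cost only breaks ties.
        key = lambda r: (r.est_hours_to_resolve, r.hourly_cost)
    else:
        # Standard incident: cost first, speed only breaks ties.
        key = lambda r: (r.hourly_cost, r.est_hours_to_resolve)
    return min(responders, key=key)


# Illustrative team; all numbers are assumptions.
team = [
    Responder("service desk agent", 30.0, 8.0),
    Responder("tier-2 engineer", 70.0, 3.0),
    Responder("subject matter expert", 150.0, 1.0),
]

print(pick_responder(team, is_major=False).name)  # service desk agent
print(pick_responder(team, is_major=True).name)   # subject matter expert
```

The only difference between the two branches is the order of the sort key, which mirrors the point above: the same pool of people is ranked by cost for everyday incidents and by time to resolution for major ones.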
Processes
Recent years have seen a shift in incident management processes toward self-service, automation and asynchronous support interactions (e.g., email-based interactions with global call center teams). This approach is designed to optimize the scalability of incident management processes while reducing human interaction. However, this emphasis on scalability often comes at the expense of the time needed to resolve more complex disruptions. Major incident processes, therefore, must be optimized in the opposite direction, prioritizing solution effectiveness and speed of resolution over resource cost and automation.
Communication
In typical incident scenarios, management might perceive the need for communication as a sign of failure. Major incidents are different: active and broad communication with stakeholders is not only helpful for accurately assessing the impact but also essential for managing expectations and assuring stakeholders that the situation is under control. In many major incidents, the perception created by communication plays a more significant role in shaping the overall impact than the technical problem and its associated symptoms. Effective communication during a major incident needs to address four distinct groups of stakeholders:
- The affected user community whose activities are directly impacted by the incident
- Stakeholders who are indirectly or potentially affected, and whose trust is crucial to managing the incident
- Internal teams and subject matter experts involved in diagnosing and resolving incidents (this may include vendors)
- Support and IT management