The beginning of the New Year means making resolutions, and having been involved in the IT Operations Management game for most of my career, I can’t think of a better one than making your Event Management more effective. Why? Read my previous post, Toward the new Roaring ‘20’s of Monitoring IT. Monitoring data is getting more voluminous, is being delivered (and needs to be processed) more quickly, and is coming from any number of sources. Turning that data load into actionable events is a tougher and tougher task.

While this is unfolding, we are finally seeing monitoring technology adapt and improve to address today’s environments with innovations and next-generation solutions. With that comes an opportunity to make strong leaps in Event Management. But crafting the right menu of events to act on is just part of the equation in improving Event Management today. You also need to move the whole process along faster. That means not only identifying those events, but also Presenting the Event, Assigning the Event to the right team, and/or finding an automated method to Fix the Event. Ultimately, the goal is to identify real problems faster and restore service more quickly.
So there are a couple of important areas you can tackle to reach your resolution goal.
One is the technology you employ to help you deal with all those events and data. Older platforms such as Tivoli Netcool and others of that generation have been around for a while (at least their core architecture and technology) and may have their best days behind them as they try to adapt and keep pace. Meanwhile, machine-learning capabilities built into monitoring platforms have moved into the mainstream. Now, almost every vendor is touting its machine-learning/artificial intelligence capability. This situation is reminiscent of the “predictive analytics” hype of ten years ago, when vendors raced to promote their algorithmic capability for predicting events. As with that era, I expect a shakeout of winners and losers as the “in combat” experiences (proofs of concept and early-adopter implementations) emerge and provide more clarity on what is actually providing value and delivering on its promise.
The other area to turn attention to is the process chain: incorporating new ways of expediting the identification and triage of events, and then resolving the issue with a minimal number of “touches”. This involves attention to how events are presented, and tuning the platform to engage the right teams in real time while also providing the capability to escalate, collaborate and remediate with more speed. In this realm there is an opportunity to make inroads by looking closely at how events are managed and moved through an incident management cycle. Here again, evolutionary changes are occurring: the walls between development and operational support are coming down, and support/restoration/fix duties and teams are being redefined in the context of DevOps culture and the rapid pace and dynamism of digital delivery.
With technology and culture both undergoing important evolutionary change, it is a great time to reevaluate current practices and platforms and take steps that will reduce the time it takes to isolate problems and restore services. The leaps you can make now are bigger than what was possible just a few years ago.