The Indian banking ecosystem is on the verge of a major transformation. With digital banking services reaching an unprecedented scale of penetration, it is imperative that IT Operations (ITOps) leaders embrace the next wave of application performance monitoring: preventing outages before they occur.
The downtime problem is real and an expensive one at that.
According to NPCI data, an hour-long tech outage is enough to stall 400,000 UPI transactions. In fact, the top 4 banks in India alone experienced about 80 outages in 2020, according to Downdetector.
This primarily stems from the fact that, even though banks spend a considerable share of their revenue on multiple IT monitoring tools, most of those tools are still not mature enough to use AI/ML for before-the-fact prevention. They are only capable of troubleshooting once an outage has occurred, and at the expense of hiring SMEs or additional personnel to interpret the data and fix the problem. With the volume of digital payments only set to grow exponentially, even a minor outage will cost money. Brand equity is at stake as well: rating agencies like Moody’s have warned that recurring outages could lead to customers fleeing. Not to mention heavy rebukes from the RBI, which oversees the entire industry’s performance.
Over the last few years, banks and other companies in the US have been monitoring proactively to avoid outages, and these solution-centric approaches are now being picked up in India as well because of infrastructure complexity and transaction growth. This lets them stop being in constant firefighting mode, putting out one outage after another, and use that time and bandwidth to scale the infrastructure and gain a competitive edge.
However, banks must exercise caution while making this shift. It is better to first understand what preventive AIOps is, and only then go about evaluating tools.
Simply put, to get to the holy grail of zero downtime:
1) Banks should make sure their monitoring tools collect raw metrics and event data from multiple sources, in multiple formats, and ingest them into the AI/ML engine.
2) Once the data is ingested, the AI/ML models used to crunch it should be accurate and should factor in seasonality and workload trends to reduce false alerts and alert storms.
3) Finally, the AIOps tool should be able to act autonomously on the insights and fix smaller issues before they escalate.
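To make the three steps above concrete, here is a minimal Python sketch, not a description of any particular AIOps product. It assumes metrics arrive as (hour of day, value) pairs, uses a simple per-hour mean and standard deviation as the "seasonality-aware" model, and calls a caller-supplied `remediate` function when a reading is out of pattern. The function names, the z-score threshold of 3 and the sample numbers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(history):
    """Step 1 and 2: group ingested (hour_of_day, value) samples by hour
    and learn a per-hour mean and standard deviation."""
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    # stdev needs at least two samples, so skip hours with less data.
    return {h: (mean(v), stdev(v)) for h, v in buckets.items() if len(v) >= 2}


def is_anomalous(baseline, hour, value, z_threshold=3.0):
    """Compare a reading with what is normal for that hour of the day,
    rather than with one static threshold for the whole day."""
    if hour not in baseline:
        return False  # no baseline yet, so do not raise an alert
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


def handle_sample(baseline, hour, value, remediate):
    """Step 3: invoke the (hypothetical) remediation hook on an anomaly."""
    if is_anomalous(baseline, hour, value):
        remediate(hour, value)
        return True
    return False


if __name__ == "__main__":
    # Illustrative transactions-per-minute readings: busy at 10:00, quiet at 03:00.
    history = [(10, 900), (10, 1000), (10, 1100), (3, 90), (3, 100), (3, 110)]
    baseline = build_hourly_baseline(history)

    def restart_worker(hour, value):
        print(f"remediating: unusual load {value} at hour {hour}")

    handle_sample(baseline, 10, 1050, restart_worker)  # normal for 10:00
    handle_sample(baseline, 3, 150, restart_worker)    # unusual for 03:00
```

The point of the per-hour baseline is that a reading of 150 at 03:00 is flagged while 1050 at 10:00 is not, which a single static threshold could not achieve. A production model would be far richer, but the shape (ingest, compare against a seasonal baseline, act) is the same.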
While there are a few AIOps tools in the market today, here are some factors to consider before investing in one or evaluating the existing ones:
1) The AIOps tool should be able to prevent incidents rather than react to them.
2) The AI/ML models should be able to identify and autonomously remediate minor problems before they grow into outages.
3) It should be able to identify potential choke points well ahead of time and optimise resources.
4) It should be able to integrate with, and collect data from, all silos seamlessly. More data means more accurate troubleshooting.
5) It should be able to dynamically optimise workloads to handle transaction surges and seasonality.
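As an illustration of what "identifying choke points well ahead of time" can mean in practice, the sketch below fits a straight line to recent utilisation readings and estimates how many hours remain before a limit is reached. This is a deliberately simple stand-in (ordinary least squares on a linear trend), not how any specific tool works; the function name, the 90% limit and the sample readings are assumptions made for the example.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until `threshold` is crossed.

    samples: list of (hour_index, utilisation_percent) pairs, oldest first.
    Returns None when there is too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None  # all readings share one timestamp
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None  # utilisation is not growing
    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    last_x = samples[-1][0]
    return max(0.0, crossing_x - last_x)


if __name__ == "__main__":
    # Hypothetical disk utilisation growing by 2 points per hour.
    readings = [(0, 50), (1, 52), (2, 54), (3, 56)]
    remaining = hours_until_threshold(readings, threshold=90)
    print(f"estimated hours until 90% utilisation: {remaining}")
```

An estimate like this gives operations teams a lead time to add capacity or rebalance workloads, instead of finding out only when the limit is hit.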
In conclusion, there is no magic pill for achieving a zero-downtime experience. There needs to be a fundamental shift in how banks think about monitoring: it is no longer OK to have dumb tools that cannot prevent outages and only aid with troubleshooting. Now is the time for banks to explore the concept of preventive AIOps and then work with their SI partners to audit, replace or update their tool setup.
(The author, Uttam Dave, Vice President, Platform Sales, HEAL Software Inc, comes with 25+ years’ experience in business management and sales operations. His role is primarily to drive the sales organisation and lead the sales team for the India, ASEAN and Middle East markets.)