Defining the New Frontier of AIOps for Retail

By Tony Rost
Global Managing Director

Over the last eight months, I have facilitated a series of Retail CIO workshops to explore how large language models (LLMs) are reshaping retail IT workloads. While there is a natural hesitation around adopting new buzzwords, I believe “AIOps” accurately captures the breadth and ambition of the task at hand as CIOs navigate the complexities of their IT environments.

AIOps, or artificial intelligence for IT operations, represents a paradigm shift in the end-to-end management and optimization of retailers’ IT workloads. By applying AI and machine learning, retailers can automate and streamline key processes, drive proactive issue resolution, improve performance, and deliver more innovative customer experiences. Let’s consider the three critical phases of AIOps for retail:

  • Observe. Leaders need an AIOps data strategy that goes beyond infrastructure monitoring to include the ingestion and analysis of application data, from POS logging data to transactions from integration buses.
  • Engage. Once organizations have tackled observability, they can move on to driving management and decision-making based on dashboards and data—and identify opportunities for automation and self-healing.
  • Act. Finally, retailers can execute automation use cases, from writing scripts with the help of LLMs to initiating auto-escalations based on predefined conditions.

Real-World Examples

Here are just a couple of examples of the creative ways retailers are already adopting AIOps:

POS Anomaly Detection: Problem Detection, Self-Healing, and Intelligent Ticket Management

During one of our Retail CIO workshops, a leader shared how a mid-level technician, using an LLM for coding guidance, developed a Python script that applies anomaly detection to POS data.

Whenever an anomaly exceeds a predefined threshold, the script triggers a self-healing job restart. If the issue persists, a local LLM generates an ITSM ticket with the relevant details and routes it through an auto-escalation path based on criticality. This shows how AIOps enables proactive problem detection, self-healing, and intelligent ticket management.
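The workshop script itself was not shared, so what follows is only a minimal sketch of the same pattern under stated assumptions. It scores each minute's count of failed POS transactions against a rolling baseline using a z-score, tries one automatic restart, and opens a ticket routed by criticality if the problem remains. PosMonitor, handle_anomaly, the threshold values, and the route names are all hypothetical; restart_job, recheck, and open_ticket are placeholders for your scheduler and ITSM integrations, and the local LLM that drafts the ticket text is left out.

```python
# Minimal sketch (assumptions, not the workshop's code): the metric is failed
# POS transactions per minute; PosMonitor, handle_anomaly, thresholds, and
# route names are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable


@dataclass
class PosMonitor:
    window: int = 30          # minutes of history used as the baseline
    z_threshold: float = 3.0  # std devs above the mean that count as an anomaly
    history: list[float] = field(default_factory=list)

    def observe(self, value: float) -> bool:
        """Record one reading; return True if it is anomalously high."""
        baseline = self.history[-self.window:]
        self.history.append(value)
        del self.history[:-self.window]  # keep only the rolling window
        if len(baseline) < 2:
            return False  # not enough history to judge yet
        mu, sd = mean(baseline), pstdev(baseline)
        if sd == 0:
            return value > mu  # flat baseline: any increase is unusual
        return (value - mu) / sd > self.z_threshold


ROUTES = {"high": "on-call", "medium": "store-ops-queue"}  # hypothetical routing


def handle_anomaly(
    job: str,
    criticality: str,
    restart_job: Callable[[str], object],
    recheck: Callable[[str], bool],
    open_ticket: Callable[[dict], object],
) -> str:
    """Try one self-healing restart; if the job is still unhealthy, escalate."""
    restart_job(job)
    if recheck(job):  # True means the job looks healthy again
        return "self-healed"
    route = ROUTES.get(criticality, "service-desk")
    open_ticket({"job": job, "criticality": criticality, "route": route,
                 "summary": f"{job} still anomalous after automatic restart"})
    return f"escalated:{route}"
```

Only high readings are treated as anomalies here because, for a failure count, a sudden drop is good news; a metric such as transaction volume would need a two-sided check.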

Stockout Detector: Improving Inventory Management and Customer Experiences

Another CIO shared a notable success: an ARIMA (autoregressive integrated moving average) model that detects stockouts, trained on inventory levels, historical sales data, and supplier lead times. The team automated reallocations from nearby stores and applied other remediation techniques within predefined thresholds. The result was better inventory management and a better customer experience, because the impact of stockouts was reduced.
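To make the idea concrete, here is a simplified sketch, not the team's actual model. It fits an ARIMA(1,1,0) model (one order of differencing plus a single autoregressive term, no constant) by least squares on daily unit sales, forecasts demand over the supplier lead time, and reports a shortfall when forecast demand exceeds on-hand inventory. In this sketch, inventory and lead time enter the risk check rather than the model itself, and max_transfer stands in for the team's predefined reallocation thresholds. All function names are hypothetical; in practice you would more likely fit and tune the model with a library such as statsmodels.

```python
# Simplified sketch, not the team's model: ARIMA(1,1,0) with no constant,
# fitted by conditional least squares on daily unit sales for one SKU/store.
# Requires at least two sales observations.
from __future__ import annotations


def fit_arima_110(sales: list[float]) -> float:
    """Estimate phi in d_t = phi * d_(t-1) + e_t, where d_t = y_t - y_(t-1)."""
    diffs = [b - a for a, b in zip(sales, sales[1:])]  # the "I" (d = 1) step
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast(sales: list[float], phi: float, horizon: int) -> list[float]:
    """Forecast the next `horizon` days by extending the differences and re-integrating."""
    level, diff = sales[-1], sales[-1] - sales[-2]
    out = []
    for _ in range(horizon):
        diff *= phi                  # AR(1) forecast of the next difference
        level += diff                # undo the differencing
        out.append(max(level, 0.0))  # unit sales cannot be negative
    return out


def stockout_shortfall(on_hand: float, sales: list[float], lead_time_days: int) -> float:
    """Units short before the next delivery; 0.0 means no stockout is forecast."""
    demand = sum(forecast(sales, fit_arima_110(sales), lead_time_days))
    return max(demand - on_hand, 0.0)


def plan_reallocation(shortfall: float, nearby_stock: dict[str, float],
                      max_transfer: float) -> dict[str, float]:
    """Cover the shortfall from nearby stores, capping each transfer at max_transfer."""
    plan: dict[str, float] = {}
    for store, qty in sorted(nearby_stock.items(), key=lambda kv: kv[1], reverse=True):
        if shortfall <= 0:
            break
        take = min(qty, max_transfer, shortfall)
        if take > 0:
            plan[store] = take
            shortfall -= take
    return plan
```

With phi near zero, the forecast stays flat at the last observed sales level; with a positive phi, recent momentum in sales carries forward and decays over the lead time.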

Observe: First and Hardest Phase

For AIOps, observability encompasses distributed tracing, meaningful context, and streaming logs. While most retail IT departments have a handle on infrastructure monitoring, they often lag in ingesting and analyzing application data such as POS logging data and transactions from integration buses. To power future AI automation initiatives, teams first need to complete log and event aggregation projects, since aggregating this data will serve as the training foundation for AI and ML algorithms.
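Much of an aggregation project comes down to getting very different sources into one shape you can count and model. The sketch below assumes two hypothetical input formats, a space-delimited POS log line and a JSON integration-bus event, normalizes both into a common event record, and counts errors per store per minute: the kind of series an anomaly detector could later be trained on. The formats and field names are illustrative only; your POS and bus payloads will differ, and an observability platform would normally handle this ingestion for you.

```python
# Sketch only: both input formats below are hypothetical examples.
import json
from collections import Counter
from datetime import datetime
from typing import Iterable


def from_pos_log(line: str) -> dict:
    """Parse an assumed line like '2024-05-01T10:15:32 STORE=042 LEVEL=ERROR msg=...'."""
    ts, store, level, _msg = line.split(" ", 3)
    return {
        "ts": datetime.fromisoformat(ts),
        "store": store.split("=", 1)[1],
        "level": level.split("=", 1)[1],
        "source": "pos",
    }


def from_bus_event(raw: str) -> dict:
    """Parse an assumed JSON payload with timestamp, storeId, and status fields."""
    evt = json.loads(raw)
    return {
        "ts": datetime.fromisoformat(evt["timestamp"]),
        "store": evt["storeId"],
        "level": "ERROR" if evt["status"] == "FAILED" else "INFO",
        "source": "bus",
    }


def errors_per_minute(events: Iterable[dict]) -> Counter:
    """Count ERROR events per (store, minute), regardless of source."""
    return Counter(
        (e["store"], e["ts"].replace(second=0, microsecond=0))
        for e in events
        if e["level"] == "ERROR"
    )
```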

When it comes to selecting an observability vendor, retailers have several options, including Dynatrace, New Relic, the ELK Stack, and AppDynamics. However, none of these solutions offers a distinct advantage for retail-specific needs.

As a best practice, I advise choosing the datastore first and then selecting the platform that aligns with it. Because your team will spend much of its time querying and maintaining that datastore, this approach helps ensure the IT workforce has the necessary skill sets. For example, the ELK Stack and AppDynamics are built on widely used open-source data stores, Elasticsearch and MySQL, respectively, while Dynatrace and New Relic rely on lesser-known proprietary databases, Grail and NRDB, respectively.

Engage: Instrument Flight Rules for IT

In aviation, instrument flight rules (IFR) govern flight under conditions where visual reference is inadequate, so pilots fly by their instruments. Similarly, the end goal of AIOps is to drive management and decision-making with dashboards and data. This data-driven approach makes it far easier to discover and implement automation opportunities.

As you progress in your observability journey, your teams will experience a shift in how they engage with the IT ecosystem. Proactive alerts triggered by predefined thresholds will replace reactive, late-night panic calls, and self-healing techniques will automate many Level 1 (L1) support activities, reducing manual intervention and improving efficiency.
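One way to picture this shift is a small rules table in which each predefined threshold maps either to an automated L1 fix or to an alert for a person. The following Python sketch assumes hypothetical metric names, limits, and runbook names; it only decides which action to take and does not execute any runbook.

```python
# Hypothetical rules: metric names, limits, and runbook names are placeholders.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    limit: float
    runbook: str | None  # automated L1 fix to try, or None to alert a person


RULES = [
    Rule("pos_queue_depth", 500, "restart_pos_sync"),
    Rule("disk_used_pct", 90, "rotate_logs"),
    Rule("payment_error_rate", 0.05, None),  # no safe automatic fix
]


def evaluate(metrics: dict[str, float], rules: list[Rule] = RULES) -> list[tuple[str, str]]:
    """Return (action, target) pairs for every rule whose limit is exceeded."""
    actions = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None or value <= rule.limit:
            continue  # metric missing or within its threshold
        if rule.runbook is not None:
            actions.append(("self_heal", rule.runbook))
        else:
            actions.append(("alert", rule.metric))
    return actions
```

Keeping the rules as data rather than code makes them easy for teams to review together, and a rule can move from alert to self-heal once an automated fix has proven safe.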

Ultimately, you want business users and IT teams to share dashboards, so you can create a business-first IT department. This new engagement style is the direct result of an investment in observability and marks a key milestone in your AIOps journey.

Act: Automating with AI

With a robust observability platform in place and teams engaging with it daily, retailers can then embark on their automation journey. A strong observability foundation provides exactly the data and insights required to identify automation opportunities powered by AI and ML.

Implementing AI- and ML-driven automation is probably less daunting than it seems. Increasingly, people of all skill levels can contribute, thanks to user-friendly tools and platforms that abstract away much of the complexity, and cloud providers now offer a range of AI and ML services that integrate readily into existing workflows. Just as retailers adopted public cloud despite initial concerns about security and technology, they are better prepared to handle the integration and security challenges of AI and ML than they may think.

For those starting from scratch and aiming to make quick progress, using LLMs to help write scripts is a great approach. These scripts can be triggered by schedulers and orchestrators that are likely already present in your IT ecosystem, such as Jenkins, CircleCI, or Airflow. You can also execute scripts directly through APIs within SQL Server, Oracle Database, and SAP HANA, taking advantage of the tools you’re currently using.
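A script only needs a clear command-line interface and a meaningful exit code to fit into these tools: Jenkins, CircleCI, and Airflow (for example, through its BashOperator) all treat a non-zero exit code as a failed run. The sketch below, the sort of script an LLM can help draft, checks that a nightly POS export file exists for each store. The file naming pattern, flags, and store IDs are hypothetical.

```python
# Sketch of a scheduler-friendly check; the pos_<store>_<day>.csv layout is hypothetical.
from __future__ import annotations

import argparse
import sys
from pathlib import Path


def missing_exports(export_dir: Path, stores: list[str], day: str) -> list[str]:
    """Return the stores that have no POS export file for the given day."""
    return [s for s in stores if not (export_dir / f"pos_{s}_{day}.csv").exists()]


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Verify nightly POS exports.")
    parser.add_argument("--dir", type=Path, required=True)
    parser.add_argument("--day", required=True, help="YYYY-MM-DD")
    parser.add_argument("--stores", nargs="+", required=True)
    args = parser.parse_args(argv)

    missing = missing_exports(args.dir, args.stores, args.day)
    if missing:
        print(f"Missing exports: {', '.join(missing)}", file=sys.stderr)
        return 1  # non-zero exit lets the scheduler mark the run as failed
    print("All exports present")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The scheduler then simply runs the script with its arguments on a timetable and relies on its own failure handling and notifications, so no extra alerting code is needed in the script.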

As you advance in your AI journey, the next level involves utilizing serverless functions provided by cloud platforms such as Azure, GCP, and AWS. These serverless functions allow you to execute scripts and initiate auto-escalations based on predefined conditions. This approach enables you to scale your automation efforts without the need for extensive infrastructure management.
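As an illustration of the pattern only, here is a Python function written in the AWS Lambda handler style, lambda_handler(event, context); other platforms' function runtimes have their own entry-point conventions. It applies predefined escalation conditions, combining incident severity with how long the incident has been open, and returns the escalation target. The event fields, severity scale, and tiers are hypothetical, and the call to a paging or ITSM API is left as a comment.

```python
# Hypothetical escalation tiers: (max_severity, min_minutes_open, target).
# Severity 1 is the most critical; the first matching rule wins.
ESCALATION_RULES = [
    (1, 0, "incident-commander"),   # Sev 1: escalate immediately
    (2, 15, "on-call-engineer"),    # Sev 1-2 still open after 15 minutes
    (4, 240, "service-desk-lead"),  # anything open longer than 4 hours
]


def lambda_handler(event, context):
    """Decide whether and where to escalate an incident described by `event`."""
    severity = int(event["severity"])
    minutes_open = float(event.get("minutes_open", 0))
    for max_sev, min_minutes, target in ESCALATION_RULES:
        if severity <= max_sev and minutes_open >= min_minutes:
            # A real function would call the paging or ITSM API here.
            return {"escalate": True, "target": target, "ticket": event.get("ticket_id")}
    return {"escalate": False, "ticket": event.get("ticket_id")}
```

Because the conditions live in a plain list, they can be tuned without redeploying logic, and the same function can be wired to whichever alert source triggers it.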

While AI deployment platforms like Azure AI Studio or Amazon SageMaker offer powerful capabilities, they may be overkill for most retail IT departments at their current stage. It makes more sense for most retailers to use the tools and platforms already at their disposal and to incorporate more advanced solutions gradually as their AI maturity grows.

Getting Started

By taking a phased approach and focusing on the critical areas of Observe, Engage, and Act, retailers can harness the transformative potential of AIOps to drive efficiency, resilience, and innovation across their IT operations.

The AIOps journey begins with data—assessing your current observability maturity and identifying gaps in your log and event aggregation processes. These foundational projects are critical to future AI automation initiatives.

As you progress, you can invest in upskilling your workforce to embrace AI and ML technologies, drawing on the lessons learned during your public cloud transition.

Throughout the process, it is important to foster a culture of curiosity and ideation around ML and AI opportunities. One attendee of our Retail CIO workshops said he encourages his team to brainstorm regularly about ML and AI opportunities, and he celebrates AI successes in both team meetings and one-on-one conversations. His approach shows that the right culture, combined with practical applications and executive support, can drive real improvements in retail operations, both within and beyond IT.

To learn more about how Logic can help you build a strong retail AIOps foundation, contact us today.

As the Global Managing Director of Logic’s Cloud practice, Tony helps our clients establish the robust cloud platforms they need to power next-generation customer experiences. He brings more than two decades of experience in cloud and managed services. Tony has served as a vCTO/vCIO for several major companies, including Sony, Disney, and Warner Bros.