Avoiding cross-application risk with enterprise-wide visibility
About
This content is brought to you by Evolven. Evolven Change Analytics is a unique AIOps solution that tracks and analyzes all actual changes carried out in the enterprise cloud environment. Evolven helps leading enterprises cut the number of incidents, slash troubleshooting time, and eliminate unauthorized changes.
An Intellyx BrainBlog for Evolven, by Jason English
Most of us have never had to think about enterprise digital risk, in all its forms.
Until recently, only a select council of IT executives needed observability into a broad range of potential software and hardware problems. They would all get together for a quarterly or monthly risk management review board meeting, either in person or on a very long, monotonous conference bridge.
The ‘telemetry’ on this call would consist of a CIO or a Chief Risk Officer asking each department head, from cybersecurity and development to IT operations, regional managers, and application suite owners: “What issues are getting reported to you? What are you seeing in your dashboard?”
In some ways, this occasional drudgery was pretty effective at mitigating risk because early change management and compliance procedures were slowing things down to a manageable pace. Then, we started introducing increasing levels of agile software updates and delivery automation atop cloud infrastructures.
Change happens so fast in today’s hybrid cloud environments that it’s much harder for enterprises to identify and control the risks that matter among the myriad potential IT risks. Further, with so many stakeholders making changes that could impact upstream or downstream services, risk mitigation is quickly becoming part of everyone’s job.
Clarifying the opaque silo problem
Talk to a Big Four consultant about risk, and they will likely say that your enterprise “needs to break down informational silos to gain visibility into shared risk mitigation objectives.” Or some sort of win-win management-speak like that.
While cracking open silos always sounds nice in theory, what if they are still holding valuable products inside? It is important to remember that information silos were established for a reason. Different metrics matter to Ops, Dev, and Security teams because they have different objectives.
IT Ops teams will use their ITOM and ITSM platforms to track infrastructure performance indicators and resolve issues, whereas DevOps teams are carefully monitoring their CI/CD pipelines and deployments to manage faster, higher-quality releases. Security teams scan for threats and vulnerabilities in their SIEM and XDR workflows. Partners may have their own ways of measuring their service’s API performance against SLAs and SLOs in relation to your enterprise.
Even if you could gain an “X-Ray Vision” superpower and give everyone transparency into each other’s silos, each team would likely struggle to interpret the other teams’ data in the context of its own workflows.
The modern SRE (site reliability engineer) role would be the closest to understanding the meaning of risk indicators within each group’s siloed dashboard, but even then, total transparency would only overwhelm them with unfiltered data when they are trying to identify risks.
Visibility requirements across different dimensions
Teams will never have total transparency into each other’s operational dimensions, but they will always collectively need to share visibility into what really matters: preventing and mitigating system-wide risks that will impact customers.
There’s no simple way to avoid risk in today’s fast-changing enterprise architectures, but there are requirements for improving visibility.
Observing complexity at scale. Any well-established enterprise that intends to survive has already embarked on modernization initiatives to improve its scalability and support new business services while maintaining operational integrity.
This leaves them supporting a mix of existing on-prem core systems and data stores, third-party service integrations, cloud data warehouses and thousands of server instances, Kubernetes clusters and serverless functions running in different clouds.
The new digital landscape is changing every few seconds. When something bad happens, finding the root cause can feel like a lost cause. What key information is needed to track down the problem and go back to the time when it worked?
Hybrid IT telemetry. The information needed to identify hybrid IT application risk doesn’t reside within one silo. There are readily available platforms for service management, and there are cloud tools for monitoring and optimizing AWS instances, but those tools don’t talk to Azure or GCP instances.
The open source and vendor communities are building excellent tools for Kubernetes observability, telemetry, and container orchestration, but by nature, those tools focus on specific clusters and workloads that run wherever they make the most sense across distributed infrastructures.
Predictive change intelligence. The DevOps movement brought agile software delivery and CI/CD automation to the forefront, and with it, codified approaches to configuring, testing, staging, and delivering new application code and IaC (infrastructure as code) components.
Despite all of this shift-left test automation and deployment goodness, how can we know what will happen when these changes actually move into production, and interact with the rest of a distributed application estate?
Establishing enterprise-wide visibility
Since heterogeneity will never go away in a hybrid IT environment, we need to gain enterprise-wide visibility into information that doesn’t just reside in one silo.
Cloud management, service management, release management, and analytics vendors often describe their highest-level global view as a “single pane of glass” (or SPOG), providing a universal view into the many operations performed, but only within their own management dimensions.
In reality, there are many single panes of glass out there. How can we make sense of them?

Figure 1: Enterprise-wide multi-system risk management dashboard in Evolven.
Rather than dictating a particular IT asset or service management suite, Evolven has taken a non-opinionated approach to identifying enterprise-wide configuration risk, with agentless collection of near-real-time data from a distributed inventory of cloud and on-premises systems and services, and the platforms that deliver, monitor, and secure them.
Yes, it’s another SPOG, but think of it as a single view of the data that would be relevant to the modern version of that enterprise-wide risk control board, which now has more participants and stakeholders than ever.
Operationalizing risk management at a fintech
A major financial technology firm used Evolven to roll up a unified view across more than 250,000 enterprise-wide systems, services, configuration, and usage data sources at hourly or up-to-the-minute intervals, depending on the rate of change.
What was really interesting was how they operationalized the use of the single view, automating some alerts with rules-based policies, while empowering IT leaders to ask triage questions of the systems within their own domains for every flagged change, such as:
- Is this change verifiably authorized or not?
- Is the change consistent with other similar changes?
- Is there anything anomalous about the change we are looking at?
- Does the change impact our standing for compliance or contractual agreements?
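To make the four triage questions concrete, here is a minimal Python sketch of how they might be expressed as rules-based checks. It is an illustration only, not Evolven's API or the firm's actual policy: the `Change` fields, the ticket IDs, the compliance-scoped keys, and the change window are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """A flagged configuration change (all fields are hypothetical)."""
    system: str
    key: str
    new_value: str
    hour_utc: int                    # hour of day the change was detected
    ticket_id: Optional[str] = None  # linked change request, if any


# Assumed reference data; a real deployment would pull these from
# the change management and compliance systems of record.
APPROVED_TICKETS = {"CHG-1001", "CHG-1002"}
COMPLIANCE_KEYS = {"tls.min_version", "audit.logging"}
CHANGE_WINDOW_HOURS = range(1, 5)  # 01:00 to 04:59 UTC


def triage(change: Change, peer_values: List[str]) -> List[str]:
    """Return one flag per triage question the change fails."""
    flags = []
    # 1. Is this change verifiably authorized?
    if change.ticket_id not in APPROVED_TICKETS:
        flags.append("unauthorized")
    # 2. Is it consistent with the same setting on similar systems?
    if peer_values:
        most_common_value, _ = Counter(peer_values).most_common(1)[0]
        if change.new_value != most_common_value:
            flags.append("inconsistent")
    # 3. Is there anything anomalous (here: outside the change window)?
    if change.hour_utc not in CHANGE_WINDOW_HOURS:
        flags.append("anomalous")
    # 4. Does it touch a setting covered by compliance or contracts?
    if change.key in COMPLIANCE_KEYS:
        flags.append("compliance")
    return flags
```

A change that returns an empty list could be closed automatically, while any flag would send it to the owning team for human review. The anomaly check here is deliberately simplistic; a real policy would likely combine several signals.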
In this scenario, Evolven provided visibility into system-wide risk and helped the team surface the changes that presented the most risk. But success requires more than a single pane of glass. The firm’s disciplined team operationalized their risk review practices and escalation process to extract the most value from that visibility.
The firm created their own cloud data lake to accept event logs for things like change requests, expired certificates, and configuration drift, purpose-built for running their own risk modeling and projections. Alerts and incidents coming out of this process were then routed with a contextual data report to the appropriate database team or service management system.
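The routing step described above might look something like the following Python sketch. The event kinds mirror the examples in the text (change requests, expired certificates, configuration drift), but the destination names, the `RiskEvent` fields, and the report format are assumptions made for illustration, not details of the firm's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RiskEvent:
    """An event log entry landed in the data lake (hypothetical shape)."""
    kind: str
    system: str
    detail: str


# Assumed mapping from event kind to the team or system that owns it.
ROUTES = {
    "change_request": "service-management",
    "expired_certificate": "security-team",
    "configuration_drift": "database-team",
}
DEFAULT_ROUTE = "risk-review-board"  # fallback for unrecognized kinds


def route(event: RiskEvent) -> Tuple[str, str]:
    """Pick a destination and build a short contextual report."""
    destination = ROUTES.get(event.kind, DEFAULT_ROUTE)
    report = f"[{event.kind}] {event.system}: {event.detail}"
    return destination, report
```

Keeping the mapping in plain data rather than code means each group can adjust its own routing without touching the risk modeling logic.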
A welcome side effect, and perhaps the best value for the teams responsible for compliance, was how well the system-wide snapshots and change reports documented the company’s system availability, data protection, and security postures. Auditors told them they were light years ahead of the competition as they passed compliance exercises with flying colors and little or no additional effort.
The Intellyx Take
Just giving teams a better reporting or monitoring tool for managing risk is never going to solve the risk visibility problem by itself.
Like any other digital transformation, achieving enterprise-wide risk awareness requires people, processes, and technology—in that order.
The best-performing organizations always build up their own agreed-upon taxonomy and procedures for getting data out of key systems, distributing it to the right people to manage risk, documenting or codifying the results for continuous improvement, and connecting that output to the systems of record each group uses.
©2023 Intellyx LLC. Intellyx is solely responsible for the content of this document, and no AI bots were used to write it. At the time of writing, Evolven is an Intellyx subscriber.