Azure Well-Architected Part 4: Operational Excellence

The Azure Well-Architected Framework is a set of guidelines spanning five key pillars that can be used to optimise your workloads. In the previous blogs we covered Reliability, Security and Cost Optimisation alongside relevant services, processes and assessments. This time we’ll focus on the Operational Excellence pillar of the framework. 

Overview of Operational Excellence

The services and technologies you use in the cloud differ hugely from those on-premises. What doesn't differ is the requirement that all deployments and environments are reliable and predictable. Operational Excellence is the fourth pillar of the Well-Architected Framework, and it covers the operational processes you need to ensure applications continue to operate.

The key processes that fall within Operational Excellence are workload automation, workload release, monitoring and testing. The end goal is to achieve superior operational practices that keep workloads reliable and predictable.

As with the Security and Cost Optimisation pillars, Operational Excellence must be considered throughout the lifecycle of a workload, from the design and architecture phases onwards, and especially once the workload is running. Service management and its related processes should not be retrofitted to environments or services; think about these areas early on, as doing so will reduce management overhead in the long term.

Viewed through the lens of Operational Excellence, a Well-Architected workload is one that is released in an automated manner and monitored and tested efficiently, so that the application provides value not just to your customers but also to your internal development and operations teams.

Specific to Operational Excellence, at a high level you should be thinking about the following areas and processes:

  • Design, build and orchestrate workloads with DevOps principles in mind
  • Monitor workloads efficiently using Azure Monitor
  • Understand Application Performance Management
  • Automate as many processes as possible
  • Create and automate repeatable infrastructure
  • Prepare for the unexpected by testing workloads

Operational Excellence Principles

When designing for Operational Excellence in Azure, the Framework sets out a number of principles you should consider:

  • Optimise build and release processes by embracing software engineering disciplines. Deploy infrastructure as code (IaC), and use continuous integration and continuous delivery (CI/CD) pipelines for build and release. Automate test plans and use configuration as code to avoid configuration drift. Azure DevOps and Azure Policy are two tools that can greatly assist with optimising build and release and with managing configuration drift.
  • Understand operational health by using tools and processes that monitor every aspect of a workload, including (but not limited to) build and release processes, infrastructure health and application health. Observing workloads and correlating events helps your teams truly understand workload health and performance, allowing them to be proactive rather than reactive.
  • Rehearse recovery and practise failure by running disaster recovery (DR) drills at regular intervals to validate the effectiveness of your recovery processes and to clarify the responsibilities of internal teams. Use chaos engineering practices, via services such as Azure Chaos Studio, to identify weak points in applications.
  • Embrace continuous operational improvement by continuously evaluating and refining operational processes and tasks, reducing complexity and ambiguity where possible. Processes should evolve over time, and inefficiencies should be removed as they are found. Most importantly, always learn from your failures.
  • Use loosely coupled architectures, such as microservices and serverless technologies, that allow teams to build and deploy services independently, so that a failure in one service is less likely to cause large-scale impact. It's also important to consider cloud design patterns such as circuit breaker, queue-based load levelling and throttling.
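To make the circuit breaker pattern mentioned above more concrete, here is a minimal Python sketch. It is our own illustration rather than an Azure SDK feature: the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters and the injectable `clock` are hypothetical choices. After a set number of consecutive failures the breaker "opens" and fails fast; after a cooldown it lets a trial call through ("half-open") and closes again if that call succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical, minimal circuit breaker (not part of any Azure SDK).

    closed -> open after `failure_threshold` consecutive failures,
    open -> half-open once `reset_timeout` seconds have passed,
    half-open -> closed on success, or back to open on failure.
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of hammering a dependency that is already struggling.
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call while half-open, or hitting the threshold, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the cooldown can be simulated instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.state)  # open
now[0] = 11.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "ok"))  # ok
print(breaker.state)  # closed
```

Failing fast while the circuit is open gives a struggling dependency time to recover instead of burying it under retries, which is exactly the kind of large-scale impact loosely coupled designs aim to avoid.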

Operational Excellence Recommendations & Tips

Some of our top tips and recommendations for Operational Excellence are as follows:

Azure Policy

Azure Policy is a free Azure service that lets you enforce resource-level rules across your Azure estate, which can help with the adoption of operational best practices. Azure Policy is also a great tool for managing and monitoring configuration drift. For example, Azure Policy can ensure all workloads adhere to a specific set of security rules, such as requiring HTTPS-only traffic or a minimum TLS version.
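As a sketch of the HTTPS/TLS example above, the Python below builds a custom policy definition as a dictionary (serialisable to the JSON format Azure Policy definitions use) and runs it through a deliberately tiny local evaluator against made-up resources. The storage account aliases shown are ones we believe exist, but check them against the current Azure Policy alias list before use. The `matches` helper and the sample resources are hypothetical and only mimic a small subset of Azure Policy's real evaluation logic.

```python
import json

# Custom policy definition written by hand for illustration (assumption: verify the
# aliases and structure against the Azure Policy documentation before deploying).
storage_tls_policy = {
    "properties": {
        "displayName": "Storage accounts should require HTTPS and TLS 1.2",
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
                    {
                        "anyOf": [
                            {"field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly", "equals": "false"},
                            {"field": "Microsoft.Storage/storageAccounts/minimumTlsVersion", "notEquals": "TLS1_2"},
                        ]
                    },
                ]
            },
            # "audit" reports drifted resources as non-compliant; "deny" would block them instead.
            "then": {"effect": "audit"},
        },
    }
}


def matches(condition, resource):
    """Hypothetical toy evaluator: supports allOf/anyOf/equals/notEquals only.

    This is NOT Azure's policy engine; it only illustrates how the if-block reads.
    Comparisons are case-insensitive on stringified values.
    """
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, resource) for c in condition["anyOf"])
    actual = str(resource.get(condition["field"], "")).lower()
    if "equals" in condition:
        return actual == condition["equals"].lower()
    if "notEquals" in condition:
        return actual != condition["notEquals"].lower()
    raise ValueError(f"unsupported condition: {condition}")


# Made-up resources, flattened to alias -> value for the toy evaluator.
resources = {
    "stgood": {
        "type": "Microsoft.Storage/storageAccounts",
        "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly": True,
        "Microsoft.Storage/storageAccounts/minimumTlsVersion": "TLS1_2",
    },
    "stdrifted": {
        "type": "Microsoft.Storage/storageAccounts",
        "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly": True,
        "Microsoft.Storage/storageAccounts/minimumTlsVersion": "TLS1_0",
    },
    "vm01": {"type": "Microsoft.Compute/virtualMachines"},
}

rule = storage_tls_policy["properties"]["policyRule"]
non_compliant = [name for name, res in resources.items() if matches(rule["if"], res)]
print(non_compliant)  # ['stdrifted']
print(json.dumps(storage_tls_policy, indent=2))
```

Only the storage account whose minimum TLS version has drifted is flagged; the virtual machine is out of scope because the `type` condition fails first.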

Azure Advisor

Azure Advisor is a fantastic resource that provides personalised recommendations across your workloads. Its Operational Excellence category includes Azure Policy recommendations that, in turn, can be used to identify opportunities to implement best practices across your workloads.

DevOps Checklist

Use the DevOps checklist to review your design and management from a DevOps standpoint. The checklist covers culture, development, testing, release, monitoring and management, and is available in the Azure Architecture Center.

Strangler Fig

Strangler Fig is a cloud design pattern for incrementally migrating a legacy system by gradually replacing specific pieces of functionality with new applications or services. Over time, the new system takes over and the legacy system is "strangled" until it can be retired.
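At the heart of the pattern is a routing facade that sits in front of the legacy system. The Python sketch below is a hypothetical, in-process illustration; in a real deployment this role would typically be played by an API gateway or reverse proxy. `StranglerFacade`, `legacy_app` and `new_orders_service` are invented names. Each migrated path prefix is routed to its new service, while everything else still reaches the legacy application.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_app(path: str) -> str:
    # Hypothetical stand-in for the existing legacy system.
    return f"legacy handled {path}"


def new_orders_service(path: str) -> str:
    # Hypothetical stand-in for a newly built replacement service.
    return f"new orders service handled {path}"


class StranglerFacade:
    """Routes requests by path prefix: migrated prefixes go to new services, the rest to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.routes: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Each migration "strangles" a little more of the legacy system.
        self.routes[prefix] = handler

    def handle(self, path: str) -> str:
        # Check the longest prefixes first so more specific migrations win.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self.routes[prefix](path)
        return self.legacy(path)


facade = StranglerFacade(legacy_app)
print(facade.handle("/orders/42"))   # legacy handled /orders/42
facade.migrate("/orders", new_orders_service)
print(facade.handle("/orders/42"))   # new orders service handled /orders/42
print(facade.handle("/invoices/7"))  # legacy handled /invoices/7
```

Because callers only ever talk to the facade, functionality can be moved one route at a time, and once no traffic falls through to `legacy_app` the old system can be switched off.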

Team structure

Take time to understand and plan your operating model and internal teams. For example, managing a loosely coupled architecture requires procedural decoupling, because teams shouldn't have to depend on partner teams to support, approve or operate their workloads.

Review your workloads

We will continue to cover the remaining pillars throughout this series of blogs. As highlighted in previous posts, you can review your current posture against the five Well-Architected pillars using Microsoft's Azure Well-Architected Review, a free self-assessment tool.

For a more in-depth Architecture Review, or a review focused specifically on Operational Excellence, feel free to reach out to our Azure Cloud Experts.
