Azure Well-Architected Part 4: Operational Excellence

The Azure Well-Architected Framework is a set of guidelines spanning five key pillars that can be used to optimise your workloads. In the previous blogs we covered Reliability, Security and Cost Optimisation alongside relevant services, processes and assessments. This time we’ll focus on the Operational Excellence pillar of the framework.

Overview of Operational Excellence

The services and technologies you use in the cloud differ hugely from those on-premises. What doesn’t differ is the requirement that all deployments and environments are reliable and predictable. Operational Excellence is the fourth pillar of the Well-Architected Framework and covers the operational processes you need to keep applications running.

The key processes that fall within operational excellence are Workload Automation, Workload Release, Monitoring and Testing. The end goal is to achieve superior operational practices.

As with the Security and Cost Optimisation pillars, Operational Excellence must be considered throughout the lifecycle of a workload, including the design and architecture phases, but especially once the workload is running. Service management and its related processes should not be retrofitted to environments or services. Thinking about these areas early on reduces management overhead in the long term.

A Well-Architected workload viewed through the lens of Operational Excellence is one that is released in an automated manner and monitored and tested efficiently, so that the application provides value not just to your customers but also to your internal development and operations teams.

For Operational Excellence specifically, at a high level you should be thinking about the following areas and processes:

Design, build and orchestrate workloads with DevOps principles in mind
Monitor workloads efficiently using Azure Monitor
Understand Application Performance Management
Automate as many processes as possible
Create and automate repeatable infrastructure
Prepare for the unexpected by testing workloads

Operational Excellence Principles

When designing for Operational Excellence in Azure, the Framework sets out a number of principles that you must think about. These principles include:

Optimise build and release processes by embracing software engineering disciplines. Deploy infrastructure via code (IaC) and use continuous integration and continuous delivery (CI/CD) pipelines for build and release. Automate test plans and avoid configuration drift by using configuration as code. Azure DevOps and Azure Policy are two tools that can help greatly with build, release and configuration drift.
Understand operational health by using tools and processes that monitor all aspects of a workload including but not limited to build and release processes, infrastructure health and application health. Allow your teams to be proactive instead of reactive by observing workloads and correlating events to truly understand the workload health and performance.
Rehearse recovery and practice failure by running disaster recovery (DR) drills at regular intervals to validate and understand the effectiveness of your recovery processes, and the responsibilities of internal teams. Use chaos engineering practices to identify weak points in applications via services such as Azure Chaos Studio.
Embrace continuous operational improvement to reduce complexity and ambiguity where possible by continuously evaluating and refining operational processes and tasks. It’s important that processes evolve over time and that inefficiencies are removed. Most importantly, always learn from your failures.
Use loosely coupled architectures such as microservices and serverless technologies that allow teams to build and deploy services independently to minimise service failures or impact on a large scale. It’s also important to think about cloud design patterns such as circuit breakers, load-levelling and throttling.
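
To make the circuit breaker pattern mentioned above more concrete, here is a minimal Python sketch. It is an illustration only, not an Azure SDK feature or a production library: the class name `CircuitBreaker`, its parameters (`max_failures`, `reset_after`) and the `CircuitOpenError` exception are all hypothetical names chosen for this example. After a set number of consecutive failures the breaker "opens" and rejects calls immediately, then allows a single trial call once a cool-down period has passed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and a call is rejected without being attempted."""


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are assumptions for this sketch."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds before a trial call
        self.clock = clock                # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast instead of hitting the dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a downstream service in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back or retry later. Failing fast in this way is one means of stopping a single unhealthy dependency from dragging down the services that rely on it.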

Operational Excellence Recommendations & Tips

Some of the best tips or recommendations for operational excellence are as follows:

Azure Policy

Azure Policy is a free Azure service that allows you to enforce resource-level rules across your Azure estate, which can assist in the adoption of operational best practices. Azure Policy is also a great tool for managing and monitoring configuration drift. For example, Azure Policy can ensure all workloads adhere to a specific set of security rules, such as enforcing HTTPS or a minimum TLS version.
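
The idea behind configuration drift detection can be sketched in a few lines of Python. This is not Azure Policy and it does not call any Azure API: the `find_drift` helper, the property names (`https_only`, `min_tls_version`) and the resource names are hypothetical, and the resource snapshot is hard-coded where a real check would read live configuration. It simply compares each resource against a desired baseline and reports the differences.

```python
def find_drift(desired, actual):
    """Return {property: (expected, found)} for every property that differs."""
    drift = {}
    for key, expected in desired.items():
        found = actual.get(key)  # None when the property is missing entirely
        if found != expected:
            drift[key] = (expected, found)
    return drift


# Hypothetical baseline: the rules every resource is expected to satisfy.
BASELINE = {"https_only": True, "min_tls_version": "1.2"}

# Hypothetical snapshot of resource settings (hard-coded for illustration).
resources = {
    "web-app-prod": {"https_only": True, "min_tls_version": "1.2"},
    "web-app-test": {"https_only": False, "min_tls_version": "1.0"},
}

# Evaluate every resource, then keep only those that have drifted.
report = {name: find_drift(BASELINE, config) for name, config in resources.items()}
non_compliant = {name: drift for name, drift in report.items() if drift}

for name, drift in non_compliant.items():
    print(name, drift)
```

In this sketch only `web-app-test` ends up in `non_compliant`, because both of its settings differ from the baseline. A managed service such as Azure Policy applies the same kind of rule evaluation at scale and can also audit or deny non-compliant resources.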

Azure Advisor

Azure Advisor is a fantastic resource that provides a set of Azure Policy recommendations that, in turn, can be used to identify opportunities to implement best practices across your workloads.

DevOps Checklist

Use the DevOps checklist to review your design and management from a DevOps standpoint. The checklist covers culture, development, testing, release, monitoring and management. The checklist can be found here.

Strangler Fig

Strangler Fig is a cloud design pattern for incrementally migrating a legacy system by gradually replacing specific pieces of functionality with new apps or services. Eventually, the older system is ‘strangled’ as the new system takes over completely.
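
As a rough illustration of the pattern, the Python sketch below puts a routing facade in front of a legacy system. The `StranglerFacade` class, its `migrate` and `handle` methods and the example paths are hypothetical names invented for this example. In practice the facade would typically be a gateway or reverse proxy, not in-process code.

```python
class StranglerFacade:
    """Routes each request to a new service if its path has been migrated, else to legacy."""

    def __init__(self, legacy_handler):
        self.legacy_handler = legacy_handler
        self.migrated = {}  # path prefix -> handler for the replacement service

    def migrate(self, prefix, handler):
        """Register a replacement handler for all paths starting with `prefix`."""
        self.migrated[prefix] = handler

    def handle(self, path):
        # Check longer prefixes first so the most specific migration wins.
        for prefix in sorted(self.migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self.migrated[prefix](path)
        # Anything not yet migrated still goes to the legacy system.
        return self.legacy_handler(path)


# Usage: start with everything on legacy, then move one area at a time.
facade = StranglerFacade(lambda path: "legacy:" + path)
facade.migrate("/orders", lambda path: "new:" + path)
```

Callers only ever talk to the facade, so functionality can be moved across one prefix at a time. Once every route has a replacement handler, the legacy handler is no longer reached and can be retired.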

Team structure

Take time to understand and plan your operating model and internal teams. For example, managing a loosely coupled architecture requires procedural decoupling, as teams shouldn’t have to depend on partner teams to support, approve or operate their workloads.

Review your workloads

We will continue to cover the remaining pillars throughout this series of blogs. As highlighted in previous posts, you can review your current posture against the five Well-Architected pillars using a free assessment tool, which can be accessed here.

For a more in-depth Architecture Review or a specific Operational Excellence Review, feel free to reach out to our Azure Cloud Experts.

Find out more about Azure

Your competition doesn’t stand still and neither does cloud. Establishing and maintaining your cloud environment needs to be approached as a continuous cycle to remain competitive by taking advantage of the latest cloud capabilities. From assessment to design and build through to modernisation, we don’t believe in taking a ‘set and forget’ approach to your cloud.
