I'd like to start this blog series with a discussion about balancing priorities, because governance over your Azure tenants and subscriptions can be a tricky path to navigate.
This is most visible in larger Azure customers, where development and operational teams may work across multiple geographies and report through separate lines.
The Audit vs Deny Debate
For example, when we begin to think about how to apply governance in Azure, there are often multiple teams of stakeholders with competing priorities:
- App Dev need to rapidly deploy, test, and ship their code securely
- Ops need to manage service requests, alerts, backups, and network security
- Managers need to control costs and budget, manage risk, and ensure SLAs are met
Every stakeholder above has valid reasons for their priorities, so you need to strike a balance between audit and deny effects when you apply governance in the Azure ecosystem.
Some examples of audit vs deny decisions you may face:
- A resource group is created without the minimum tags (e.g. costcentre) - audit or deny?
- A virtual machine is created without any backup configured - audit or deny?
- A storage account is created and allows firewall access from all networks - audit or deny?
- A subnet is created without any network security group association - audit or deny?
- A network security group inbound rule is created allowing RDP/3389 from any source - audit or deny?
- A network security group is applied to a gateway subnet - audit or deny?
- An external guest user is assigned owner-level access to your Azure subscription - audit or deny?
Another way to think of this debate is to look at how you apply governance as either a proactive or a reactive measure.
An example of a proactive measure would be to deny resource group creation if the minimum tags are missing, which avoids a future remediation task to add them. A reactive measure for the same scenario would be to audit resource groups where the tags are missing and then remediate later.
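To make the proactive vs reactive distinction concrete, here is a minimal Python sketch that builds the body of a custom policy definition for the costcentre tag scenario. The function name `require_tag_on_resource_groups` is hypothetical, and the rule is an illustration of the general definition shape (an `if` condition and a `then` effect) rather than a copy of any built-in definition.

```python
import json

RESOURCE_GROUP_TYPE = "Microsoft.Resources/subscriptions/resourceGroups"


def require_tag_on_resource_groups(tag_name: str, effect: str) -> dict:
    """Build a policy definition body that flags resource groups missing a tag.

    Hypothetical helper for illustration; only 'audit' and 'deny' are handled.
    """
    if effect not in ("audit", "deny"):
        raise ValueError("effect must be 'audit' or 'deny'")
    return {
        "properties": {
            "displayName": f"Require the {tag_name} tag on resource groups ({effect})",
            # Mode 'All' so that resource groups themselves are evaluated.
            "mode": "All",
            "policyRule": {
                "if": {
                    "allOf": [
                        {"field": "type", "equals": RESOURCE_GROUP_TYPE},
                        {"field": f"tags['{tag_name}']", "exists": "false"},
                    ]
                },
                "then": {"effect": effect},
            },
        }
    }


# Reactive: record non-compliance and remediate later.
reactive = require_tag_on_resource_groups("costcentre", "audit")
# Proactive: block the request before the resource group exists.
proactive = require_tag_on_resource_groups("costcentre", "deny")

print(json.dumps(proactive, indent=2))
```

The condition is identical in both cases; only the effect changes. That is why the audit vs deny decision can be revisited later without rewriting the rule itself.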
Native Tooling for Cloud Governance
This brings me to Azure Policy, a native tool that allows you to apply governance decisions by specifying policy effects such as audit and deny.
It's also a tool which appears to grow in value, almost in lockstep, with your monthly Azure invoices. How cool is that?
Today, there are nearly 300 built-in policy definitions across 32 categories for you to start using. That's not a small number, and it demonstrates a significant effort by Microsoft to cover a wide range of use cases for Azure Policy.
Here are some of my favourite Azure Policy definitions:
- Automation account variables should be encrypted (GitHub)
- Azure Backup should be enabled for Virtual Machines (GitHub)
- Configure backup on VMs of a location to an existing central Vault in the same location (GitHub)
- Audit virtual machines without disaster recovery configured (GitHub)
- Allowed locations (GitHub)
- Key Vault objects should be recoverable (GitHub)
- Manage certificates that are within a specified number of days of expiration (GitHub)
- Gateway subnets should not be configured with a network security group (GitHub)
- RDP access from the Internet should be blocked (GitHub)
- Network interfaces should not have public IPs (GitHub)
- Add or replace a tag on resource groups (GitHub)
- Inherit a tag from the resource group if missing (GitHub)
Cloud Governance at Scale
When managing multiple Azure tenants and subscriptions there's an increasing need for governance at scale because you often need to do more (governance) with less (people).
Azure Policy provides that governance at scale by giving you the ability to:
- Define your governance rules and effects through policy definitions (JSON format), which are either built-in or custom.
- Create your custom policy definitions at the management group scope.
- Assign your policies to management groups, subscriptions, and resource groups and, if you wish, exclude certain resources and resource groups.
- Specify your policy assignments using blueprints.
- Evaluate compliance of your policies and kick off remediation tasks for non-compliant resources.
- Access Azure Policy via the Azure Portal, CLI, PowerShell, REST, SDKs, and ARM templates.
Having multiple access points into Azure Policy gives you more choice on deployment tooling when executing a policy as code workflow.
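As a rough illustration of the REST access point, the Python sketch below only builds the request URLs and an assignment body; it does not call Azure. The helper names, the `contoso-mg` management group, the all-zero subscription ID, and the `rg-sandbox` resource group are hypothetical, and the `api-version` value is an assumption you should check against the current Azure Policy REST reference.

```python
import json

ARM = "https://management.azure.com"
API_VERSION = "2021-06-01"  # assumption: verify against the current REST reference


def definition_id(management_group_id: str, name: str) -> str:
    """Resource ID of a custom definition stored at management group scope."""
    return (
        f"/providers/Microsoft.Management/managementGroups/{management_group_id}"
        f"/providers/Microsoft.Authorization/policyDefinitions/{name}"
    )


def definition_url(management_group_id: str, name: str) -> str:
    """URL for a PUT that creates or updates the definition."""
    return f"{ARM}{definition_id(management_group_id, name)}?api-version={API_VERSION}"


def assignment_request(scope: str, name: str, policy_definition_id: str, not_scopes: list):
    """URL and body for a PUT that assigns a definition at a scope.

    notScopes lists the child scopes excluded from the assignment.
    """
    url = (
        f"{ARM}{scope}/providers/Microsoft.Authorization/policyAssignments/{name}"
        f"?api-version={API_VERSION}"
    )
    body = {
        "properties": {
            "policyDefinitionId": policy_definition_id,
            "notScopes": list(not_scopes),
        }
    }
    return url, body


# Hypothetical identifiers, for illustration only.
subscription = "/subscriptions/00000000-0000-0000-0000-000000000000"
policy_id = definition_id("contoso-mg", "require-costcentre-tag")

url, body = assignment_request(
    scope=subscription,
    name="require-costcentre-tag",
    policy_definition_id=policy_id,
    not_scopes=[f"{subscription}/resourceGroups/rg-sandbox"],
)

print(definition_url("contoso-mg", "require-costcentre-tag"))
print(url)
print(json.dumps(body, indent=2))
```

Keeping the definition at management group scope and the assignment at subscription scope mirrors the split described in the list above: define once, assign where needed, and exclude the exceptions.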
Governance at scale can be achieved by moving away from manually managing policies in the Azure Portal to having a repeatable process for policy authoring, testing, and deployment across multiple Azure tenants and subscriptions.
Currently, the Microsoft pattern for a policy as code workflow provides high-level guidance on:
- Creating and updating policy definitions
- Creating and updating initiative definitions
- Testing and validating the updated definitions
- Enabling remediation tasks
- Updating to enforced assignments
- Processing integrated evaluations
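Two of these steps, testing definitions and updating to enforced assignments, can be sketched in a few lines of Python. This is a simplified illustration of what a pipeline stage might do, not Microsoft's reference implementation: `validate_definition` and `to_enforced` are hypothetical helpers, the check only covers the audit and deny effects discussed in this post, and the sample definition and assignment are made up.

```python
import copy

KNOWN_EFFECTS = {"audit", "deny"}  # only the effects discussed in this post


def validate_definition(definition: dict) -> list:
    """Return a list of problems found in a policy definition body."""
    errors = []
    rule = definition.get("properties", {}).get("policyRule", {})
    if "if" not in rule:
        errors.append("policyRule.if is missing")
    effect = rule.get("then", {}).get("effect")
    if effect is None:
        errors.append("policyRule.then.effect is missing")
    elif not effect.startswith("[") and effect.lower() not in KNOWN_EFFECTS:
        # Values starting with '[' are template expressions such as
        # "[parameters('effect')]" and are not checked here.
        errors.append(f"unexpected effect: {effect}")
    return errors


def to_enforced(assignment: dict) -> dict:
    """Return a copy of an assignment body with enforcement switched on."""
    updated = copy.deepcopy(assignment)
    updated.setdefault("properties", {})["enforcementMode"] = "Default"
    return updated


# Made-up inputs for illustration.
definition = {
    "properties": {
        "mode": "All",
        "policyRule": {
            "if": {"field": "tags['costcentre']", "exists": "false"},
            "then": {"effect": "deny"},
        },
    }
}
assignment = {
    "properties": {
        "policyDefinitionId": "<definition id>",
        # Start without enforcement so the deny effect is not applied yet.
        "enforcementMode": "DoNotEnforce",
    }
}

problems = validate_definition(definition)
if problems:
    raise SystemExit("definition failed validation: " + "; ".join(problems))

enforced = to_enforced(assignment)
print(enforced["properties"]["enforcementMode"])
```

Starting an assignment with `enforcementMode` set to `DoNotEnforce` lets you review compliance results before a deny effect blocks anyone, which ties the workflow back to the audit vs deny balance.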
Balancing priorities in the audit vs deny debate comes down to comparing reactive and proactive measures and choosing which path to take for each decision.
Governance at scale is possible through adoption of a policy as code workflow and using an IaC/DevOps mindset to integrate repeatable processes for consumption of Azure Policy.
In Part 2 of this series I’ll deep dive into using the Azure Policy extension for Visual Studio Code and show you a real-world example of using it to author a custom policy definition by finding resource property aliases.