Manage Infrastructure at scale with SSM

Systems Manager, or SSM, is AWS's service for managing servers. This post covers how to configure your environment to use SSM and walks through one of its tools, Run Command.

SSM is a whole toolbox of services covering different aspects of server management. Over the years, that toolbox has expanded. Now, it's time to look at what SSM offers.

How SSM fits together

Overview

AWS breaks SSM into four main components:

Operations Management - Tools to manage incidents and issues.
Application Management - View and manage applications running on servers, plus Parameter Store.
Change Management - Tools to manage the change process, plus Maintenance Windows and Automation.
Node Management - A slew of tools to directly manage servers.

This post covers the configuration needed to use the tools within Node Management.

Configuration

SSM's functionality requires a fair amount of configuration and, oftentimes, frustration. Generally, though, once you get it running, it stays running. But if I had a dollar for every time I threw my hands up and asked, "Why isn't this working?", I'd be holidaying on an island somewhere and not writing blogs. So, if it doesn't work straight off, double-check everything.

So, what do you need?

SSM Agent

Since much of SSM's functionality is based on things running inside the server, an agent is required. AWS has details on installing the agent on Linux and Windows, but the best option is to use an AMI that already includes it. The great thing is that most major operating system AMIs do. These include: Amazon Linux, Amazon Linux 2, SUSE 12 & 15, Ubuntu 16.04, 18.04 & 20.04, macOS and most Windows versions.

As SSM introduces new functionality, the agent will need to be updated. There is an easy way to do this using SSM, so I encourage everyone to do this. I'll be covering this in the Fleet Manager section.
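A quick way to confirm that the agent on each instance has registered with SSM, and whether it is current, is to ask SSM what it can see. The following is a minimal sketch, assuming boto3 and working AWS credentials; the function names are my own and the region is whatever you pass in.

```python
def summarize_agents(instance_info_list):
    """Reduce describe_instance_information entries to the fields worth checking."""
    summary = {}
    for info in instance_info_list:
        summary[info["InstanceId"]] = {
            # PingStatus is "Online" when the agent is talking to SSM
            "online": info.get("PingStatus") == "Online",
            "agent_version": info.get("AgentVersion"),
            "is_latest": info.get("IsLatestVersion", False),
        }
    return summary


def fetch_agent_summary(region_name):
    """Query SSM for every managed instance (needs boto3 and credentials)."""
    import boto3  # imported here so summarize_agents works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    entries = []
    paginator = ssm.get_paginator("describe_instance_information")
    for page in paginator.paginate():
        entries.extend(page["InstanceInformationList"])
    return summarize_agents(entries)
```

If an instance is missing from the result altogether, the agent, the endpoints or the permissions are the usual places to start looking.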

Access to SSM Endpoints

Like many AWS services, SSM lives outside your VPC. This means the agent needs some way to get out of the VPC to talk to the SSM service.

You can allow your servers to have outbound internet access, but a more secure option is to use VPC Endpoints.

The minimum set of endpoints you need is:

SSM
SSMMessages
EC2Messages

I would also recommend configuring the KMS endpoint, because newer functionality within Fleet Manager requires a customer managed key created in the Key Management Service (KMS).
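As a rough sketch of what creating these endpoints could look like in code, assuming boto3 and interface endpoints with private DNS enabled: the function names, and the VPC, subnet and security group IDs you pass in, are placeholders of mine rather than anything from a real environment.

```python
def ssm_endpoint_service_names(region, include_kms=True):
    """Build the endpoint service names for SSM in the given region."""
    services = ["ssm", "ssmmessages", "ec2messages"]
    if include_kms:
        services.append("kms")
    return [f"com.amazonaws.{region}.{service}" for service in services]


def create_ssm_endpoints(region, vpc_id, subnet_ids, security_group_ids):
    """Create one interface endpoint per service (needs boto3 and credentials)."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    created = []
    for service_name in ssm_endpoint_service_names(region):
        response = ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service_name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
            PrivateDnsEnabled=True,
        )
        created.append(response["VpcEndpoint"]["VpcEndpointId"])
    return created
```

The security group attached to the endpoints needs to allow inbound HTTPS (port 443) from your instances, otherwise the agent still cannot reach them.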

Permissions

So, you have your agent and access to the SSM endpoints; now, you need permission to use them! The instance needs IAM permissions to talk to the SSM service. AWS makes this easy with a managed policy that grants all the permissions the instance requires. At the time of writing (Aug 2022), that policy is AmazonSSMManagedInstanceCore. Add it to an EC2 role, attach the role to your instance, and you should be ready.

You can find full details in the AWS User Guide.
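For those who prefer to script it, here is a minimal sketch of creating such a role with boto3. It assumes you are happy for the role and the instance profile to share one name; the function names are mine, and the role name is whatever you choose.

```python
import json

# AWS managed policy named in the text above
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy():
    """Trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_ssm_instance_role(role_name):
    """Create the role and instance profile (needs boto3 and credentials)."""
    import boto3  # imported here so the helper above works without boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=MANAGED_POLICY_ARN)
    # EC2 attaches roles through an instance profile, so create one too
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(
        InstanceProfileName=role_name, RoleName=role_name
    )
```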

Documentation

Complete documentation for AWS Systems Manager is at: https://docs.aws.amazon.com/systems-manager/latest/userguide/.

You can find information on SSM Agent at: https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html.

Security

The best thing about using Systems Manager tools is the security aspect. Many of the SSM tools interact directly with the EC2 instance, but there is no need for inbound access. Furthermore, if you use VPC Endpoints, you don't even need outbound internet access.

In all the following examples, I have configured two EC2 instances with no inbound access and only a route to S3 and the local network.

[Image: EC2 instance overview]

[Image: EC2 security group]

[Image: EC2 route table]

Run Command

Run Command was the SSM tool that I primarily used. As the name suggests, Run Command lets you run commands on one or more instances. While Fleet Manager and Session Manager focus more on single instances, Run Command shines when you have a script you want to run on multiple servers.

Run Command works by running a Command Document. AWS provides a plethora of pre-existing Documents for your use, or you can write your own.
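To give a feel for what writing your own looks like, here is a minimal sketch of a Command Document that runs a few shell lines, built as a Python dictionary and registered with boto3. It assumes schema version 2.2 and the aws:runShellScript action; the function names and the step name are placeholders of mine.

```python
import json


def build_command_document(description, shell_lines):
    """Build a schema 2.2 Command Document that runs shell lines in order."""
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "runShellScript",  # step name, chosen for illustration
                "inputs": {"runCommand": list(shell_lines)},
            }
        ],
    }


def register_document(name, document, region_name):
    """Upload the document to SSM (needs boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    ssm.create_document(
        Name=name,
        Content=json.dumps(document),
        DocumentType="Command",
        DocumentFormat="JSON",
    )
```

Real documents usually grow parameters and several steps, which is where the iterations on a test instance come in.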

One example I've used is migrating a customer's Red Hat Linux servers from IPA authentication to Microsoft AD authentication. That took several iterations to get the process and document correct. Once I had it all working on my test instance, I rolled it out to Dev, Test, PreProd and Prod using tags. That was over 100 servers in four easy steps (after I did all the testing).

So, how do you use it? First, select Run Command in the left-hand menu and click the Run command button.

That will load a page asking you to select the document you want to run.

[Image: Run Command document selection]

After selecting the document, the next step is to choose your targets. If you only have a small number of servers, you can select them manually. For larger groups, using tags is a better option. If you enter multiple tags, they are combined, so only servers that match all of them are selected. For example, setting tags with Key=App and Value=MyApp & Key=Env and Value=Dev would select all servers tagged with both.

[Image: Run Command targets]
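The same tag-based targeting can be done from code. This is a minimal sketch, assuming boto3 and the AWS-provided AWS-RunShellScript document; the function names are mine, and the App and Env tags are just the example values from the text.

```python
def tag_targets(tags):
    """Turn {"App": "MyApp"} style tags into Run Command target filters."""
    return [{"Key": f"tag:{key}", "Values": [value]} for key, value in tags.items()]


def run_shell_on_tagged(tags, commands, region_name):
    """Run shell commands on every instance matching all tags.

    Needs boto3 and credentials. Returns the command ID for later lookups.
    """
    import boto3  # imported here so tag_targets works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    response = ssm.send_command(
        DocumentName="AWS-RunShellScript",
        Targets=tag_targets(tags),
        Parameters={"commands": commands},
    )
    return response["Command"]["CommandId"]
```

For instance, run_shell_on_tagged({"App": "MyApp", "Env": "Dev"}, ["uptime"], "ap-southeast-2") would target only the servers carrying both tags (the region here is a placeholder).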

Finally, you can specify output options, e.g. sending to S3, CloudWatch and SNS.
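In code, these output options are extra arguments to the same send_command call. The helper below is a sketch under my own naming; the bucket, log group and ARNs are whatever you supply. It assumes you want SNS notifications only for the Success and Failed events.

```python
def build_output_options(bucket=None, key_prefix=None, log_group=None,
                         sns_topic_arn=None, service_role_arn=None):
    """Build the optional send_command arguments for S3, CloudWatch and SNS."""
    options = {}
    if bucket:
        options["OutputS3BucketName"] = bucket
        if key_prefix:
            options["OutputS3KeyPrefix"] = key_prefix
    if log_group:
        options["CloudWatchOutputConfig"] = {
            "CloudWatchLogGroupName": log_group,
            "CloudWatchOutputEnabled": True,
        }
    if sns_topic_arn:
        # SSM publishes to SNS through a role you provide
        if not service_role_arn:
            raise ValueError("SNS notifications need a service role ARN")
        options["ServiceRoleArn"] = service_role_arn
        options["NotificationConfig"] = {
            "NotificationArn": sns_topic_arn,
            "NotificationEvents": ["Success", "Failed"],
            "NotificationType": "Command",
        }
    return options
```

The result can be unpacked into the call, e.g. ssm.send_command(..., **build_output_options(bucket="my-bucket")). Bear in mind that the instance role will typically also need permission to write to the bucket or log group you name.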

Once a Run Command completes, you can go to the Command History tab to check the status. You can also Rerun the command (with the same parameters and servers) or Copy to new. Copy to new repeats the command but allows you to alter parameters or select different servers.

[Image: Run Command rerun]
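When a command has gone out to a hundred servers, a per-status count is often easier to read than the console list. Here is a minimal sketch, assuming boto3 and a command ID returned by an earlier send_command call; the function names are mine.

```python
from collections import Counter


def count_statuses(invocations):
    """Count invocations per status, e.g. {"Success": 98, "Failed": 2}."""
    return dict(Counter(invocation["Status"] for invocation in invocations))


def fetch_invocations(command_id, region_name):
    """Fetch every per-instance invocation of a command.

    Needs boto3 and credentials.
    """
    import boto3  # imported here so count_statuses works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    paginator = ssm.get_paginator("list_command_invocations")
    invocations = []
    for page in paginator.paginate(CommandId=command_id):
        invocations.extend(page["CommandInvocations"])
    return invocations
```

Each invocation also carries its InstanceId, so filtering the list for anything other than Success gives you the servers to look at first.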

For further information on how to use Run Command, check out the AWS documentation.

Conclusion

The goal of this post was to provide what you need to configure SSM and get started with Run Command. There is much more to SSM, including Fleet Manager and State Manager, which I covered in a Melbourne AWS User Group presentation.