Crafting Test Matrices

In Part 1 I described the problem with how GitHub Actions matrices work, and their limitations. I also described a solution using JSON strings, but I left off with a question:

What if the JSON string was created on the fly, by a PowerShell script? 🤔

Now I'll go deeper into creating the JSON string and some of the interesting ways we can use this feature.

So let's start with a fresh GitHub Actions workflow called dynamic-test-matrix.

name: dynamic-test-matrix

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  matrix:
    name: Generate test matrix
    runs-on: ubuntu-latest
    outputs:
      matrix-json: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v2
      - id: set-matrix
        shell: pwsh
        # Use a small PowerShell script to generate the test matrix
        run: "& .github/workflows/create-test-matrix.ps1"

  run-matrix:
    needs: [matrix]
    strategy:
      fail-fast: false
      matrix:
        include: ${{ fromJson(needs.matrix.outputs.matrix-json) }}
    name: "${{ matrix.job_name }}"
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - name: Run Command
        shell: pwsh
        run: |
          Write-Host "Run '${{ matrix.command }}'"

Just like the example in Part 1, there are two jobs:

matrix : Runs a PowerShell script to create the test matrix
run-matrix : Pretends to run the command for each item in the matrix

The matrices creation job

Let's look at the matrix job:

    name: Generate test matrix

A nice friendly name in the UI

    runs-on: ubuntu-latest

The job will run on an Ubuntu-based runner. Why Ubuntu instead of Windows? There is no reason for the script to be Windows-specific, so why not make it cross-platform?

    outputs:
      matrix-json: ${{ steps.set-matrix.outputs.matrix }}

This instructs GitHub Actions that it will output a Job parameter called matrix-json, and that its value comes from the step output matrix, from the step called set-matrix.

Next come the steps for this job:

    steps:
      - uses: actions/checkout@v2

The steps need the PowerShell script to run, so we need to check out the project source code first.

      - id: set-matrix
        shell: pwsh
        # Use a small PowerShell script to generate the test matrix
        run: "& .github/workflows/create-test-matrix.ps1"

The set-matrix step then runs the create-test-matrix.ps1 PowerShell script (we'll get to the script soon). Note that the id is important here, as this is the name used in the job output above.

Using the matrices

Let's look at the run-matrix job:

    needs: [matrix]

This job (run-matrix) needs the output of the matrix job, so this instructs GitHub Actions to wait until the matrix job completes.

    strategy:
      fail-fast: false

For the purposes of this blog post, I want all of the test cells to run even if others have failed. The GitHub Actions documentation has more examples of the matrix configuration.

      matrix:
        include: ${{ fromJson(needs.matrix.outputs.matrix-json) }}

This is where the magic happens: we take the output from the matrix job and use it to configure the matrix for this job. This is similar to the "Sharing matrix information between jobs" post, where the JSON string from the matrix job output parameter called matrix-json is deserialised.

Note that I'm using include instead of cfg like the post above. This makes it easier to reference matrix items later in the steps. For example, instead of matrix.cfg.os we can just use matrix.os.

    name: "${{ matrix.job_name }}"
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - name: Run Command
        shell: pwsh
        run: |
          Write-Host "Run '${{ matrix.command }}'"

And then finally the steps for the job.

matrix.job_name is used to dynamically change the friendly name of the job
matrix.os is used to change which job runner is used
matrix.command is used to show what PowerShell command could be run

Note - This example won't actually run any PowerShell commands but you can make GitHub Actions run a PSake command, or PSBuild command, or run other PowerShell scripts in your project.

Creating matrices in PowerShell

Here's an example script that will output a two cell matrix: One cell for Windows and one for Ubuntu.

This would be saved as .github/workflows/create-test-matrix.ps1. If you change the name of this script, make sure you also change the GitHub Actions workflow to use the new name too.

$Jobs = @()

@('ubuntu-latest', 'windows-latest') | ForEach-Object {
  $Jobs += @{
    job_name = "Run $_ jobs"
    os = $_
    command = "$_ command"
  }
}

Write-Host "::set-output name=matrix::$($Jobs | ConvertTo-JSON -Compress)"

So what's going on here:

$Jobs = @()

We store the Job information in the $Jobs variable. Initially we have no jobs

@('ubuntu-latest', 'windows-latest') | ForEach-Object {

Instead of hardcoding, we can use loops and enumeration to create matrix cells

  $Jobs += @{
    job_name = "Run $_ jobs"
    os = $_
    command = "$_ command"
  }

To create a matrix cell we add a HashTable to the $Jobs array. Each key in the HashTable appears as a matrix variable in the GitHub Actions Workflow. In this example we are setting three keys: job_name, os and command. These are then used in the Workflow as matrix.job_name, matrix.os and matrix.command respectively. Matrix cells do not all have to have the same keys; it's completely up to you what each matrix cell has.
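As a rough sketch of that flexibility, the loops below build one cell per combination of two lists and give a single cell an extra key. The ruby key, the version numbers and the experimental flag are made up for illustration and are not part of the original script. [ordered] is used only so the keys come out in a predictable order.

```powershell
# Hypothetical example: one cell per OS and Ruby version combination.
$Jobs = @()

foreach ($OS in @('ubuntu-latest', 'windows-latest')) {
  foreach ($Ruby in @('2.5', '2.7')) {
    # [ordered] keeps the keys in the order they are written
    $Cell = [ordered]@{
      job_name = "Ruby $Ruby on $OS"
      os       = $OS
      ruby     = $Ruby
    }
    # Cells do not need identical keys: only this one gets 'experimental'
    if ($OS -eq 'windows-latest' -and $Ruby -eq '2.7') {
      $Cell.experimental = $true
    }
    $Jobs += $Cell
  }
}

# Every cell still needs 'os', because the workflow uses matrix.os for runs-on
$MissingOS = @($Jobs | Where-Object { -not $_.Contains('os') })
if ($MissingOS.Count -gt 0) { throw "Some matrix cells have no 'os' key" }
```

The final check is there because a key that the workflow always reads, such as matrix.os, has to exist in every cell even though the others are optional.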

Write-Host "::set-output name=matrix::$($Jobs | ConvertTo-JSON -Compress)"

And then lastly we use the set-output magic text and convert the Jobs into a JSON string. Note the use of -Compress here. GitHub Actions doesn't allow line breaks in the output so compress is used to create a JSON string on a single line.
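GitHub has since deprecated the set-output workflow command in favour of appending name=value lines to the file named by the GITHUB_OUTPUT environment variable. The helper below is a minimal sketch of that approach, not part of the original script: Write-MatrixOutput and its parameters are hypothetical names, and it falls back to the older command when GITHUB_OUTPUT is not set.

```powershell
# Hypothetical helper: publish the matrix as a step output.
function Write-MatrixOutput {
  param(
    [object[]]$Jobs = @(),
    [string]$Name = 'matrix'
  )
  # -InputObject keeps a one-element array as a JSON array,
  # whereas piping a single item would produce a bare JSON object
  $Json = ConvertTo-Json -InputObject $Jobs -Compress

  if ($env:GITHUB_OUTPUT) {
    # Newer runners: append "name=value" to the file GitHub provides
    Add-Content -Path $env:GITHUB_OUTPUT -Value "$Name=$Json"
  } else {
    # Older runners (or a local run): use the set-output command.
    # ${Name} is braced so the following '::' is not read as part of the variable
    Write-Host "::set-output name=${Name}::$Json"
  }
}
```

Calling Write-MatrixOutput -Jobs $Jobs at the end of the script would then replace the Write-Host line. The JSON still has to be on a single line, so -Compress stays.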

When you run this script you get the following output:

::set-output name=matrix::[{"job_name":"Run ubuntu-latest jobs","command":"ubuntu-latest command","os":"ubuntu-latest"},{"job_name":"Run windows-latest jobs","command":"windows-latest command","os":"windows-latest"}]

Which, let's be honest, is hard to read. Let's add a script parameter called Raw which will output the JSON in a way that's readable for humans:

param(
  [Switch]$Raw
)
$Jobs = @()

@('ubuntu-latest', 'windows-latest') | ForEach-Object {
  $Jobs += @{
    job_name = "Run $_ jobs"
    os = $_
    command = "$_ command"
  }
}

if ($Raw) {
  Write-Host ($Jobs | ConvertTo-JSON)
} else {
  # Output the result for consumption by GitHub Actions
  Write-Host "::set-output name=matrix::$($Jobs | ConvertTo-JSON -Compress)"
}

So now running .github/workflows/create-test-matrix.ps1 -Raw gives:

[
  {
    "job_name": "Run ubuntu-latest jobs",
    "command": "ubuntu-latest command",
    "os": "ubuntu-latest"
  },
  {
    "job_name": "Run windows-latest jobs",
    "command": "windows-latest command",
    "os": "windows-latest"
  }
]

Seeing the GitHub Action in action

GitHub Action Workflow : Animated gif of the workflow running

Running the workflow, we first see that the Generate test matrix job is running but there are not yet any subsequent jobs.
Once the generation job is complete, two new jobs appear. These are the two jobs we specified in our PowerShell script: Run ubuntu-latest jobs and Run windows-latest jobs.
When we look at the output of these jobs we can see that the Ubuntu job has Write-Host "Run 'ubuntu-latest command'" and the Windows job has Write-Host "Run 'windows-latest command'", just like we specified in our PowerShell script.

Back to the original problem ...

Back to Puppet Editor Services ... Now I could create a PowerShell script which created a JSON string with all of the test cases I needed (12 in total), I could easily make out what each matrix cell did, and I could easily add and remove test cases in the future.

GitHub Action Workflow : Puppet Editor Services output from main

All of the code for this is in Pull Request 288 of the Puppet Editor Services project.

Going further

Now that we are using PowerShell to generate the matrix, it opens up more opportunities:

Different tests for different people

You could add some tests if the person who triggered the workflow run (for example, by raising the Pull Request) had the username 'glennsarti':

if ($ENV:GITHUB_ACTOR -eq 'glennsarti') {
  $Jobs += @{
    # ...
  }
}

Note - See the documentation for the full list of GitHub Action Environment Variables
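The same idea extends to a list of people. The sketch below is only an illustration: Get-ActorJobs, the SlowTestActors parameter and the cell contents are hypothetical names, and the only thing taken from GitHub Actions is the GITHUB_ACTOR environment variable.

```powershell
# Hypothetical helper: extra matrix cells for selected GitHub users.
function Get-ActorJobs {
  param(
    [string]$Actor = $env:GITHUB_ACTOR,
    [string[]]$SlowTestActors = @('glennsarti')
  )
  $Extra = @()
  # -contains compares strings case-insensitively by default
  if ($SlowTestActors -contains $Actor) {
    $Extra += @{
      job_name = "Extended tests for $Actor"
      os       = 'ubuntu-latest'
      command  = 'extended command'
    }
  }
  # The leading comma stops PowerShell unrolling the array on return,
  # so callers always get an array, even an empty one
  return ,$Extra
}
```

In the matrix script this could be used as $Jobs += Get-ActorJobs, which adds nothing when the actor is not in the list.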

Different tests based on the changed files

What if we could detect which files were changed in a Pull Request and then change the testing to suit? For example:

Let's say we had a PowerShell module which included documentation. If a Pull Request was ONLY changing the documentation files then there'd be no need to run PowerShell script tests. And vice versa.

Fortunately git can help us here. We can use git diff --name-only to list all of the files that are affected.

param(
  [Switch]$Raw,
  [String]$FromRef
)
$Jobs = @()

$TestModule = $false
$TestDocs = $false

if (![String]::IsNullOrWhiteSpace($FromRef)) {
  (& git diff --name-only "$FromRef...HEAD") | ForEach-Object {
    if ($_ -like 'src/*') { $TestModule = $true }
    if ($_ -like 'docs/*') { $TestDocs = $true }
  }
}

# Make sure we test something
if (!$TestModule -and !$TestDocs) {
  $TestModule = $true
  $TestDocs = $true
}

@('ubuntu-latest', 'windows-latest') | ForEach-Object {
  if ($TestModule) {
    $Jobs += @{
      job_name = "Test PowerShell Module - $_"
      os = $_
      command = "psake test-powershell"
    }
  }

  if ($TestDocs) {
    $Jobs += @{
      job_name = "Test Documentation - $_"
      os = $_
      command = "psake test-documentation"
    }
  }
}

if ($Raw) {
  Write-Host ($Jobs | ConvertTo-JSON)
} else {
  # Output the result for consumption by GitHub Actions
  Write-Host "::set-output name=matrix::$($Jobs | ConvertTo-JSON -Compress)"
}

Let's go through this script in more detail:

  [String]$FromRef
)
$Jobs = @()

$TestModule = $false
$TestDocs = $false

We add a new parameter called FromRef which specifies where in the git history we compare from. Typically this is the branch the Pull Request is targeted against. We also add two flag variables TestModule and TestDocs which we'll use to track whether we should test the Module and Documentation.


if (![String]::IsNullOrWhiteSpace($FromRef)) {
  (& git diff --name-only "$FromRef...HEAD") | ForEach-Object {
    if ($_ -like 'src/*') { $TestModule = $true }
    if ($_ -like 'docs/*') { $TestDocs = $true }
  }
}

If the FromRef is set then we run the git diff command.

--name-only means it only returns the filename instead of the full diff for each file
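$FromRef...HEAD (with three dots) means to compare the current commit (known as HEAD) against the point where it diverged from $FromRef, so only the changes made on the Pull Request side are listed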

Then for each file that has been changed we test if it's a PowerShell Module file (src/*) or a documentation file (docs/*) and set the appropriate flag (TestModule or TestDocs)
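The path matching can also be pulled out into a small function, which makes it possible to try the logic without a git repository. This is a sketch rather than the script's actual structure: Get-TestFlags is a hypothetical name, and it folds in the "test everything" fallback that the script applies when no flag ends up set.

```powershell
# Sketch: decide which test groups to run from a list of changed paths.
function Get-TestFlags {
  param([string[]]$ChangedFiles = @())

  $Flags = @{ TestModule = $false; TestDocs = $false }
  foreach ($File in $ChangedFiles) {
    if ($File -like 'src/*')  { $Flags.TestModule = $true }
    if ($File -like 'docs/*') { $Flags.TestDocs = $true }
  }
  # Nothing matched (or no diff was available): test everything to be safe
  if (-not $Flags.TestModule -and -not $Flags.TestDocs) {
    $Flags.TestModule = $true
    $Flags.TestDocs = $true
  }
  return $Flags
}
```

For example, Get-TestFlags -ChangedFiles 'docs/readme.md' sets only TestDocs, while a change to a file outside src/ and docs/ falls through to testing both.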

# Make sure we test something
if (!$TestModule -and !$TestDocs) {
  $TestModule = $true
  $TestDocs = $true
}

It is possible that a Pull Request doesn't change either the Module or Documentation so test everything just in case.

  if ($TestModule) {
    $Jobs += @{
      job_name = "Test PowerShell Module - $_"
      os = $_
      command = "psake test-powershell"
    }
  }

  if ($TestDocs) {
    $Jobs += @{
      job_name = "Test Documentation - $_"
      os = $_
      command = "psake test-documentation"
    }
  }

And now we only add the PowerShell Module and Documentation test cells if the appropriate flag is set.

The last change is to the GitHub Actions Workflow.

Previously we called the PowerShell script using

        run: "& .github/workflows/create-test-matrix.ps1"

and instead now we can pass through the -FromRef argument

        run: "& .github/workflows/create-test-matrix.ps1 -FromRef '${{ github.base_ref }}'"

The github.base_ref variable comes from the GitHub Actions context syntax:

The base_ref or target branch of the pull request in a workflow run. This property is only available when the event that triggers a workflow run is a pull_request.

Context Documentation
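One thing to watch for: github.base_ref is a bare branch name such as main, and actions/checkout makes a shallow clone by default, so that branch may not exist as a local ref when git diff runs. In that case the diff returns nothing and the script falls back to testing everything. The sketch below shows one way to build a ref that is more likely to resolve. It assumes the remote is called origin and that the workflow has fetched the base branch's history, neither of which is stated in the original post; Resolve-DiffBase is a hypothetical name.

```powershell
# Hypothetical helper: turn a base branch name into a remote-tracking ref.
function Resolve-DiffBase {
  param(
    # GITHUB_BASE_REF is only set for pull request events
    [string]$FromRef = $env:GITHUB_BASE_REF,
    # Assumption: the remote is named 'origin'
    [string]$Remote = 'origin'
  )
  # No base ref (for example on a push): there is nothing to diff against
  if ([string]::IsNullOrWhiteSpace($FromRef)) { return $null }
  return "$Remote/$FromRef"
}
```

The result could then be passed to git diff in place of $FromRef, with a $null result meaning "skip the diff and test everything".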

Wrapping Up

I migrated from Travis and AppVeyor CI using a custom PowerShell matrix creation script which now gives me more power to cater for different testing scenarios. And we also saw what else you could achieve with this technique; using it to change testing based on who made the change, or changing the testing requirements based on what was changed.

Resources

You can find the original version of this post on Glenn's personal blog.