Continuous Integration and Continuous Delivery in Azure Data Factory V2 using PowerShell: Part 3

Overview

In this three-part blog series, I describe an approach I have recently used to implement Continuous Integration and Continuous Delivery (CI/CD) in Azure Data Factory V2 using a PowerShell module.

In this final part of the series, we will focus on:

Creating a build pipeline in YAML that packages the artefacts for continuous integration whenever the code changes. At the time of writing, there are no unit or integration testing options for Azure Data Factory V2. You can find more information here.
Creating a release pipeline using a PowerShell module called azure.datafactory.tools. To view its source code, refer here.

Prerequisites

Read Part 1 and Part 2 of the blog series for much-needed context. Part 1 describes what CI/CD means in Azure Data Factory V2 and its process workflow. It also explains how to provision the Azure resources for the environment setup using Azure CLI. Part 2 describes how to create a basic Azure Data Factory (ADF) V2 pipeline to be used as a sample for testing the CI/CD pipeline.
An Azure environment that is already set up and ready to go. If you don't have one, follow the "Azure Resources setup using Azure CLI" section in Part 1 of the blog post series.
A Data Factory pipeline created and validated in a feature branch on the DEV ADF environment (DF-Ash-Dev, in our case). If you don't have one, follow the "Setup Azure Data Factory V2 pipeline" section in Part 2 of the blog post series.

So far, we have completed the following steps:

Step 1: Azure Resources setup using Azure CLI

Go through Part 1 of the blog series to understand how to provision Azure resources for environment setup using Azure CLI.

Step 2: Setup Azure Data Factory v2 pipeline

Go through Part 2 of the blog series to understand how to create an ADF V2 pipeline that will be used as a sample for testing the CI/CD pipeline in the following steps. Let's get started with Step 3.

Step 3: Setup build pipeline using YAML

If you would prefer to copy the YAML file to set up the build pipeline, you can find it at https://github.com/ashisharora1909/ADF-CICD-Demo in the "3-BuildPipeline" folder. Alternatively, if you want to follow along, go through the steps below.

Log in to https://dev.azure.com to open Azure DevOps. You will see a screen like the one below. It lists the organization under the currently selected Azure Active Directory (AAD), along with the projects in that organization.

Navigate to the project that hosts the source code repository of your Azure Data Factory V2 objects (ADF-CICD-Demo, in my case). To create a new pipeline in the project, select Pipelines -> Create Pipeline.

Select Azure Repos Git as our code resides in Azure Repos.

Select the repo where the ADF V2 objects are hosted (ADF-CICD-Demo, in my case).

If you have already copied the YAML pipeline from the GitHub link above and committed it to the repo, select Existing Azure Pipelines YAML file and choose that file. As I am creating the pipeline from scratch, I am selecting Starter pipeline.

Copy the following code into the azure-pipelines.yml file and click Save. This prompts you to commit the file to the repository. Select the “Commit directly to the master branch” option. The commit message is prepopulated with a predefined message; you can keep it or write your own.

Once the pipeline is created, it is automatically named ADF-CICD-Demo. I renamed it to “AzureDataFactory-CI” to give it a more meaningful name, because I will be creating a release pipeline in the same project next and want to tell the two apart easily.

Let’s go through the YAML file to understand what it is doing.

The trigger property is set to the master branch, so the pipeline runs every time a commit lands on master. This is the default behaviour, and I am keeping it because I want the pipeline to run whenever a pull request is merged into master.
The pipeline runs on the ubuntu-latest hosted agent image.
The pipeline has two steps:

a. Copy files – Copy the entire contents of the ADF-CICD-Demo repo into Build.ArtifactStagingDirectory. The “**” pattern is a glob that matches every file in every folder of the repo.
b. Publish – Publish the contents of Build.ArtifactStagingDirectory, that is, the files copied in the Copy files step, as an artifact named “drop”. Build.ArtifactStagingDirectory is a predefined variable holding a local path on the build agent where files are staged before publishing; once published, the “drop” artifact can be consumed by other pipelines, such as the release pipeline we create next.
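The exact YAML file lives in the "3-BuildPipeline" folder of the GitHub repository mentioned above. If you are typing it in by hand instead, a minimal azure-pipelines.yml consistent with the description above could look like the sketch below. It uses the built-in CopyFiles@2 and PublishBuildArtifacts@1 tasks; the explicit SourceFolder and publishLocation inputs are assumptions and may differ from the file in the repo.

```yaml
# Sketch of a build pipeline matching the steps described above (assumed layout).
trigger:
  - master            # run on every commit to master, e.g. a merged pull request

pool:
  vmImage: ubuntu-latest

steps:
  # Copy every file in every folder of the repo ("**") into the staging directory.
  - task: CopyFiles@2
    displayName: Copy files
    inputs:
      SourceFolder: $(Build.SourcesDirectory)
      Contents: '**'
      TargetFolder: $(Build.ArtifactStagingDirectory)

  # Publish the staging directory as a build artifact named "drop",
  # which the release pipeline can later download.
  - task: PublishBuildArtifacts@1
    displayName: Publish artifact
    inputs:
      PathtoPublish: $(Build.ArtifactStagingDirectory)
      ArtifactName: drop
      publishLocation: Container
```

Publishing the whole repository keeps the ADF JSON files and any deployment configuration files together in a single artifact, so the release side only has to reference one folder.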

Testing the Build pipeline

To test the build pipeline, first merge the changes from our feature branch into the master branch via a pull request.

Completing the pull request creates a commit on master, which automatically triggers the pipeline. Under Pipelines, you should see that the AzureDataFactory-CI pipeline was triggered by merging the pull request and completed successfully.

When you open the run, you should see something like the screen below, which shows that an artefact was published. Clicking it lists the files it contains.

This completes the build pipeline setup. In the next section, we will create a release pipeline to deploy this artefact to Production.

Step 4: Setup release pipeline

Log in to https://dev.azure.com to open Azure DevOps and navigate to the project that hosts the source code repository of Azure Data Factory V2 (ADF-CICD-Demo, in my case). To create a new release pipeline in the project, select Releases -> New pipeline.

Start with an Empty job.

This opens a pane to configure a stage. Name the first stage Non-Production, because we first want to deploy the changes to Data Factory mode in the Dev ADF V2 instance. (Remember, so far the changes exist only in Git mode, on the master branch.) Once done, close the Stage pane.

The next step is to add the artifact we want to release through this pipeline: the artifact produced by our build pipeline (AzureDataFactory-CI). Configure it as shown in the screen below.

At this point, the pipeline should look something like the screen below. Click the highlighted link to go to the Tasks tab, or click the Tasks tab at the top.

Click Save to save the pipeline, then rename it to “AzureDataFactory-CD”. To continue adding tasks, open the pipeline and click Edit.

On the Tasks tab, click the “+” icon next to Agent job to add a task to it. This opens a pane listing the tasks available natively in Azure Pipelines, along with others from the Marketplace. Select “Azure Key Vault”, which downloads secrets from an Azure Key Vault.

Configure the Azure Key Vault task as shown below. As this is the Non-Production stage, the task uses the Dev Key Vault. Later, we will create the same task in the Production stage, pointing to the Prod Key Vault, KV-Ash-Prod.

Note that we have already created two service connections under Project Settings in Azure DevOps, one for Dev and one for Prod. Each is authenticated with the corresponding Azure service principal created in Part 1 of the blog series, and each service principal has permission to access secrets in its own Key Vault. To learn how to create a service connection, refer to the section “Configuring Service Connection in Azure DevOps” below.
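For readers who prefer to see the task settings as text, here is a sketch of the Non-Production Azure Key Vault task expressed as YAML, using the inputs of the built-in AzureKeyVault@2 task. The names SPN-Ash-Dev and KV-Ash-Dev are hypothetical; they only mirror the Prod names used in this post (SPN-Ash-Prod, KV-Ash-Prod), so substitute your own. For the Production stage, only azureSubscription and KeyVaultName change.

```yaml
# Sketch only: Non-Production Azure Key Vault task.
# SPN-Ash-Dev and KV-Ash-Dev are hypothetical names for the Dev
# service connection and Dev Key Vault.
steps:
  - task: AzureKeyVault@2
    displayName: Download secrets from Azure Key Vault
    inputs:
      azureSubscription: SPN-Ash-Dev   # service connection backed by the Dev service principal
      KeyVaultName: KV-Ash-Dev         # Dev Key Vault
      SecretsFilter: '*'               # download every secret the principal may read
      RunAsPreJob: false               # run in task order, not before the job
```

Each downloaded secret becomes a secret variable named after the Key Vault secret, which later tasks in the same job can reference with the $(name) syntax.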

The next step is to add another task that runs the PowerShell module to deploy the artifact to the Non-Production environment. Click the “+” icon, select “Azure PowerShell” in the pane and click Add.

Configure the Azure PowerShell task as shown below:

The inline script used can be copied from below:

Let’s go through what the PowerShell script is doing:

First, the script installs and imports the azure.datafactory.tools module, which was created to simplify ADF CI/CD processes. For more information, follow the link provided in the Overview section.
Once the module is loaded, the script calls the “Publish-AdfV2FromJson” function, which publishes all Azure Data Factory objects from the JSON files that the ADF V2 UI saves to the Git repository.
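As a point of reference, an Azure PowerShell task doing what is described above could look like the YAML sketch below. It relies on azure.datafactory.tools and the -RootFolder, -ResourceGroupName, -DataFactoryName, -Location and -Stage parameters of Publish-AdfV2FromJson. It also makes several assumptions: the task version (AzurePowerShell@5), the artifact alias (_AzureDataFactory-CI, the default alias for the build pipeline), that the ADF JSON folders and the deployment folder sit at the root of the drop artifact, and the variable names ResourceGroupName, DataFactoryName and Location, which are hypothetical.

```yaml
# Sketch only: Non-Production "Deploy ADF artefacts" task.
# Assumed: artifact alias _AzureDataFactory-CI, service connection SPN-Ash-Dev,
# and user-defined variables ResourceGroupName, DataFactoryName, Location.
steps:
  - task: AzurePowerShell@5
    displayName: Deploy ADF artefacts
    inputs:
      azureSubscription: SPN-Ash-Dev
      ScriptType: InlineScript
      azurePowerShellVersion: LatestVersion
      pwsh: true
      Inline: |
        # Install and load the module that publishes ADF objects from JSON files
        Install-Module -Name azure.datafactory.tools -Scope CurrentUser -Force
        Import-Module -Name azure.datafactory.tools

        # Folder where the release downloaded the "drop" artifact (assumed alias)
        $rootFolder = "$(System.DefaultWorkingDirectory)/_AzureDataFactory-CI/drop"

        # Deploy all objects; with a stage name, the module reads
        # deployment/config-<stage>.csv under the root folder
        Publish-AdfV2FromJson -RootFolder $rootFolder `
          -ResourceGroupName "$(ResourceGroupName)" `
          -DataFactoryName "$(DataFactoryName)" `
          -Location "$(Location)" `
          -Stage "$(Release.EnvironmentName)"
```

Passing the predefined Release.EnvironmentName variable as -Stage ties the configuration file to the release stage name, so the Non-Production stage would pick up config-Non-Production.csv and the Production stage config-Production.csv. In the Production copy of the task, only azureSubscription changes, and the stage-scoped variables supply the Prod values.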

We have now configured both tasks required for the Non-Production stage: downloading secrets from Azure Key Vault, and running the azure.datafactory.tools function that deploys to Non-Production. We need the same for the Production stage. A quick way to do this is to clone the Non-Production stage and update its values for Production. Click the highlighted button in the screen below to clone the stage.

The values we need to update in the cloned stage are:

Change the stage name from “Copy of Non-Production” to “Production”.
In the Azure Key Vault task, set Azure subscription to “SPN-Ash-Prod”, the service connection that uses the Prod service principal with permission to access secrets in the Prod Key Vault. Also, set Key vault to “KV-Ash-Prod”.
In the Azure PowerShell task, set Azure subscription to “SPN-Ash-Prod”, as in the previous step.

After making these changes, the pipeline will look something like below:

Next, go to the Variables tab and configure the user-defined variables used in the inline script. Create a pair for each variable, one scoped to Non-Production and the other to Production, each holding the value for that environment. After configuring, the Variables tab should look like the screen below:

As the final step in the release pipeline, configure a continuous deployment trigger on the artifact and pre-deployment conditions on the Production stage. The continuous deployment trigger creates a release automatically every time the build pipeline produces a new build of the master branch. In the pre-deployment conditions, enable pre-deployment approvals to add a manual gate, so that we can review the changes in Dev ADF V2 before pushing them to Prod ADF V2. To configure this, go to the Pipeline tab and set it up as shown below. Once done, click Save to save all the changes to the pipeline.

There is one more thing to do: replace every environment-specific property in the Azure Data Factory objects. If you look at the JSON file created in the Azure repo for the Key Vault linked service, you will find that the property value is stored inside the JSON file and must change per environment, because it is not parameterized as it would be in an ARM template. To handle this, the PowerShell module can read a flat configuration file containing all the required values for each environment; each row gives the object type, the object name, the path to the property and the new value. You can find how to create such a configuration file here. The configuration file name must match the stage name of the environment.

Therefore, create a folder named “deployment” in the repo containing two files, one per stage: config-Non-Production.csv and config-Production.csv. The file and folder structure will look something like this:

Configuring Service Connection in Azure DevOps

Follow the steps below to configure a service connection in Azure DevOps. To create one, go to Project Settings -> Service connections -> New service connection.

Select “Azure Resource Manager” as the service or connection type and click Next.
Select “Service principal (manual)” and click Next.
Enter the following values:
Environment – Azure Cloud
Scope level – Subscription
Enter the Subscription Id, Subscription Name, Service Principal Id, Service Principal Key and Tenant Id values from Part 1 of the series, where the Azure service principal was created using Azure CLI.
Note that we need two service connections, one for Dev and one for Prod.

Testing the release pipeline

Navigate to DF-Ash-Dev and open the Author tab. In Data Factory mode, no objects should be present yet.

To test the release pipeline, open the release pipeline “AzureDataFactory-CD” and click Create release. The release should deploy the changes to the Non-Production stage and then wait for approval before the Production stage.

Now is a good time to review the changes deployed to DF-Ash-Dev in Data Factory mode, where the deployed objects should now be visible.

In the logs of the Azure PowerShell task “Deploy ADF artefacts” in the release pipeline, you should see verbose output showing triggers being stopped and restarted automatically, along with the list of objects deployed. Pretty amazing!

As I am happy with the changes deployed to Dev, I will now approve the deployment to the Production stage.

The release is now deployed successfully to the Production stage.

You should now see the same changes in the Prod Azure Data Factory, with the Azure Key Vault linked service pointing to the Prod Key Vault, KV-Ash-Prod.

This concludes the final post in this blog series. I hope it helps anyone who wants to set up a CI/CD pipeline for Azure Data Factory V2 using PowerShell.

Thanks for reading!