Data Platform in Use:
The data platform consists of Data Lake, Azure Data Factory, Azure Databricks, Synapse Analytics, and Power BI. Here’s the high-level architecture:
What is Azure DevOps & Why?
1a. What is Azure DevOps?
Azure DevOps provides a set of tools and practices that help teams across an organization collaborate to build, integrate, and deploy applications and services.
1b. Why Azure DevOps?
We opted for the DevOps offering in Azure for the following reasons:
- Easy and tight integration with Azure native components such as Azure Data Factory, Synapse Analytics, Azure Databricks, etc.
- Maintenance-free operations
- Lets you manage security roles and groups
- Elastic scaling
Pre-requisites:
- An Azure DevOps organization and project are set up and configured
- A Git repository is already set up in the Repos section of Azure DevOps
Azure Data Factory CI/CD with Azure DevOps:
1. As explained in the prerequisites, a DevOps project should be set up with a Git repository in Azure DevOps for Azure Data Factory. The typical CI/CD process we follow for Azure Data Factory is shown below:
credits: Microsoft (slightly modified based on our architecture)
2. Open Azure Data Factory Studio and check whether it is connected to Git. If not, configure Azure DevOps Git by clicking Manage -> Source Control -> Git Configuration.
3. After Git is configured, the branch selector at the top left shows the master branch instead of Live mode. You can now create a feature branch from master to hold your changes to ADF.
4. After changing any ADF components, such as linked services, pipelines, or data flows, click “Save All” to commit the changes to the branch you’re currently on (master or a feature branch).
5. After saving the changes to the feature branch, merge it into the master branch. If you’re already working on master, this step is not required.
6. Click Publish to push the changes to Live mode in the Dev environment. The Publish operation also generates ARM templates and saves them to the adf_publish branch in Git. This covers the CI part of CI/CD for ADF.
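Once Publish has written the ARM templates to adf_publish, it can help to see which parameters the generated template exposes, since these are the values that can later be overridden per environment. The following is a minimal sketch, assuming the generated parameters file is named ARMTemplateParametersForFactory.json; the sample content is trimmed and its values are made up, and Get-ArmParameterName is a hypothetical helper, not an ADF or Az cmdlet.

```powershell
# Trimmed sample standing in for ARMTemplateParametersForFactory.json.
# Parameter names and values here are illustrative only.
$sampleJson = @'
{
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": { "value": "adf-dev-example" },
    "AzureDatabricks_properties_typeProperties_domain": { "value": "https://adb-dev.example.net" }
  }
}
'@

# Hypothetical helper: returns the parameter names found in an ARM parameters document.
function Get-ArmParameterName {
    param([Parameter(Mandatory)] [string] $Json)
    $document = $Json | ConvertFrom-Json
    # Each property under "parameters" is one overridable template parameter.
    $document.parameters.PSObject.Properties.Name
}

Get-ArmParameterName -Json $sampleJson
```

Against a real checkout of adf_publish, the same helper could be fed with Get-Content -Raw on the parameters file instead of the sample string.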
1. Switch to Azure DevOps to carry out the deployment to Production. Select Releases from the Pipelines section, then select New Release Pipeline.
2. Select the artifacts from the adf_publish branch in the Repos section.
3. Add a task that connects to the Production ADF.
4. If any active triggers are in place, add Stop Trigger and Start Trigger PowerShell scripts. The Stop Trigger task stops all active triggers before the deployment, and the Start Trigger task runs after the deployment to Production to reactivate the triggers and their associated schedules.
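Stop Trigger Code:
# You can write your azure PowerShell scripts inline here.
# You can also pass predefined and custom variables to this script using arguments
# $DataFactoryName and $ResourceGroupName are placeholders; set them (for example, through script arguments) to the Production values.
$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
$triggersADF | ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_.Name -Force }
Start Trigger Code:
# You can write your azure PowerShell scripts inline here.
# You can also pass predefined and custom variables to this script using arguments
# $DataFactoryName and $ResourceGroupName are placeholders; set them (for example, through script arguments) to the Production values.
$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
$triggersADF | ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_.Name -Force }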
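The scripts above act on every trigger in the factory, so the Start Trigger task would also start triggers that were deliberately stopped before the release. One possible refinement, sketched below under assumptions, is to select only the triggers whose RuntimeState is Started. Get-StartedTriggerName is a hypothetical helper written for this illustration, and $rg and $adf in the commented usage are placeholder variables.

```powershell
# Hypothetical helper (not part of the Az module): returns the names of
# triggers whose RuntimeState is 'Started'.
function Get-StartedTriggerName {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Triggers
    )
    $Triggers |
        Where-Object { $_.RuntimeState -eq 'Started' } |
        ForEach-Object { $_.Name }
}

# Assumed usage inside the Stop Trigger task ($rg and $adf are placeholders):
# $triggers = @(Get-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf)
# $started  = @(Get-StartedTriggerName -Triggers $triggers)
# $started | ForEach-Object {
#     Stop-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $adf -Name $_ -Force
# }
```

Restarting only those same triggers after the deployment would require carrying the list of names from the Stop task to the Start task, which this sketch leaves out.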
5. Add an ARM Template deployment task to deploy the artifacts from the adf_publish branch. Fill in the required subscription and target resource group details in the drop-down lists.
5a. When you select the Azure Resource Manager connection, you are prompted to authorize, and a service connection is created for the selected Azure subscription.
· Sign in to Azure DevOps as the owner of both the Azure Pipelines organization and the Azure subscription.
· You don't need to further limit the permissions for Azure resources accessed through the service connection.
· When you click the Authorize button, the service connection is created automatically.
6. The Override template parameters section lists all the parameters that can be overridden with Production-specific values, such as linked services, connection strings, Databricks URLs, etc.
7. If a parameter you need to override is not listed, navigate to Azure Data Factory (source) -> Manage -> ARM template -> Edit parameter configuration to add custom parameters.
Refer to the documentation for more details: https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-resource-manager-custom-parameters
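As an illustration of what such a parameter configuration can look like, the sketch below builds a small definition that exposes the Databricks workspace URL (domain) and cluster ID of AzureDatabricks linked services as ARM template parameters. It is an assumption-laden example, not our actual configuration: the choice of properties is hypothetical, and the structure follows the custom parameter syntax described in the linked documentation, where "=" keeps the current value as the parameter's default.

```powershell
# Hypothetical custom parameter definition, built as a PowerShell object and
# rendered as JSON. The properties chosen here are illustrative only.
$definition = [ordered]@{
    'Microsoft.DataFactory/factories/linkedServices' = [ordered]@{
        # Applies to linked services of type AzureDatabricks.
        'AzureDatabricks' = [ordered]@{
            properties = [ordered]@{
                typeProperties = [ordered]@{
                    domain            = '='   # "=" keeps the current value as the default
                    existingClusterId = '='
                }
            }
        }
    }
}

# -Depth is needed because ConvertTo-Json truncates nested objects by default.
$json = $definition | ConvertTo-Json -Depth 10
$json
```

The resulting JSON is the kind of content that the Edit parameter configuration screen edits; after the next Publish, the matching parameters should appear in the generated template and in the Override template parameters list.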
8. Enable the continuous deployment trigger so that changes to the adf_publish branch automatically invoke the deployment.
9. For better control over releases, also enable pre-deployment conditions, such as pre-deployment approvals.
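The override values from step 6 are entered as a single string of -name "value" pairs. The sketch below shows one way to assemble that string from a table of Production values, which can make long override lists easier to review. ConvertTo-OverrideParameterString is a hypothetical helper, and the parameter names and values are made-up examples rather than values from our factory.

```powershell
# Hypothetical helper: turns a dictionary of parameter names and values into
# the '-name "value"' form used in the Override template parameters field.
function ConvertTo-OverrideParameterString {
    param([Parameter(Mandatory)] [System.Collections.IDictionary] $Parameters)
    $pairs = foreach ($key in $Parameters.Keys) {
        '-{0} "{1}"' -f $key, $Parameters[$key]
    }
    $pairs -join ' '
}

# Illustrative Production values (names and URLs are assumptions).
$production = [ordered]@{
    factoryName = 'adf-prod-example'
    AzureDatabricks_properties_typeProperties_domain = 'https://adb-prod.example.net'
}

ConvertTo-OverrideParameterString -Parameters $production
```

Values that contain double quotes or secrets are not handled here; secrets in particular are usually better supplied through pipeline variables than through a literal string.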
Conclusion:
This concludes the CI/CD process for Data Factory using Azure DevOps, with a Git repository hosted in Azure DevOps and Azure cloud-native components throughout.