  • Lab
  • A Cloud Guru
Work with GitHub in Azure Data Factory

Azure Data Factory and Synapse Pipelines allow you to connect and configure a Git repository in which all artifacts can be stored. In this lab, we'll create a new source control repository in GitHub. We'll then connect GitHub to Azure Data Factory and define the collaboration and publish branches. Finally, we'll see which changes can be committed to each branch and how publishing generates an ARM template for deployment.
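
As a rough illustration of what this configuration amounts to, the sketch below builds the kind of settings object a factory stores for a GitHub repository, plus the optional publish_config.json file that overrides the default publish branch. The property names follow the FactoryGitHubConfiguration shape used in Data Factory ARM templates. The account name, repository name, and branch values are placeholders (assumptions), not values from the lab, and the branch check is a sanity check added for this sketch only.

```python
import json

# Data Factory publishes to this branch unless publish_config.json says otherwise.
DEFAULT_PUBLISH_BRANCH = "adf_publish"


def github_repo_configuration(account, repository,
                              collaboration_branch="main", root_folder="/"):
    """Return a dict shaped like a factory's GitHub repoConfiguration."""
    # Sanity check for this sketch: the two branches play different roles,
    # so they should not share a name.
    if collaboration_branch == DEFAULT_PUBLISH_BRANCH:
        raise ValueError("collaboration and publish branches must differ")
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account,            # GitHub user or organization
        "repositoryName": repository,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,         # folder that holds the factory JSON
    }


def publish_config(publish_branch=DEFAULT_PUBLISH_BRANCH):
    """Return the text of a publish_config.json file for the given branch."""
    return json.dumps({"publishBranch": publish_branch}, indent=2)


# Placeholder values; substitute your own account and repository.
config = github_repo_configuration("example-user", "adf-lab-repo")
print(json.dumps(config, indent=2))
print(publish_config())
```

In the lab itself these values are entered through the Data Factory Studio Git configuration dialog; the dictionary only shows which pieces of information that dialog collects.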

Path Info

Level: Intermediate
Duration: 30m
Published: Nov 09, 2023

Table of Contents

  1. Challenge

    Set Up the Azure Environment

    1. Create an Azure Data Lake Gen2 account.
    2. Create a container called taxidata in the Data Lake account.
    3. Upload the TaxiRides.csv file to the container.

      Note: Make sure you have downloaded the TaxiRides.csv file that is located in this GitHub repository.

    4. Create an Azure Data Factory instance.
  2. Challenge

    Set Up a New Repository in GitHub

    1. Log in to an existing GitHub account, or create a new GitHub account if necessary. The link to the GitHub site is available in the Additional Information and Resources section.
    2. Create a new public repository.
  3. Challenge

    Configure GitHub in Azure Data Factory

    1. Authenticate to GitHub from Data Factory, and configure the repository.
    2. Define collaboration and publish branches.
    3. Select the collaboration branch as the working branch.
  4. Challenge

    Save Changes to Collaboration Branch

    1. Create a pipeline with a Copy activity. Leave some of its properties unset so that the pipeline remains invalid.
    2. Save the pipeline with invalid changes, and verify them in the collaboration branch of the GitHub repository.
    3. Complete the pipeline so that it copies a file from one data lake folder to another. To do this, create a linked service to the data lake and two datasets (one for the source file and one for the sink file).
    4. Save the pipeline with valid changes, and verify them in the collaboration branch of the GitHub repository.
  5. Challenge

    Publish the Changes to Publish Branch

    1. Publish the pipeline with valid changes.
    2. Verify that an ARM (Azure Resource Manager) template has been generated and stored in the publish branch.
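
One way to approach the verification in the last step is to load the generated template and list the resources it declares. The sketch below is a minimal example under stated assumptions: SAMPLE_TEMPLATE is a hypothetical, heavily trimmed stand-in for the generated file, and the linked service, dataset, and pipeline names in it are invented for illustration.

```python
import re

# Hypothetical, heavily trimmed stand-in for a generated factory template.
# Real templates carry full "properties" for every resource.
SAMPLE_TEMPLATE = {
    "contentVersion": "1.0.0.0",
    "parameters": {"factoryName": {"type": "string"}},
    "resources": [
        {"type": "Microsoft.DataFactory/factories/linkedServices",
         "name": "[concat(parameters('factoryName'), '/DataLakeLinkedService')]"},
        {"type": "Microsoft.DataFactory/factories/datasets",
         "name": "[concat(parameters('factoryName'), '/SourceTaxiRides')]"},
        {"type": "Microsoft.DataFactory/factories/datasets",
         "name": "[concat(parameters('factoryName'), '/SinkTaxiRides')]"},
        {"type": "Microsoft.DataFactory/factories/pipelines",
         "name": "[concat(parameters('factoryName'), '/CopyTaxiRides')]"},
    ],
}

# Pulls the artifact name out of "[concat(parameters('factoryName'), '/Name')]".
NAME_PATTERN = re.compile(r"'/([^']+)'\)\]$")


def summarize_resources(template):
    """Group resource names by the last segment of their resource type."""
    summary = {}
    for resource in template.get("resources", []):
        kind = resource["type"].rsplit("/", 1)[-1]
        match = NAME_PATTERN.search(resource["name"])
        # Fall back to the raw expression if it has an unexpected shape.
        name = match.group(1) if match else resource["name"]
        summary.setdefault(kind, []).append(name)
    return summary


summary = summarize_resources(SAMPLE_TEMPLATE)
for kind, names in summary.items():
    print(f"{kind}: {', '.join(names)}")
```

To run this against a real publish, load the JSON file with json.load and pass the result to summarize_resources. Data Factory typically writes the template to the publish branch (adf_publish by default) in a folder named after the factory, as ARMTemplateForFactory.json, alongside ARMTemplateParametersForFactory.json.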

The Cloud Content team comprises subject matter experts who are hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
