Creating a Data Pipeline Using Azure Synapse Pipelines
Data pipelines can be created in Azure Data Factory as well as in Azure Synapse Pipelines. Here we'll build a two-step pipeline using Azure Synapse Pipelines: first we'll copy data from an Azure SQL database to a data lake, and then from the data lake to a dedicated SQL pool table using PolyBase.
Challenge: Set Up the Environment
- Create an Azure Synapse Analytics instance, defining a new Azure Data Lake Storage Gen2 account for it.
- Create a container named staging in the data lake account.
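If you prefer to script this step rather than use the portal, here is a minimal sketch using the azure-storage-blob SDK; the account name is a placeholder for the Data Lake Gen2 account created above, and it assumes your identity has permission to create containers.

```python
# Minimal sketch: create the "staging" container in the data lake account.
# "mydatalakeacct" is a placeholder; substitute the account created with
# the Synapse workspace.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://mydatalakeacct.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
service.create_container("staging")  # the pipeline's staging area
```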
Challenge: Set Up Dedicated SQL Pool Instance
Within the Synapse workspace, create a dedicated SQL pool instance named TaxiRidesWarehouse, with a performance level of DW100c.
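Below is a hedged sketch of the same step with the azure-mgmt-synapse SDK, in case you want to provision the pool outside Synapse Studio; the subscription, resource group, workspace, and region values are placeholders.

```python
# Provision a DW100c dedicated SQL pool programmatically (sketch).
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import Sku, SqlPool

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
poller = client.sql_pools.begin_create(
    resource_group_name="lab-rg",        # placeholder resource group
    workspace_name="lab-synapse-ws",     # placeholder workspace name
    sql_pool_name="TaxiRidesWarehouse",  # name required by the lab
    parameters=SqlPool(
        location="eastus",               # placeholder region
        sku=Sku(name="DW100c"),          # performance level from the lab
    ),
)
poller.result()  # block until provisioning completes
```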
Challenge: Create a Pipeline to Copy Data from Azure SQL Table to Data Lake File
- Create a linked service for the Azure SQL database. Credentials for connecting to Azure SQL are available in the Additional Information and Resources section.
- Create a linked service for the data lake.
- Create an integration dataset for the Azure SQL table SalesLT.Customer.
- Create an integration dataset for the data lake file. Use Parquet as the format, keep the file in the staging container, and set the import schema to None.
- Create a pipeline with a copy activity. Set the source to the Azure SQL table dataset and the sink to the data lake file dataset (see the sketch after this list).
- Update the mappings in the copy activity:
- Rename the CustomerID column to ID.
- Keep the following columns only: Title, FirstName, MiddleName, LastName, Suffix, CompanyName, SalesPerson, EmailAddress, and Phone.
- Run the pipeline and verify that the file is created successfully in the data lake.
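For reference, here is a hedged sketch of what this pipeline could look like if defined with the azure-synapse-artifacts SDK instead of Synapse Studio (method names can vary across SDK versions). The workspace endpoint and the dataset names CustomerTable and CustomerParquet are assumptions; in the lab itself, the linked services and datasets are created in the Studio UI.

```python
# Sketch: a pipeline with one copy activity (Azure SQL table -> Parquet file),
# including the column mapping the lab asks for.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient
from azure.synapse.artifacts.models import (
    AzureSqlSource, CopyActivity, DatasetReference,
    ParquetSink, PipelineResource, TabularTranslator,
)

client = ArtifactsClient(
    credential=DefaultAzureCredential(),
    endpoint="https://lab-synapse-ws.dev.azuresynapse.net",  # placeholder
)

# Rename CustomerID to ID and keep only the columns listed in the lab.
kept = ["Title", "FirstName", "MiddleName", "LastName", "Suffix",
        "CompanyName", "SalesPerson", "EmailAddress", "Phone"]
mappings = [{"source": {"name": "CustomerID"}, "sink": {"name": "ID"}}]
mappings += [{"source": {"name": c}, "sink": {"name": c}} for c in kept]

copy_to_lake = CopyActivity(
    name="CopySqlToLake",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="CustomerTable")],     # assumed name
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="CustomerParquet")],  # assumed name
    source=AzureSqlSource(),
    sink=ParquetSink(),
    translator=TabularTranslator(mappings=mappings),
)

client.pipeline.begin_create_or_update_pipeline(
    "CopyCustomerData",
    PipelineResource(activities=[copy_to_lake]),
).result()
run = client.pipeline.create_pipeline_run("CopyCustomerData")
print(run.run_id)  # track this run on the pipeline's Monitor tab
```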
Challenge: Update Pipeline to Copy Data from Data Lake File to Dedicated SQL Pool Table
- Create a DimCustomer table in the dedicated SQL pool. The script to create it is available in the Additional Information and Resources section.
- Create an integration dataset for the dedicated SQL pool table DimCustomer, using Azure Synapse dedicated SQL pool as the data store.
- In the existing pipeline, add another copy activity. Set the source to the data lake file dataset and the sink to the dedicated SQL pool table dataset.
- In the sink of the second copy activity, set the copy method to PolyBase.
- Connect the two copy activities using the Succeeded dependency: data should first be copied from Azure SQL to the data lake, and then from the data lake to the dedicated SQL pool (see the sketch after this list).
- Run the pipeline and verify that the data is successfully loaded into the DimCustomer table.
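Continuing the SDK sketch from the previous challenge (it reuses client and copy_to_lake from there), here is a hedged illustration of the PolyBase sink and the Succeeded dependency. DimCustomerTable is an assumed dataset name, and the DDL in the comment is only a guess at the script provided in the Additional Information and Resources section.

```python
# Sketch: second copy activity (Parquet file -> dedicated SQL pool table)
# chained to the first one via a Succeeded dependency.
from azure.synapse.artifacts.models import (
    ActivityDependency, CopyActivity, DatasetReference,
    ParquetSource, PipelineResource, SqlDWSink,
)

# Illustrative guess at the DimCustomer DDL, based on the mapped columns
# (the lab supplies the real script):
#   CREATE TABLE dbo.DimCustomer (
#       ID INT NOT NULL,
#       Title NVARCHAR(8), FirstName NVARCHAR(50), MiddleName NVARCHAR(50),
#       LastName NVARCHAR(50), Suffix NVARCHAR(10), CompanyName NVARCHAR(128),
#       SalesPerson NVARCHAR(256), EmailAddress NVARCHAR(50), Phone NVARCHAR(25)
#   )
#   WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);

copy_to_pool = CopyActivity(
    name="CopyLakeToPool",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="CustomerParquet")],    # assumed name
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="DimCustomerTable")],  # assumed name
    source=ParquetSource(),
    sink=SqlDWSink(allow_poly_base=True),  # Copy method: PolyBase
    # Run only after the first copy succeeds: Azure SQL -> lake -> pool.
    depends_on=[ActivityDependency(activity="CopySqlToLake",
                                   dependency_conditions=["Succeeded"])],
)

client.pipeline.begin_create_or_update_pipeline(
    "CopyCustomerData",
    PipelineResource(activities=[copy_to_lake, copy_to_pool]),
).result()
```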