- Lab
- A Cloud Guru
Upserting Data in Azure Synapse Analytics
Upsert allows you to insert and update data in the target table as a single transaction. Here, we'll upsert data into a dedicated SQL pool table using a Synapse pipeline. In the first run, we'll load all the records into the table. In the second run, we'll insert and update records as a single transaction.
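As a rough illustration of these semantics (not of how Synapse implements them), the sketch below treats the table as a Python dictionary keyed by RideId. The `upsert` helper and the sample rows are hypothetical. Only the column names RideId and PassengerCount come from this lab.

```python
def upsert(table, rows, key="RideId"):
    """Insert rows whose key is new; overwrite rows whose key already exists."""
    inserted, updated = 0, 0
    for row in rows:
        if row[key] in table:
            updated += 1
        else:
            inserted += 1
        table[row[key]] = dict(row)
    return inserted, updated


# Hypothetical sample data: one existing row and an incoming batch of two.
table = {1: {"RideId": 1, "PassengerCount": 1}}
incoming = [
    {"RideId": 1, "PassengerCount": 2},      # existing key -> update
    {"RideId": 10000, "PassengerCount": 3},  # new key -> insert
]
result = upsert(table, incoming)
```

After the call, `result` holds the number of inserted and updated rows, and `table` contains both RideId values.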
Table of Contents
-
Challenge
Set Up the Environment
- Create an Azure Synapse Analytics workspace, defining a new Azure Data Lake Storage Gen2 account along with it.
- Create a container named taxidata in the Data Lake account.
- Upload the TaxiRides.csv file to the container. The file link is available in the Additional Information and Resources section.
-
Challenge
Set Up Dedicated SQL Pool Instance
Within the Synapse workspace, create a dedicated SQL pool named TaxiRidesWarehouse with a performance level of DW100c.
-
Challenge
Create a Pipeline to Copy Data from Data Lake File to Dedicated SQL Table
- Create a linked service for the data lake.
- Create a TaxiRides table in the dedicated SQL pool. The script to create it is available in the Additional Information and Resources section.
- Create a pipeline with a Copy activity.
- From the source of the Copy activity, create an integration dataset for the data lake file. Use Delimited Text as the format.
- From the sink of the Copy activity, create an integration dataset for the dedicated SQL pool table, TaxiRides. Use Azure Synapse dedicated SQL pool as the data store.
- In the sink of the Copy activity, set the Copy method to Upsert. Add RideId as the key column.
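For orientation, the sink settings chosen in the last step end up in the pipeline's JSON definition. The sketch below builds what that fragment might look like as a Python dictionary. The property names (`SqlDWSink`, `writeBehavior`, `upsertSettings`, `keys`) reflect our understanding of the Azure Synapse Analytics connector and should be treated as assumptions; compare them with the JSON that Synapse Studio generates for your pipeline.

```python
import json

# Assumed shape of the Copy activity sink when Upsert is selected.
# Check the property names against the pipeline JSON shown in Synapse Studio.
sink = {
    "type": "SqlDWSink",
    "writeBehavior": "Upsert",
    "upsertSettings": {
        # Key column used to decide whether a row is updated or inserted.
        "keys": ["RideId"],
    },
}

sink_json = json.dumps(sink, indent=2)
print(sink_json)
```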
-
Challenge
Complete Initial File Load
- Run the pipeline.
- Use an SQL query to verify that 100 records have been successfully added to the table.
- Verify that the PassengerCount column value for RideId = 1 is 1.
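The two verification steps above amount to a row count and a single-row lookup. The sketch below runs both queries against an in-memory SQLite database that stands in for the dedicated SQL pool. The two-column schema and the placeholder rows are assumptions for illustration; the real TaxiRides table is created by the script in the Additional Information and Resources section.

```python
import sqlite3

# In-memory SQLite database standing in for the dedicated SQL pool.
conn = sqlite3.connect(":memory:")
# Hypothetical two-column schema; the real TaxiRides table has its own definition.
conn.execute(
    "CREATE TABLE TaxiRides (RideId INTEGER PRIMARY KEY, PassengerCount INTEGER)"
)
# Placeholder data: 100 rides, each with one passenger.
conn.executemany(
    "INSERT INTO TaxiRides (RideId, PassengerCount) VALUES (?, ?)",
    [(ride_id, 1) for ride_id in range(1, 101)],
)

# Check 1: total number of records in the table.
row_count = conn.execute("SELECT COUNT(*) FROM TaxiRides").fetchone()[0]
# Check 2: PassengerCount for RideId = 1.
passenger_count = conn.execute(
    "SELECT PassengerCount FROM TaxiRides WHERE RideId = 1"
).fetchone()[0]
print(row_count, passenger_count)
```

The same two SELECT statements can be run as-is in a SQL script against the dedicated SQL pool.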
-
Challenge
Upsert Data from File to Dedicated SQL Table
- In the data lake account, open and edit the file TaxiRides.csv.
- In the file, keep only the record for RideId = 1 and set PassengerCount = 2. Remove all other records.
- In the file, add a new record for RideId = 10000. Use any values for the record, but make sure it has the same number of columns as the existing records.
- Save the file changes and run the pipeline.
- Verify that the table now has 101 records, with an additional record for RideId = 10000.
- Verify that the PassengerCount column value for RideId = 1 is now updated to 2.
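The file edit described in the steps above can also be scripted. The sketch below uses a small in-memory stand-in for TaxiRides.csv with a header row and two columns, both of which are assumptions; the real file has its own layout, and the values for the new record are arbitrary.

```python
import csv
import io

# Hypothetical stand-in for TaxiRides.csv; the real file has more columns.
original = "RideId,PassengerCount\n1,1\n2,1\n3,1\n"

reader = csv.DictReader(io.StringIO(original))
fieldnames = reader.fieldnames
# Keep only RideId = 1 and change its PassengerCount to 2.
kept = [dict(row, PassengerCount="2") for row in reader if row["RideId"] == "1"]
# Add the new record; every column in the header must receive a value.
new_row = {"RideId": "10000", "PassengerCount": "3"}
assert set(new_row) == set(fieldnames)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(kept + [new_row])
edited = out.getvalue()
print(edited)
```

Upserting these two rows into the 100-row table updates one existing record and inserts one new record, which is why the expected total is 100 + 1 = 101.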