Lab
A Cloud Guru

Upserting Data in Azure Synapse Analytics

An upsert inserts new records and updates existing ones in the target table as a single transaction. In this lab, we'll use a Synapse pipeline to upsert data into a dedicated SQL pool table. The first run loads every record from the file into the table. The second run updates an existing record and inserts a new one as a single transaction.
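
To make the upsert behavior concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the dedicated SQL pool. It is only an illustration of the semantics, not how the Synapse Copy activity implements upsert internally. The two-column table and the sample rows are assumptions made for the example; the lab's real TaxiRides table has more columns.

```python
import sqlite3


def upsert(conn, rows):
    """Update rows whose RideId exists, insert the rest, all in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for ride_id, passenger_count in rows:
            cur = conn.execute(
                "UPDATE TaxiRides SET PassengerCount = ? WHERE RideId = ?",
                (passenger_count, ride_id),
            )
            if cur.rowcount == 0:  # no existing row matched the key, so insert
                conn.execute(
                    "INSERT INTO TaxiRides (RideId, PassengerCount) VALUES (?, ?)",
                    (ride_id, passenger_count),
                )


# Stand-in table with only the two columns this lab description names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TaxiRides (RideId INTEGER PRIMARY KEY, PassengerCount INTEGER)"
)

upsert(conn, [(1, 1), (2, 3)])      # first run: every record is new
upsert(conn, [(1, 2), (10000, 4)])  # second run: one update, one insert

rows = conn.execute(
    "SELECT RideId, PassengerCount FROM TaxiRides ORDER BY RideId"
).fetchall()
print(rows)  # [(1, 2), (2, 3), (10000, 4)]
```

Note that RideId 2 is left untouched by the second run: an upsert does not delete target rows that are missing from the source.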

Path Info

Level: Intermediate
Duration: 45m
Published: Nov 09, 2023

Challenge

Set Up the Environment
1. Create an Azure Synapse Analytics workspace, defining a new Azure Data Lake Storage Gen2 account as part of the setup.
2. Create a container named taxidata in the Data Lake account.
3. Upload the TaxiRides.csv file to the container. The file link is available in the Additional Information and Resources section.

Challenge

Set Up Dedicated SQL Pool Instance

Within the Synapse workspace, create a dedicated SQL pool named TaxiRidesWarehouse with a performance level of DW100c.

Challenge

Create a Pipeline to Copy Data from Data Lake File to Dedicated SQL Table
1. Create a linked service for the data lake.
2. Create a TaxiRides table in the dedicated SQL pool. The script to create it is available in the Additional Information and Resources section.
3. Create a pipeline with a Copy activity.
4. From the source of the Copy activity, create an integration dataset for the data lake file. Use Delimited Text as the format.
5. From the sink of the Copy activity, create an integration dataset for the dedicated SQL pool table, TaxiRides. Use Azure Synapse dedicated SQL pool as the data store.
6. In the sink of the Copy activity, set the Copy method to Upsert. Add RideId as the key column.
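
For reference, the sink settings chosen in the last step end up in the pipeline's JSON definition. The Python sketch below builds a fragment of roughly that shape. The property names (`SqlDWSink`, `writeBehavior`, `upsertSettings`, `keys`) are written from memory of the Azure Synapse Analytics connector and should be treated as assumptions; compare them with the code view of your own pipeline in Synapse Studio. The helper `build_upsert_sink` is hypothetical and exists only for this example.

```python
import json


def build_upsert_sink(key_columns):
    """Return a Copy activity sink fragment configured for upsert (assumed shape)."""
    if not key_columns:
        raise ValueError("Upsert needs at least one key column")
    return {
        "type": "SqlDWSink",           # assumed sink type for a dedicated SQL pool
        "writeBehavior": "Upsert",     # corresponds to Copy method = Upsert
        "upsertSettings": {
            "keys": list(key_columns)  # columns used to match existing rows
        },
    }


sink = build_upsert_sink(["RideId"])
print(json.dumps(sink, indent=2))
```

The key column is what decides whether an incoming record becomes an update or an insert, so it should uniquely identify a ride.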

Challenge

Complete Initial File Load
1. Run the pipeline.
2. Use an SQL query to verify that 100 records have been successfully added to the table.
3. Verify that the PassengerCount column value for RideId = 1 is 1.
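
The two checks above can be sketched as a small Python helper. The queries use only basic SQL, so they should also run unchanged in a SQL script in Synapse Studio. The helper accepts any DB-API style connection; here an in-memory SQLite table with 100 placeholder rows stands in for the dedicated SQL pool, and both the helper name `verify_load` and the stand-in data are assumptions for illustration.

```python
import sqlite3

COUNT_QUERY = "SELECT COUNT(*) FROM TaxiRides"
PASSENGER_QUERY = "SELECT PassengerCount FROM TaxiRides WHERE RideId = 1"


def verify_load(conn, expected_rows, expected_passengers):
    """Return True if the row count and RideId 1's PassengerCount match."""
    cur = conn.cursor()
    cur.execute(COUNT_QUERY)
    row_count = cur.fetchone()[0]
    cur.execute(PASSENGER_QUERY)
    row = cur.fetchone()
    passengers = row[0] if row else None  # None when RideId 1 is missing
    return row_count == expected_rows and passengers == expected_passengers


# Stand-in for the loaded table: 100 placeholder rides, each with 1 passenger.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TaxiRides (RideId INTEGER PRIMARY KEY, PassengerCount INTEGER)"
)
conn.executemany(
    "INSERT INTO TaxiRides VALUES (?, ?)", [(i, 1) for i in range(1, 101)]
)

initial_ok = verify_load(conn, expected_rows=100, expected_passengers=1)
print(initial_ok)  # True
```

After the second pipeline run, the same helper can be called with `expected_rows=101` and `expected_passengers=2`.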

Challenge

Upsert Data from File to Dedicated SQL Table
1. In the data lake account, open and edit the file TaxiRides.csv.
2. In the file, keep only the record for RideId = 1 and set its PassengerCount to 2. Remove all other records.
3. In the file, add a new record for RideId = 10000. Use any values for the record, but make sure it has the same number of columns as the existing rows.
4. Save the file changes and run the pipeline.
5. Verify that the table now has 101 records, with an additional record for RideId = 10000.
6. Verify that the PassengerCount column value for RideId = 1 is now updated to 2.
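
The file edit in steps 2 and 3 can also be scripted instead of done by hand. The sketch below uses Python's csv module and assumes the file has a header row containing RideId and PassengerCount. The `TripDistance` column and the sample values are hypothetical, since the real column list comes from the lab's TaxiRides.csv. The new record simply copies the kept row's values, which guarantees the right number of columns.

```python
import csv
import io


def build_upsert_file(csv_text, new_ride_id="10000"):
    """Keep only RideId 1 with PassengerCount 2, then append one new record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    kept = [row for row in reader if row["RideId"] == "1"]
    if not kept:
        raise ValueError("RideId 1 not found in the file")
    for row in kept:
        row["PassengerCount"] = "2"  # the update the second run should apply

    # New record: reuse the kept row's values so every column is populated.
    new_row = dict(kept[0])
    new_row["RideId"] = new_ride_id

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept + [new_row])
    return out.getvalue()


# Hypothetical three-column sample; the real file has its own columns.
sample = "RideId,PassengerCount,TripDistance\n1,1,2.5\n2,3,7.1\n"
edited = build_upsert_file(sample)
print(edited)
```

With the lab's file, the edited output would hold two data rows: one that updates RideId 1 and one that inserts RideId 10000, which is why the table grows from 100 to 101 records.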

Author

A Cloud Guru

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and much more.
