- Lab
- A Cloud Guru
Using Amazon S3 as a Machine Learning Repository
Imagine you are a new Data Engineer who has been tasked with preparing an environment for model building. To complete this task, you need to ingest a CSV file into S3 and then load that data source into a Jupyter Notebook. Finally, you need to save the data back to S3 as a new CSV file.
Table of Contents
Challenge: Prepare the Environment
- Create a Jupyter Notebook in SageMaker. Create a new role that allows for interaction with S3.
- Create an S3 Bucket
- Download Parking-Ticket-2022 data (https://open.toronto.ca/dataset/parking-tickets/)
- Upload the file Parking_Tags_Data_2022.000.csv to the S3 Bucket
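The upload step can also be done from code rather than the console. The following is a minimal sketch, assuming the bucket already exists and AWS credentials are configured; the helper names `s3_uri` and `upload_csv` and the bucket name in the usage note are hypothetical and not part of the lab.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI for an object in a bucket."""
    return f"s3://{bucket}/{key}"


def upload_csv(local_path, bucket, key=None, client=None):
    """Upload a local file to S3 and return its s3:// URI.

    `client` is expected to behave like a boto3 S3 client. When it is
    omitted, boto3 is imported lazily and a default client is created.
    """
    if client is None:
        import boto3  # lazy import so the helper loads without boto3 installed

        client = boto3.client("s3")
    # Default the object key to the file name (assumption: flat bucket layout).
    key = key or Path(local_path).name
    client.upload_file(str(local_path), bucket, key)
    return s3_uri(bucket, key)
```

For example, `upload_csv("Parking_Tags_Data_2022.000.csv", "my-lab-bucket")` would upload the file and return `s3://my-lab-bucket/Parking_Tags_Data_2022.000.csv`, where `my-lab-bucket` stands in for whatever bucket name you created.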
Challenge: Ingest Data Into SageMaker
- Create a conda_python3 Jupyter Notebook
- Ingest data from the S3 bucket into the Jupyter Notebook as a data frame
- Confirm that you can see 5 rows of the Parking_Tags_Data table
- Change the df.head command to display 11 rows instead of 5.
- Drop the 'location3' column from the data frame
- Write the data frame back to S3 as a CSV file named 'Result.csv'
- Verify the result in S3.
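The notebook steps above can be sketched with pandas as follows. This is an illustrative outline rather than the lab's official solution: the function name `drop_column_and_save` is hypothetical, and reading or writing `s3://` paths with pandas assumes the `s3fs` package is available in the notebook kernel.

```python
import pandas as pd


def drop_column_and_save(source, destination, column="location3"):
    """Load a CSV, preview it, drop one column, and write the result.

    `source` and `destination` may be local paths or s3:// URIs
    (the latter assumes s3fs is installed).
    """
    df = pd.read_csv(source)

    # head() shows the first 5 rows by default; head(11) shows 11.
    print(df.head())
    print(df.head(11))

    # Drop the unwanted column and write the rest without the index column.
    trimmed = df.drop(columns=[column])
    trimmed.to_csv(destination, index=False)
    return trimmed
```

In the lab this might be called as `drop_column_and_save("s3://my-lab-bucket/Parking_Tags_Data_2022.000.csv", "s3://my-lab-bucket/Result.csv")`, with `my-lab-bucket` replaced by your bucket name. Reading `Result.csv` back with `pd.read_csv` and checking that 'location3' is absent is one way to verify the result alongside the S3 console.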