Big Data on Amazon Web Services
This course gives an overview and demonstrations of several components of the Amazon Web Services (AWS) Big Data stack.
What you'll learn
This course tours the components of the AWS Big Data stack, namely DynamoDB, Elastic MapReduce (EMR), Redshift, Data Pipeline, and Jaspersoft BI on AWS. Amazon Kinesis is also discussed. Every step of creating an AWS account, setting up a security key pair, and working with Amazon Simple Storage Service (S3) is covered as well. Numerous demos show how to interact with AWS components through web browser user interfaces, the command line, and desktop tools.
Table of contents
- EMR Job Flows, What Is Hadoop? 5m
- The Hadoop Stack, EMR Components 3m
- A MapReduce Example 5m
- More on EMR Job Flows 2m
- Demo - Creating and Connecting to an EMR Cluster 7m
- Using MapReduce, HDFS Commands 4m
- Demo - Running a Streaming MapReduce Job 7m
- Hive and Pig 5m
- Demo - Hive 9m
- Demo - Pig 6m
- Demo - Impala 4m
- Shutting Down the Cluster 1m
- Demo - Shutting Down the Cluster 1m
- Data Pipeline Concepts 3m
- Demo - Authoring a Pipeline from a Template 4m
- Demo - Refining, Saving, and Activating the Pipeline 5m
- Executing Pipelines 1m
- Demo - Executing Our Pipeline and Browsing Its Output 3m
- Demo - Authoring a Pipeline from Scratch 5m
- Pipeline Troubleshooting 1m
- Demo - Troubleshooting the Pipeline 6m