Deploying Hadoop with Cloudera CDH to AWS
Learn how to deploy, size, and scale Hadoop in the cloud (namely AWS). You'll learn the key concepts behind deploying a CDH cluster, perform a manual installation, and finally learn how to automate deployments for multiple clusters with Cloudera Director.
What you'll learn
Many years ago, hardware was expensive. It was not unusual for a project with large amounts of data to require seven figures' worth of hardware just to get started. But times have changed. With cloud services, you can now store data cheaply, spin up as many servers as you need with the specs you want, process that data, and get the answers you need. In this course, Deploying Hadoop with Cloudera CDH to AWS, you will learn how to deploy Hadoop in the cloud. First, you'll learn about some key concepts. Then, you'll learn how to perform a deployment manually. Finally, you'll learn about a specialized tool called Cloudera Director that helps automate deployments for both transient and long-running clusters. You will also learn about some differences between AWS and Azure/GCE. These differences can matter if you are working on a different platform, but they are by no means blockers for someone already familiar with their current platform. By the end of this course, you will be able to better manage your cloud needs.
Table of contents
- Understanding the Cloud: An AWS Mini Crash Course 1m
- Getting Started with the Building Blocks of Amazon Web Services 1m
- AWS Accounts and Service Limits 4m
- Understanding Pricing, the AWS Simple Calculator, and a Reminder 6m
- Security Basics: Responsibility Model, Key Pairs, and Access Keys 9m
- Working with AWS: Console, CLI, and SDK 5m
- Your Machines: Elastic Compute Cloud (EC2) 8m
- On-demand, Reserved, Scheduled, Dedicated, and Spot Instances 5m
- Takeaway 1m
- More "AWS Mini Crash Course" 0m
- Regions, Availability Zones (AZ), and Placement Groups 3m
- Networking: Virtual Private Cloud (VPC) and Subnets 8m
- Security Groups (Think of Firewall), Direct Connect & SOCKS Proxy 4m
- Elastic IPs 3m
- Storage: Instance, EBS, S3, Glacier, ... 8m
- Managed Databases: Relational Database Service (RDS) 4m
- Creating an AMI, Snapshots, and Bootstrapping 4m
- AWS CloudFormation: Infrastructure as Code 3m
- Takeaway 1m
- Planning your Hadoop Cluster on AWS 1m
- Security First: Cloudera on AWS 3m
- Capacity Planning 3m
- Architectural Best Practices: Cloudera on AWS 2m
- Transient Clusters vs. Persistent Clusters 3m
- Storage in the Cloud: HDFS vs. S3 2m
- Data Engineering 3m
- Inside Your Cluster: Nodes, Roles, and Services 3m
- Analytic Database 1m
- Operational Database 0m
- Preparing for Cluster Deployment 1m
- Public vs. Private Subnets: Pick a Network Topology 3m
- Configuring AWS Best Practices for Deploying Using CloudFormation 4m
- Takeaway 1m
- Deploying, Sizing, and Scaling Your CDH Cluster on AWS 1m
- Deploying CDH: All Installation Paths Lead to a Hadoop Cluster 9m
- Configuring RDS: Because Cloudera Manager Needs a Database 4m
- Deploying CDH in AWS with Cloudera Manager: The "Different" Steps 5m
- Cloudera Manager: How You Manage Your Hadoop Clusters 2m
- Right Sizing a Cluster: Adding and Removing Nodes 5m
- Right Sizing a Node: Changing EC2 Instance Type 2m
- Takeaway 1m
- Automating Deployments & Managing Clusters with Cloudera Director 2m
- An Overview of Cloudera Director 2m
- What's Needed to Run Cloudera Director? 2m
- Supported Clouds: AWS, Azure, and GCE 1m
- Deploying Cloudera Director 5m
- Cloudera Director Interfaces: UI, CLI & API 1m
- Deploying a Cluster with Cloudera Director UI 2m
- Add Environment 2m
- Add Cloudera Manager 4m
- Add Cluster 4m
- Cloudera Director Dashboard 3m
- Cloning a Cluster 2m
- 1 Director: Many Clouds, Environments, CMs, Regions & Clusters 1m
- Terminate Cluster 1m
- Modify Cluster: Adding and Deleting Instances 4m
- Repair Instance 2m
- Auto Repair 2m
- Terminate Environment, Cloudera Manager & Cluster 1m
- Automating Cluster Operations 5m
- Director Client: cloudera-director bootstrap & terminate 4m
- Director Server: cloudera-director bootstrap-remote 2m
- Takeaway 1m