Conceptualizing the Processing Model for Apache Flink
Apache Flink is a stateful, fault-tolerant, large-scale stream processing system with low latency and high throughput. It processes both bounded and unbounded datasets on the same stream-first architecture, with a focus on streaming (unbounded) data.
What you'll learn
Apache Flink is built on the concept of stream-first architecture, where the stream is the source of truth. Flink offers extensive APIs for processing both batch and streaming data in an easy and intuitive manner.
In this course, Conceptualizing the Processing Model for Apache Flink, you’ll be introduced to Flink’s architecture and processing APIs to get started on your data analysis journey.
First, you’ll explore the differences between processing batch and streaming data, and understand how stream-first architecture works. You’ll study the stream-first processing model that Flink uses to process data at scale, and Flink’s architecture, which uses a JobManager, TaskManagers, and task slots to execute the operators and streams of a Flink application in a data-parallel manner.
Next, you’ll understand the difference between stateless and stateful stream transformations and apply these concepts hands-on in your own Flink stream processing applications. You’ll process data in a stateless manner using the map(), flatMap(), and filter() transformations, and use keyed streams and rich functions to work with Flink state.
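The difference between stateless and stateful transformations can be sketched without a Flink cluster. The block below is a plain-Java analogy, not Flink code: it uses java.util.stream to mimic the semantics of filter(), flatMap(), and map(), and a HashMap to stand in for per-key state on a keyed stream. The class name, method names, and the (symbol, price) event shape are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java analogy only: none of these names are Flink APIs.
class TransformationSketch {

    // Stateless: each output depends only on the element currently being processed.
    public static List<String> statelessPipeline(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())                     // filter: drop blank lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))  // flatMap: one line -> many words
                .map(String::toUpperCase)                                   // map: one word -> one word
                .collect(Collectors.toList());
    }

    // Stateful: the output for an event depends on earlier events with the same key.
    // The map plays the role of per-key state; each event is a hypothetical {symbol, price} pair.
    public static List<String> runningMaxPerKey(List<String[]> events) {
        Map<String, Double> maxByKey = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String[] event : events) {
            String key = event[0];
            double price = Double.parseDouble(event[1]);
            double max = Math.max(price, maxByKey.getOrDefault(key, Double.NEGATIVE_INFINITY));
            maxByKey.put(key, max);        // update the state held for this key
            out.add(key + "=" + max);      // emit the running maximum for this key
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(statelessPipeline(Arrays.asList("hello world", "", "flink")));
        System.out.println(runningMaxPerKey(Arrays.asList(
                new String[] {"A", "10"}, new String[] {"B", "5"},
                new String[] {"A", "7"}, new String[] {"A", "12"})));
    }
}
```

In the stateless pipeline, reordering or dropping earlier elements never changes how a later element is handled. In the keyed example, the third event for key A still reports 10.0 because the state remembers the earlier, higher price.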
Finally, you’ll round off your understanding of Flink’s state persistence and fault-tolerance mechanisms by exploring its checkpointing architecture. You’ll enable checkpoints and savepoints in your streaming application, see how state can be restored from a snapshot after a failure, and configure your Flink application to use different restart strategies.
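The idea of restoring state from a snapshot and restarting a bounded number of times can be illustrated with a small simulation. This is a minimal plain-Java sketch, not Flink's checkpointing implementation: the class, the run method, and its parameters (checkpointInterval, failAt, maxRestarts) are all hypothetical, the "state" is just a running sum plus an input position, and the delay that a fixed-delay strategy would wait between attempts is omitted.

```java
// Plain-Java simulation only: none of these names are Flink APIs.
class CheckpointRestartSketch {

    // Sums the input, snapshotting (position, sum) every checkpointInterval elements.
    // A single failure is injected the first time position failAt is reached.
    // On failure the job restores the last snapshot and retries, up to maxRestarts times.
    // Returns {finalSum, numberOfRestarts}.
    public static long[] run(int[] input, int checkpointInterval, int failAt, int maxRestarts) {
        int checkpointPos = 0;      // input position stored in the last snapshot
        long checkpointSum = 0;     // operator state stored in the last snapshot
        int restarts = 0;
        boolean failureInjected = false;

        while (true) {
            // Restore: resume from the last completed snapshot, not from scratch.
            int pos = checkpointPos;
            long sum = checkpointSum;
            try {
                while (pos < input.length) {
                    if (pos == failAt && !failureInjected) {
                        failureInjected = true;
                        throw new IllegalStateException("simulated failure at position " + pos);
                    }
                    sum += input[pos];
                    pos++;
                    if (pos % checkpointInterval == 0) {
                        checkpointPos = pos;    // take a snapshot of position and state together
                        checkpointSum = sum;
                    }
                }
                return new long[] {sum, restarts};
            } catch (IllegalStateException e) {
                if (restarts >= maxRestarts) {
                    throw e;                    // restart budget exhausted: the job fails
                }
                restarts++;                     // a real fixed-delay strategy would wait here
            }
        }
    }

    public static void main(String[] args) {
        long[] result = run(new int[] {1, 2, 3, 4, 5, 6}, 2, 3, 3);
        System.out.println("sum=" + result[0] + ", restarts=" + result[1]);
    }
}
```

Because the position and the sum are snapshotted together, the element processed after the last snapshot but before the failure is replayed exactly once after recovery, so the final sum matches a failure-free run. With maxRestarts set to 0, the same failure propagates and the run ends, which mirrors a no-restart policy.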
When you’re finished with this course, you’ll have the skills and knowledge to design Flink pipelines performing stateless and stateful transformations, and you’ll be able to build fault-tolerant applications using checkpoints and savepoints.
Table of contents
- Version Check 0m
- Prerequisites and Course Outline 2m
- Batch Processing and Stream Processing 6m
- Batch Processing vs. Stream Processing 3m
- Stream-first Architecture 6m
- Stream Processing in Flink 3m
- Operator Subtasks and Stream Partitions 5m
- Demo: Download and Install Flink 4m
- Stateless and Stateful Transformations 2m
- Flink Architecture: Job Manager and Task Managers 7m
- Operator and Task Lifecycle 3m
- Demo: Starting a Flink Cluster and Submitting Streaming Applications 7m
- Demo: Debugging Errors Using the Flink Dashboard 1m
- Demo: Exploring Default Configuration Settings 3m
- Demo: Cluster and Job Specific Configuration Settings 4m
- Demo: Setting up a Maven Flink Project 3m
- Demo: Implementing Your First Streaming Application 6m
- Demo: Configuring Job Specific Properties 5m
- Demo: Explicitly Specifying UIDs 2m
- Flink Clusters and Deployment 6m
- High Availability with Flink 3m
- Demo: Reading Streaming Data from a Text File 3m
- Demo: Packaging and Submitting a Streaming Job to the Flink Cluster 5m
- Stateless Transformations 2m
- Demo: Performing Filter Operations on Input Streams 4m
- Demo: Performing Map Operations on Input Streams 6m
- Demo: Performing FlatMap Operations on Input Streams 6m
- Demo: More FlatMap Operations on Input Streams 4m
- Flink APIs 4m
- Demo: Introducing the DataSet API 3m
- Demo: Map and Filter Using DataSets 2m
- Demo: Introducing the Table API 5m
- Demo: Running SQL Queries on Streaming Data 2m
- Demo: Reading Continuously from a File Source 4m
- Demo: Writing out to a Streaming File Sink 3m
- Demo: Streaming Sink with Rollover Policy 2m
- Demo: Processing a File Exactly Once 2m
- Fault Tolerance Guarantees 4m
- Stateful Transformations 2m
- Keyed Streams 4m
- Demo: Keyed Streams 5m
- States in Flink 4m
- Keyed State Interfaces and Rich Functions 5m
- Demo: Value State - Max Closing Price 7m
- Demo: Value State - Rolling Average 4m
- Demo: Value State - Rolling Average per Key 2m
- Demo: List State - Days since Price Threshold Breach 4m
- Demo: Reducing State - Rolling Average 5m
- State Backends 5m
- Checkpoints 2m
- Stream Barriers and Aligned Checkpoints 6m
- Unaligned Checkpoints 2m
- Demo: Enabling and Configuring Checkpoints 5m
- Demo: Default In-memory Checkpoints 4m
- Demo: Persistent Checkpoints Using the Fs State Backend 4m
- Demo: Configuring the State Backend for the Cluster 3m
- Savepoints 4m
- Demo: Manually Triggering Savepoints 4m
- Demo: Restoring Applications from Savepoints 3m
- Restart Strategies 4m
- Demo: Restart Strategy - Fixed Delay 5m
- Demo: Restart Strategy - No Restart 1m
- Demo: Restart Strategy - Failure Rate 3m
- Summary and Further Study 1m