Getting Started with Stream Processing with Spark Streaming
The Spark Streaming module lets you work with large-scale streaming data using familiar batch processing abstractions. This course starts with how standard transformations and operations are performed on streams, then moves on to more advanced topics such as windowed operations and streaming machine learning.
What you'll learn
Traditional distributed systems like Hadoop work on data stored in a file system. Jobs can run for hours, sometimes days. This is a major limitation when processing real-time data such as trends and breaking news. The Spark Streaming module extends the Spark batch infrastructure to process data as it arrives, so it can be analyzed in near real time. In this course, Getting Started with Stream Processing with Spark Streaming, you'll learn the nuances of dealing with streaming data using the same basic Spark transformations and actions that work with batch processing. Next, you'll explore how you can extend machine learning algorithms to work with streams. Finally, you'll learn the subtle details of how the streaming K-means clustering algorithm helps find patterns in data. By the end of this course, you'll be ready to start integrating what you've learned into your own projects.
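To make the idea of reusing batch-style operations on a stream more concrete, here is a minimal sketch in plain Python. It does not use Spark at all: the stream is assumed to be already cut into small batches, and the names count_words, update_state, and micro_batches are hypothetical, chosen only for illustration. The stateless step treats each batch on its own, while the stateful step carries running totals from one batch to the next, which is roughly the role updateStateByKey() plays in the course.

```python
from collections import Counter


def count_words(batch):
    """Stateless step: count words in one micro-batch, just as a batch job would."""
    return Counter(word for line in batch for word in line.split())


def update_state(state, batch_counts):
    """Stateful step: fold one batch's counts into the running totals."""
    new_state = Counter(state)
    new_state.update(batch_counts)
    return new_state


# Hypothetical stream, already divided into micro-batches of text lines.
micro_batches = [
    ["spark streaming", "spark batch"],
    ["streaming data"],
    [],  # an interval in which no new data arrived
]

running_totals = Counter()
per_batch = []
for batch in micro_batches:
    counts = count_words(batch)          # depends only on the current batch
    per_batch.append(counts)
    running_totals = update_state(running_totals, counts)  # depends on history
```

The per-batch counts forget everything between intervals, whereas the running totals keep growing. That difference is the distinction between stateless and stateful transformations.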
Table of contents
- Version Check 0m
- Limitations of Traditional Distributed Computing 5m
- Spark for Real-time Processing 5m
- Introduction to Streaming 3m
- The RDD Programming Abstraction 7m
- Using the PySpark Interactive Shell 7m
- Discretized Streams 5m
- Working with Streaming Data in Spark Using Python 5m
- Running Your First Streaming Application in Spark 5m
- Stateless and Stateful Transformations 3m
- The updateStateByKey() Function 5m
- The updateStateByKey() Implementation 5m
- Sliding Window Operations 6m
- The countByWindow() Transformation 3m
- Summary and Inverse Functions 5m
- The reduceByWindow() Transformation 3m
- The reduceByKeyAndWindow() Transformation 4m
- Clustering Data to Find Patterns 6m
- The K-means Clustering Algorithm 7m
- The Streaming K-means Clustering Algorithm 4m
- Forgetfulness Using the Decay Factor 5m
- Forgetfulness Using Half-life 4m
- Implementing the Streaming K-means Clustering Algorithm 8m
- Running the K-means Algorithm on Twitter Location Data 4m
- The K-means Algorithm with a Decay Factor of Zero 4m
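The decay factor mentioned in the final clips can be illustrated with a small plain-Python sketch of how a single cluster center might be updated when a new batch arrives. This is an assumption-laden illustration, not the course's code or Spark's implementation: the function update_center and its parameters are hypothetical, and the sketch assumes the old center's weight is multiplied by the decay factor before being averaged with the new batch.

```python
def update_center(center, weight, batch_points, decay):
    """Blend an existing cluster center with the points of one new batch.

    center       -- current center coordinates (list of floats)
    weight       -- how many points the center currently represents
    batch_points -- points from the new batch assigned to this cluster
    decay        -- 1.0 keeps all history, 0.0 forgets everything but the new batch
    """
    m = len(batch_points)
    discounted = weight * decay  # assumption: history is down-weighted by decay
    if m == 0:
        return center, discounted  # nothing new: the center stays where it is
    dims = len(center)
    batch_mean = [sum(p[d] for p in batch_points) / m for d in range(dims)]
    total = discounted + m
    new_center = [
        (center[d] * discounted + batch_mean[d] * m) / total for d in range(dims)
    ]
    return new_center, total


old_center, old_weight = [0.0, 0.0], 10.0
batch = [[10.0, 10.0], [10.0, 10.0]]

keep_all, keep_all_weight = update_center(old_center, old_weight, batch, decay=1.0)
forget_all, forget_all_weight = update_center(old_center, old_weight, batch, decay=0.0)
```

With a decay factor of 1.0 the center moves only a little toward the new points, because ten older points still count in full. With a decay factor of zero the center jumps straight to the mean of the newest batch, which is the behavior the last clip in the list refers to.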