What is Amazon Kinesis, and how does it work?
An explanation of the Kinesis family, how it can help you build out your environment and move your data around quickly and efficiently.
Feb 21, 2024 • 4 Minute Read
You’ve got to analyze some data. Immediately, you brace yourself for a world of pain: waiting for your entire dataset to accumulate, then a lengthy journey of processing it, facing delays that could span from minutes to weeks. Isn’t there an easier way?
Thankfully, there is: It’s called Amazon Kinesis. Kinesis is a real-time data streaming service that frees you from the inertia of data accumulation. Instead, you can process and analyze data the moment it's generated, leading to more timely insights and faster decision-making.
In this article, I’ll break down the four main methods you can use to transport and transform your data with Kinesis: Kinesis Data Streams, Kinesis Data Firehose, Managed Service for Apache Flink (formerly Kinesis Data Analytics), and Kinesis Video Streams.
The four methods for transporting and transforming your data with Amazon Kinesis
1. Kinesis Data Streams
Kinesis Data Streams lets you push data into a stream so that it can be processed by another service. Each piece of data that enters a Kinesis data stream becomes a record, and each record is assigned to a shard.
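To make "pushing data into the stream" concrete, here is a minimal sketch of a producer. It assumes you pass in a boto3 Kinesis client, whose put_record call takes a stream name, a data blob, and a partition key. The send_event helper, the "clickstream" stream name, and the event fields are made up for illustration.

```python
import json


def send_event(client, stream_name, event, partition_key):
    """Push one event into a Kinesis data stream and return the service response."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the data blob, sent as bytes
        PartitionKey=partition_key,              # used to choose the shard
    )


# Usage (needs boto3 and AWS credentials; "clickstream" is a hypothetical stream):
# import boto3
# response = send_event(boto3.client("kinesis"), "clickstream",
#                       {"user": "u-42", "action": "login"}, partition_key="u-42")
# print(response["ShardId"], response["SequenceNumber"])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before you point it at a real stream.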
What is a record in Kinesis Data Streams?
A record is a fancy way of describing a unit of data stored in a data stream. A record is composed of a sequence number, a partition key, and a data blob. The data blob is the data of interest that is coming through the stream. The sequence number is the unique ID that lets you identify each data blob in the sequence. The partition key helps guide where your data is going: it determines which shard a record lands on, so you can group related records and keep their data blobs together.
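Here is a rough sketch of how a partition key steers a record to a shard. Kinesis maps each partition key to a 128-bit integer with an MD5 hash, and each shard owns a range of those integers. The sketch assumes the shards split the hash space evenly (which may not hold after resharding) and that the digest is read as a big-endian integer. The function names are my own.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # big-endian reading is an assumption


def shard_for(partition_key: str, shard_count: int) -> int:
    """Pick a shard index, assuming shards split the hash space evenly."""
    range_size = HASH_SPACE // shard_count
    # min() keeps keys at the very top of the hash space in the last shard
    return min(hash_key(partition_key) // range_size, shard_count - 1)


# Records that share a partition key always land on the same shard.
for key in ["user-1", "user-2", "user-1"]:
    print(key, "-> shard", shard_for(key, shard_count=4))
```

This is why a well-distributed partition key matters: if most of your records share one key, they all pile onto one shard.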
What is a shard in Kinesis Data Streams?
A shard is the unit of capacity in a data stream: records are routed to shards by their partition key and flow through them on the way to your destination. A single shard supports writes of up to 1 MB per second or 1,000 records per second, and reads of up to 2 MB per second. These fixed limits make performance predictable.
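Because the per-shard limits are fixed, you can estimate how many shards a workload needs with a bit of arithmetic. The sketch below simply takes whichever of the three limits is hit first. The function name and the sample traffic numbers are made up for illustration.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # MB per second, writes
WRITE_RECORDS_PER_SHARD = 1000  # records per second, writes
READ_MB_PER_SHARD = 2.0         # MB per second, reads


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# 3.5 MB/s in, 2,500 records/s in, 7 MB/s out
print(shards_needed(3.5, 2500, 7.0))  # prints 4
```

In that example the write bandwidth and the read bandwidth each call for four shards, while the record rate alone would only need three.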
2. Kinesis Data Firehose
Kinesis Data Firehose is a managed solution for streaming ETL (extract, transform, load) tasks. It extracts data from a variety of sources, including live data streams, and delivers the incoming data to a range of destinations.
An integral feature is the ability to pass data through services like AWS Lambda, which lets you transform or filter records into the formats your own processes expect. Kinesis Data Firehose can also help with data cataloging: it integrates with services such as AWS Glue and Amazon S3, so your data is not only collected but also managed safely and efficiently.
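As a sketch of the Lambda step, here is what a small transformation function could look like. Firehose hands the function a batch of base64-encoded records, and expects each one back with the same recordId, a result of "Ok", "Dropped", or "ProcessingFailed", and base64-encoded data. The "level" field and the rule of dropping DEBUG events are assumptions I made up for the example.

```python
import base64
import json


def _result(record_id, result, data):
    """Build one entry of the response Firehose expects."""
    return {"recordId": record_id, "result": result, "data": data}


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # "level" is a hypothetical field in the incoming JSON
            level = str(payload.get("level", "INFO")).upper()
        except (ValueError, AttributeError):
            # Not valid base64/JSON, or not a JSON object: report the failure
            output.append(_result(record_id, "ProcessingFailed", record["data"]))
            continue
        if level == "DEBUG":
            # Filter: tell Firehose to discard this record
            output.append(_result(record_id, "Dropped", record["data"]))
            continue
        # Transform: normalize the level and emit newline-delimited JSON
        payload["level"] = level
        line = json.dumps(payload) + "\n"
        encoded = base64.b64encode(line.encode("utf-8")).decode("ascii")
        output.append(_result(record_id, "Ok", encoded))
    return {"records": output}
```

Every record in the batch gets an answer, even the ones you drop, so Firehose can account for all of them.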
3. Managed Service for Apache Flink
This service builds on the data provided by Data Streams and Data Firehose, and offers two distinct methods for processing it.
The first method is an Apache Flink application. You build an application specifically for processing and monitoring the incoming streaming data, and you can manage it directly from the AWS console.
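To give a feel for the kind of processing such an application does, here is a plain-Python sketch of a tumbling window count, a common streaming operation. This is not Flink's API. It only illustrates the idea of grouping a stream into fixed, non-overlapping time windows. The event shape of (timestamp in seconds, key) is an assumption.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, event type)
events = [(0, "login"), (12, "login"), (59, "error"), (60, "login"), (125, "error")]
for (window_start, key), count in sorted(tumbling_window_counts(events).items()):
    print(window_start, key, count)
```

A real Flink application does this continuously and at scale, with state and late-arriving data handled for you, which is where the managed service earns its keep.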
The second method is a Studio notebook. Studio notebooks run on Apache Zeppelin, a tool that lets you run SQL-based queries on the data coming through the aforementioned Kinesis services.
Zeppelin's functionality extends to creating tables and sections within databases, and it integrates with AWS Glue. Together, the two methods give you a way not only to access your data but also to gain deeper insight into what it is telling you.
4. Kinesis Video Streams
Kinesis Video Streams lets you set up data streams from your video inputs. These inputs can range from security cameras and webcams to various other media producers. The service simplifies and secures the streaming of media from connected devices to AWS, where you can use it for storage, analytics, machine learning, and playback.
Kinesis Video Streams covers streaming media end to end, from the initial intake of data to its final consumption by your intended audiences.
Conclusion
We have covered the Kinesis family as a whole, and looked at what each of its four members is and what it can do. I hope this helps you build out your environment and move your data around quickly and efficiently.
If you want to learn more about Amazon Kinesis and the services that can connect to it, check out my Pluralsight course, “Deep Dive into Amazon Kinesis.” This course digs deep into how Amazon Kinesis and various related AWS services can work together for your data.