Streaming Video at Scale: A Tale of Transcoding at ACG
How do you manage streaming video at scale? This is how A Cloud Guru has handled video transcoding, from the early years to today — wins, warts, and all.
Oct 31, 2023 • 11 Minute Read
Video transcoding plays a huge part in our students' experience when using the A Cloud Guru platform. Our learners, who come from all over the world, can watch our content on desktop, on a tablet, or in our mobile app.
And there's quite a bit of cloud-learning content to watch. Right now, we have 340 courses available, containing 11,050 individual lessons. Plus there are 26 web series containing 503 individual series videos. We've recently added Hands-on Labs content, so that's roughly 1,500 more videos on top. And we're regularly updating our catalog. Each of these videos can range from 2 to 20 minutes in length. That's a lot of video content!
In this article, I'm going to share the tale of how we tackle transcoding, looking at how it was done in the very beginning through to our current implementation. We'll look at the problems and tradeoffs we encountered along the way, and how we can improve on our current solution.
But what even is transcoding?
I hear you ask, "And why is it important?" At a very high level, video transcoding is the process of converting a video from its original format to another, usually downscaling it to a format or resolution more reasonable for the end device. The process itself involves decoding a video file from one format into an uncompressed form, then encoding that uncompressed data into the desired format.
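To make that concrete, here's a minimal sketch of a transcode using the ffmpeg CLI from TypeScript. This assumes ffmpeg is installed locally and the file names are hypothetical; it's just an illustration of the decode-then-encode idea, not how our pipeline works.

```typescript
// A local stand-in for what a transcoder does: decode the source,
// then re-encode it at a lower resolution.
import { execFileSync } from "node:child_process";

execFileSync("ffmpeg", [
  "-i", "lesson-master.mp4", // source file, e.g. a 1080p master
  "-vf", "scale=-2:720",     // downscale to 720p, preserving aspect ratio
  "-c:v", "libx264",         // encode the video stream as H.264
  "-c:a", "aac",             // encode the audio stream as AAC
  "lesson-720p.mp4",         // the downscaled output
]);
```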
So why do we need different video formats and resolutions? Well, we want to make viewing our content compatible across multiple devices — PC, tablet, mobile, smart fridge, whatever. More importantly, we want to make sure our users have the best viewing experience possible on whatever device they're using.
You can imagine an ACG learner trying to view an HD video of CloudFormation Essentials on a mobile phone with a poor connection would have a pretty rough experience!
So, the more options we have available, the more devices and users we can reach — and the better the experience for everyone.
How did we do it in the beginning?
Of course, A Cloud Guru didn't start off with this much video content, and we also didn't have a mobile application at that point.
In the very beginning, way back in 2015, video transcoding was a manual process. The team would load each video into a desktop program called HandBrake, which would transcode it.
Needless to say, before long a new approach was needed!
Introducing AWS Elastic Transcoder
We're a serverless shop running on Amazon Web Services (AWS), so it made perfect sense to use Elastic Transcoder — AWS's media transcoder in the cloud. It was AWS's first transcoding service, and it's used in many applications today.
As with HandBrake, it transcodes media files from one format into various other formats that can play on different devices.
Elastic Transcoder also provides transcoding presets for popular output formats, so you don’t need to guess which settings work best on specific devices. So whether it’s an iPhone, Samsung, Kindle, or whatever, it caters for those devices.
In the same year (2015), a revamp of the transcoding process took place. The initial process was totally manual, and it was about time to automate it!
Typically, a content creator on our platform uploads an mp4 video of at least 1080p (Full HD resolution). When a user watches one of our videos on desktop, the quality defaults to 720p, which is standard HD. We also provide 1080p and a lower 480p as playback options.
How did our first implementation work?
- Our content team would upload an MP4 file from an editor UI, adding it to our S3 input bucket. The file name had data encoded into it, so it might look like: course/aws-csa/section-1/component-1.mp4
The file name is broken up into content type, course name, section, and component, since multiple videos make up the different sections of a course.
- The upload would then trigger a lambda function via an ObjectCreated event on the input S3 bucket.
- The lambda would write a processing status out to Firebase, recording that a new transcoding job had been kicked off for this file.
- The lambda would then pass the S3 object key as the input to Elastic Transcoder to process the mp4 (there's a sketch of this first lambda after the list).
- Elastic Transcoder would place the transcoded videos, in the formats we specified, into the output bucket. It’s important to note that at this point, the configurations we wanted to transcode to were fixed in our source code.
- We then had a lambda triggered by an ObjectCreated event in the output bucket, which would go and update Firebase with the status of the transcoding job as "complete."
- Throughout this process, data was being resynced to the browser (Firebase uses websockets). As the transcoder worked in the background, the status was updated in Firebase, and Firebase pushed the status to the browser as it happened.
- Finally, any learner watching on the platform would access the content via a signed URL, watching an mp4 at 1080p, 720p, or 480p depending on the playback option.
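Here's a rough reconstruction of that first input-bucket lambda, sketched with the v2 AWS SDK. This isn't our actual code: the pipeline ID is a placeholder, the Firebase call is elided, and the preset IDs shown are Elastic Transcoder's generic system presets.

```typescript
import { S3Event } from "aws-lambda";
import ElasticTranscoder from "aws-sdk/clients/elastictranscoder";

const transcoder = new ElasticTranscoder();

export const handler = async (event: S3Event): Promise<void> => {
  // e.g. "course/aws-csa/section-1/component-1.mp4"
  const key = decodeURIComponent(event.Records[0].s3.object.key);
  const [contentType, courseName, section, component] = key.split("/");

  // 1. Mark the job as "processing" in Firebase. The hierarchy was
  //    derived from the file name, which is what coupled us to Courses:
  //    firebase.ref(`${courseName}/${section}/${component}`)
  //      .update({ status: "processing" });

  // 2. Kick off the Elastic Transcoder job with our fixed presets.
  await transcoder
    .createJob({
      PipelineId: process.env.PIPELINE_ID!, // placeholder
      Input: { Key: key },
      Outputs: [
        { Key: key.replace(".mp4", "-1080p.mp4"), PresetId: "1351620000001-000001" }, // Generic 1080p
        { Key: key.replace(".mp4", "-720p.mp4"), PresetId: "1351620000001-000010" },  // Generic 720p
        { Key: key.replace(".mp4", "-480p.mp4"), PresetId: "1351620000001-000020" },  // Generic 480p 16:9
      ],
    })
    .promise();
};
```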
Problems with this approach
At this point in time, we were introducing web series, and this architecture was tightly coupled to Courses and Firebase. We knew we wanted to have types of content other than course videos, and this solution wasn't generic.
Another issue was that data was encoded into the file name, with the implementation passing data around the service based on it. That's how it knew a file related to a specific video in a specific course, and it's how the Firebase hierarchy was determined, tightly coupling everything to Courses specifically.
S3 supports metadata, so we could have added this data as key-value pairs in object metadata rather than extracting it from the file name.
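For illustration, here's roughly what that alternative could have looked like with the v2 SDK. The bucket name and key are made up.

```typescript
// Attach the course data as S3 object metadata instead of
// encoding it into the key.
import S3 from "aws-sdk/clients/s3";

const s3 = new S3();

export async function uploadWithMetadata(videoBuffer: Buffer): Promise<void> {
  await s3
    .putObject({
      Bucket: "acg-video-input", // placeholder bucket name
      Key: "some-opaque-id.mp4", // the key no longer needs to carry data
      Body: videoBuffer,
      Metadata: {                // key-value pairs that travel with the object
        course: "aws-csa",
        section: "section-1",
        component: "component-1",
      },
    })
    .promise();

  // A downstream lambda can then read it back without parsing the key:
  // const { Metadata } = await s3
  //   .headObject({ Bucket: "acg-video-input", Key: "some-opaque-id.mp4" })
  //   .promise();
}
```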
Video presets were also fixed in the source code, so we couldn't pass arbitrary configuration through the process; if we wanted to transcode a video with different settings, we were out of luck.
We also didn't have DynamoDB to look up or store any additional data; literally all we had was the file name being passed to the lambda function.
How did we move away from this?
We knew we wanted to have different types of content and the previous design tightly coupled Courses to Firebase, so we decided to decouple the types of content. Now we have different services for web series and courses and a content service that does the transcoding.
The content service itself is generic. It doesn’t know if the video content is Series or Course material, or about the series or course services — it just focuses on the transcoding job.
The browser will call the content service to create a piece of content by uploading a video to transcode, and the content service will return a contentID. This contentID will be associated with an episode in the web series service or a section in the course service, depending on the content.
A peek into the content service
The content service itself is a set of GraphQL endpoints that trigger lambdas, which then talk to specific AWS services depending on the request.
From the content editor UI, we’d start off by creating a piece of content via a GraphQL mutation/API that would trigger a handler to write this piece of content to DynamoDB.
With this API call, we're able to specify the video definitions/formats we want (for example, 720p, 1080p, mp3), whereas in the previous implementation this detail was hardcoded in the source code.
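As a sketch, the mutation might look something like this. The field names are illustrative, not our actual schema.

```typescript
import { gql } from "graphql-tag";

export const CREATE_CONTENT = gql`
  mutation CreateContent($input: CreateContentInput!) {
    createContent(input: $input) {
      contentId # the ID later linked to an episode or course section
      uploadUrl # pre-signed S3 URL the editor uploads the video to
    }
  }
`;

// The caller specifies the output definitions, which used to be
// hardcoded in the transcoder's source code.
export const exampleVariables = {
  input: {
    title: "CloudFormation Essentials - Lesson 1",
    definitions: ["1080p", "720p", "480p", "mp3"],
  },
};
```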
Next, the content service creates a pre-signed URL for the S3 input bucket, so that content can only be uploaded by authorized content editors.
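Generating that URL is a one-liner with the v2 SDK; here's a sketch, with the bucket name and expiry as placeholders.

```typescript
import S3 from "aws-sdk/clients/s3";

const s3 = new S3();

export function createUploadUrl(contentId: string): string {
  return s3.getSignedUrl("putObject", {
    Bucket: "acg-content-input",    // placeholder input bucket
    Key: `${contentId}/source.mp4`, // tied to the contentID, not course data
    ContentType: "video/mp4",
    Expires: 15 * 60,               // link is only valid for 15 minutes
  });
}
```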
Now, we do the transcoding!
1. A request is sent to Elastic Transcoder to start a job to transcode the content.
2. Elastic Transcoder gets the video it's been asked to transcode from the S3 input bucket and puts the transcoded video files into an S3 output bucket once the job is complete.
3. As the transcoding process happens, the DynamoDB table is updated with the job status, since Elastic Transcoder gives you a start and a finish event for the job. For example, once Elastic Transcoder has finished the job, it sends that status in the payload of a message to an SNS topic, and the lambda subscribed to the topic goes and updates DynamoDB (there's a sketch of this handler after the list). Meanwhile, the browser polls for updates so the UI can show the processing status. This replaces the websockets we got with Firebase; moving from Firebase to DynamoDB is the tradeoff we've made.
4. Once the job is done, the transcoded files sit in the S3 output bucket, which CloudFront fronts with signed URLs for the video content. This makes sure our content is only accessible to members of the platform!
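Here's what that SNS-subscribed status handler might look like. The table name, and the assumption that we tagged the job with a contentId in its userMetadata when creating it, are both placeholders rather than our actual schema.

```typescript
import { SNSEvent } from "aws-lambda";
import DynamoDB from "aws-sdk/clients/dynamodb";

const docClient = new DynamoDB.DocumentClient();

export const handler = async (event: SNSEvent): Promise<void> => {
  for (const record of event.Records) {
    // Elastic Transcoder publishes the job details as the message body.
    const job = JSON.parse(record.Sns.Message);

    await docClient
      .update({
        TableName: "content",                         // placeholder table
        Key: { contentId: job.userMetadata.contentId },
        UpdateExpression: "SET #s = :state",
        ExpressionAttributeNames: { "#s": "status" }, // "status" is a DynamoDB reserved word
        ExpressionAttributeValues: {
          ":state": job.state, // e.g. "PROGRESSING", "COMPLETED", "ERROR"
        },
      })
      .promise();
  }
};
```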
Problems with our current approach
Most of this solution is in CloudFormation, but not all! Unfortunately, Elastic Transcoder isn't a supported resource, so we had to set that part up manually in the console.
Plus, the Elastic Transcoder service is now outdated. There is a newer service we can look to use instead, which I’ll get to shortly!
So, having come this far and made all these changes, what have we achieved? Well, we've managed to centralize all our videos through the content service. Every bit of video content, regardless of what it's for (Courses, Web Series, and now Hands-on Labs), uses the same interface to be transcoded. We've moved away from the tightly coupled solution we had before.
Looking to the future
How can we improve on our current implementation? I mentioned Elastic Transcoder has become outdated. It was the first AWS service able to transcode video, but there's now a newer service to use in its place — AWS Elemental MediaConvert.
MediaConvert can do the same things as Elastic Transcoder, and more! It supports more input and output formats, including the HEVC compression standard alongside H.264 (Elastic Transcoder only supports H.264), plus other new capabilities such as video-quality improvements and additional add-on features.
It's also available in CloudFormation, so we'd be able to have a full IaC solution, which is awesome! You can define transcoding job templates in CloudFormation, as well as queues and presets.
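As a sketch of what that IaC could look like in CDK (which synthesizes to CloudFormation), here are the queue and job template resources. All names and settings here are illustrative.

```typescript
import { Stack, StackProps } from "aws-cdk-lib";
import { CfnJobTemplate, CfnQueue } from "aws-cdk-lib/aws-mediaconvert";
import { Construct } from "constructs";

export class TranscodingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // A dedicated queue for the content service's transcoding jobs.
    const queue = new CfnQueue(this, "ContentQueue", {
      name: "content-transcoding",
    });

    // A job template describing our standard outputs. settingsJson takes
    // the same structure the MediaConvert console can export for a job.
    new CfnJobTemplate(this, "StandardOutputs", {
      queue: queue.attrArn,
      settingsJson: {
        OutputGroups: [
          // e.g. a File Group with 1080p/720p/480p H.264 outputs
        ],
      },
    });
  }
}
```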
MediaConvert has a similar pricing model to Elastic Transcoder: you pay as you go and pay for what you use, with rates based on the duration of the output video. Using MediaConvert is significantly cheaper, which would definitely help with our transcoding bill each month!
We’ve now got an effective transcoding service for our content, and hopefully, MediaConvert can be the icing on the cake in the near future!