How to add file upload features to your website with AWS Lambda and S3
Jun 08, 2023 • 6 Minute Read
File uploading presents a scalability problem that’s easy to fix with serverless — without taxing your pocketbook.
The mechanism for uploading files from a browser has been around since the early days of the Internet. In a traditional server-based environment, it’s easy to handle with Django, Express, or any other popular framework. It’s not an exciting topic, at least until you experience the scaling problem.
Imagine this scenario: you have an application that uploads files. All is well until the site suddenly gains popularity. Instead of handling a gigabyte of uploads a month, usage grows to 100 GB an hour during the month leading up to tax day. Afterwards, usage drops back down again for another year. This is exactly the problem we had to solve.
File uploading at scale gobbles up your resources — network bandwidth, CPU, storage. All this data is ingested through your web server(s), which you then have to scale — if you’re lucky this means auto-scaling in AWS, but if you’re not in the cloud you’ll also have to contend with the physical network bottleneck issues.
You can also face some difficult race conditions if your server fails in the middle of handling an uploaded file. Did the file make it to its destination? What was the state of the processing? It can be very hard to replay the steps to failure or know the state of a transaction when the server is overloaded.
Fortunately, this particular problem turns out to be a great use case for serverless, as you can eliminate the scaling issues entirely. For mobile and web apps with unpredictable demand, you can simply allow the application to upload the file directly to S3. This has the added benefit of enabling an HTTPS endpoint for the upload, which is critical for keeping the file’s contents secure in transit.
All this sounds great — but how does this work in practice when the server is no longer there to do the authentication and intermediary legwork?
The S3 uploader demo app
I set up an app in the AWS Serverless Application Repository that you can deploy to your own AWS account. Try installing the app by following the instructions in its documentation, and then we’ll walk through the solution.
What’s happening behind the scenes is a two-step process: first, the web page calls a Lambda function to request the upload URL, and then it uploads the JPG file directly to S3.
The URL is the critical piece of the process — it contains a key, signature and token in the query parameters authorizing the transfer. Without these, the transfer will fail.
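For illustration, a generated URL looks roughly like this, assuming the SDK’s SigV4 signing and with the values truncated:

https://your-bucket.s3.amazonaws.com/<uuid>.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=...&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Security-Token=...&X-Amz-Signature=...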
Feel free to clone the GitHub repo. Note that the public-facing demo app deletes all files within 24 hours and has throttling enabled to prevent abuse.
Why use a Lambda function?
It’s possible to eliminate the Lambda function and do everything from the client browser — but there are a number of good reasons to avoid this approach. Apart from adding significantly more code to the web page, the Lambda function allows you to control the process away from the prying eyes of any potential attacker.
For example, what if you want to authorize the user first — maybe only paying subscribers can upload, whereas free trials are read-only? Or maybe you need to add extra hooks in the process to trigger other workflows, logging, or add a breaker in the event there are too many uploads. Or you might not be comfortable revealing bucket names or other information in the client-side code.
The Lambda function that requests the signed URL, step 1 of this demo app, is fairly minimal:
// Requires the 'uuid' package; v7+ exposes v4 as a named export
const { v4: uuidv4 } = require('uuid')
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.REGION || 'us-east-1' })
const s3 = new AWS.S3()

exports.handler = async (event) => {
  return await getUploadURL()
}

const getUploadURL = async () => {
  // Use a UUID for the object key so uploads never collide
  const actionId = uuidv4()

  const s3Params = {
    Bucket: '<< ENTER YOUR BUCKET NAME HERE >>',
    Key: `${actionId}.jpg`,
    ContentType: 'image/jpeg',
    ACL: 'public-read'
  }

  return new Promise((resolve) => {
    // The signed URL expires after 15 minutes by default
    const uploadURL = s3.getSignedUrl('putObject', s3Params)
    resolve({
      statusCode: 200,
      isBase64Encoded: false,
      headers: { 'Access-Control-Allow-Origin': '*' },
      body: JSON.stringify({
        uploadURL: uploadURL,
        photoFilename: `${actionId}.jpg`
      })
    })
  })
}
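When invoked, the function returns a response whose body contains the signed URL and the generated filename. With placeholders for the bucket name and UUID, it looks like this:

{
  "uploadURL": "https://your-bucket.s3.amazonaws.com/<uuid>.jpg?X-Amz-Signature=...",
  "photoFilename": "<uuid>.jpg"
}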
Although this function is bare bones, it would be easy to add all sorts of logic at this stage, before making the request for the signed URL (see the sketch below). Once the function is in place, it’s then just a matter of setting up API Gateway with a single GET method to create the endpoint. Alternatively, you can deploy with the Serverless Framework and automate this step.
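As a sketch of that kind of hook, the handler below gates URL generation behind a subscriber check before reusing the getUploadURL function from above. The isPayingSubscriber helper is hypothetical, not part of the demo app:

// Hypothetical helper: swap in a real lookup (a database query, for example)
const isPayingSubscriber = async (userId) => userId === 'demo-subscriber'

exports.handler = async (event) => {
  const userId = event.queryStringParameters && event.queryStringParameters.userId
  // Refuse to issue an upload URL unless the caller is authorized
  if (!(await isPayingSubscriber(userId))) {
    return {
      statusCode: 403,
      headers: { 'Access-Control-Allow-Origin': '*' },
      body: JSON.stringify({ error: 'Uploads are limited to subscribers' })
    }
  }
  return await getUploadURL()
}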
Building the front end
The project is based on a boilerplate Vue template, but all the important work to demonstrate this functionality happens in s3uploader.vue.
Once again, this is minimal coding with no error handling or niceties to keep the example simple, but you can see the code required to make this work is only a handful of lines. If you open the Chrome developer console (press F12), you can see the console.log output throughout the process.
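Stripped of the Vue scaffolding, the browser-side work amounts to two requests. Here is a minimal sketch using the fetch API, where the endpoint URL is a placeholder for whatever API Gateway assigns and file is a File object from an <input type="file"> element:

async function uploadPhoto(file) {
  // Step 1: request a pre-signed upload URL from the Lambda function
  const response = await fetch('https://<api-id>.execute-api.us-east-1.amazonaws.com/uploads')
  const { uploadURL } = await response.json()

  // Step 2: PUT the file directly to S3. The Content-Type must match
  // the type the URL was signed with ('image/jpeg' in this demo).
  await fetch(uploadURL, {
    method: 'PUT',
    headers: { 'Content-Type': 'image/jpeg' },
    body: file
  })
}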
Set your permissions
Finally, a word about permissions: a signed URL carries the credentials of the function that creates it, so the requesting function needs IAM permissions that cover both generating the URL and writing the file into the bucket.
Some managed policies include the overly generous s3:* permission. It’s common to see these functions work in development and then fail in production, because the IAM role is often switched to a much narrower set of privileges when promoted. The IAM role used by the function must be able to write into the bucket, otherwise the upload won’t work.
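A minimal sketch of a suitably scoped policy, assuming a hypothetical bucket named my-upload-bucket, might look like this. The s3:PutObjectAcl action is included because the demo signs the URL with an ACL:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::my-upload-bucket/*"
    }
  ]
}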
Back to scalability
The reason why all this is worthwhile comes back to scalability. In the event that you need to allow large numbers of users to upload files, this approach takes all the network burden away from you (the users upload directly to S3) and the scaling is seamless and automatic. When nobody uses your function, you pay nothing.
Additionally, since there’s no server hop between the user and S3, you can eliminate another point of failure. S3 is famous for its ‘11 9s’ of durability, so you also benefit from the fact that it’s nearly impossible for the file to just disappear.
Overall, given the benefits of the serverless implementation, it seems to be the obvious and easy way to manage any form of file uploading when working with AWS infrastructure.