S3 Direct upload with Cognito authentication

We recently needed to demonstrate Amazon RDS for a customer’s existing Oracle database running in their colocation data center. Their Oracle DB dump was about 200 GB in size and had to be moved to an AWS account securely.

Let’s first discuss the existing options and why they weren’t right for our situation, and then explain how we solved the problem with S3 direct upload using Cognito authentication.

Since we were dealing with large files, our first option was to have the customer upload them directly to Amazon S3. Unfortunately, the customer was relatively new to AWS, and training them to upload using the AWS CLI or the Management Console would have delayed the project, so we started looking for alternatives.

Problem statement: A customer needed to transfer an Oracle database dump of 200 GB securely to an AWS account.

We considered Cyberduck as our second option. Cyberduck is an open source client for FTP, SFTP, WebDAV, and cloud storage, available for macOS and Windows. It supports uploading to S3 directly using AWS credentials. We could create a new IAM user with limited permissions and share its credentials with the customer, along with the S3 bucket and folder names. But with this solution, too, the customer would need to install external software and then follow a series of steps to upload the files. They would have to get approvals to install the software, which would add to the delay. This option was slightly easier than the first, but it still introduced a lot of friction.

While looking for a friction-free solution, we discovered that we could upload files to S3 directly from the browser using multipart upload. Initially we doubted that this would work for large files, since browsers usually limit the size of a file that can be uploaded. But we would never know unless we tried, so we decided to give it a shot.
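To make the multipart idea concrete, here is a rough sketch of the part-size arithmetic for a dump of this size. It relies on two documented S3 limits (at most 10,000 parts per upload, and at least 5 MiB for every part except the last). The function name `planParts` is made up for illustration, and the 200 GB figure is treated as 200 GiB.

```javascript
// S3 multipart limits: at most 10,000 parts per upload, and every part
// except the last must be at least 5 MiB.
const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB;
const MAX_PARTS = 10000;

// Hypothetical helper: returns the smallest whole-MiB part size that keeps
// the upload within MAX_PARTS, plus the resulting number of parts.
function planParts(fileSizeBytes) {
  if (!Number.isFinite(fileSizeBytes) || fileSizeBytes <= 0) {
    throw new RangeError("fileSizeBytes must be a positive number");
  }
  const minBytesPerPart = Math.ceil(fileSizeBytes / MAX_PARTS);
  const wholeMib = Math.ceil(minBytesPerPart / MIB) * MIB;
  const partSize = Math.max(MIN_PART_SIZE, wholeMib);
  const partCount = Math.ceil(fileSizeBytes / partSize);
  return { partSize, partCount };
}

// Assumption: the dump is a single 200 GiB file.
console.log(planParts(200 * 1024 ** 3));
// prints { partSize: 22020096, partCount: 9753 }
```

With 5 MiB parts, a 200 GiB file would need 40,960 parts, so the part size has to grow to roughly 21 MiB to stay under the 10,000-part limit.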

We can upload files directly from the browser to S3, but how do we make it secure?

Browsers expose the page’s source code, so we obviously can’t put credentials in it. Our first thought was to use S3 pre-signed URLs, but we soon realized that the object key (file name) has to be fixed at the time the pre-signed URL is generated, which was not desirable for us. To make this dynamic in our serverless website, we would need to write an AWS Lambda function that generates the pre-signed URL from the file name the user provides, and call it through API Gateway. While this is a possible solution, we found a better one using Amazon Cognito.
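For comparison, the Lambda-based alternative described above might look roughly like the sketch below. It is not the code we deployed. It assumes the AWS SDK for JavaScript v2, whose S3 client provides `getSignedUrl`, and an API Gateway Lambda proxy event carrying a `filename` query parameter. The names `presignUpload`, `makePresignHandler`, the `uploads/` prefix, and the 15-minute expiry are all hypothetical.

```javascript
// Hypothetical helper: `s3` is an AWS SDK v2 S3 client (new AWS.S3()),
// passed in so the function has no hidden dependencies.
function presignUpload(s3, bucket, filename) {
  if (typeof filename !== "string" || filename.length === 0) {
    throw new Error("filename is required");
  }
  // The object key is fixed here, at signing time.
  return s3.getSignedUrl("putObject", {
    Bucket: bucket,
    Key: `uploads/${filename}`, // assumed key prefix
    Expires: 900,               // URL lifetime in seconds (assumed value)
  });
}

// Hypothetical Lambda handler factory for an API Gateway proxy integration.
function makePresignHandler(s3, bucket) {
  return async function handler(event) {
    const query = event.queryStringParameters || {};
    try {
      const url = presignUpload(s3, bucket, query.filename);
      return { statusCode: 200, body: JSON.stringify({ url }) };
    } catch (err) {
      return { statusCode: 400, body: JSON.stringify({ error: err.message }) };
    }
  };
}
```

Note that a single pre-signed PUT is limited to 5 GB, so a dump of our size would still need a multipart flow with one signed URL per part.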

Cognito has user pools and identity pools. User pools maintain users, and identity pools issue temporary AWS credentials for several kinds of web identities, including Cognito user pool identities.
We created a user pool in Cognito and associated it with an identity pool. The identity pool provides credentials to both authenticated and unauthenticated users based on the associated IAM roles and policies. Now any valid user in our Cognito user pool can get temporary AWS credentials from the associated identity pool and use them to upload files directly to S3.
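The permissions granted by the identity pool come from the IAM role attached to authenticated users. As an illustration only, a policy for that role could be limited to the actions a multipart upload needs, under a single key prefix. The helper `buildUploadPolicy`, the bucket name, and the `uploads/` prefix are assumptions, not our actual configuration.

```javascript
// Hypothetical helper that builds an IAM policy document for the identity
// pool's authenticated role. It allows uploads (including multipart) under
// one key prefix and nothing else.
function buildUploadPolicy(bucketName, keyPrefix) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "s3:PutObject",               // also covers initiating, uploading, and completing parts
          "s3:AbortMultipartUpload",    // lets the client cancel a failed upload
          "s3:ListMultipartUploadParts" // lets the client inspect uploaded parts
        ],
        Resource: [`arn:aws:s3:::${bucketName}/${keyPrefix}*`],
      },
    ],
  };
}

// Assumed names, for illustration only.
const policy = buildUploadPolicy("example-db-dump-bucket", "uploads/");
console.log(JSON.stringify(policy, null, 2));
```

To isolate users from each other, the prefix can include the IAM policy variable for the Cognito identity ID (`cognito-identity.amazonaws.com:sub`). Browser uploads also need a CORS configuration on the bucket that allows the site's origin and exposes the `ETag` header.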

Cognito architecture for secure S3 uploads
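The flow in this architecture can be sketched in browser code as follows. This is a minimal sketch, not our production code. It assumes the AWS SDK for JavaScript v2 browser bundle (the global `AWS` object, passed in explicitly here) and an ID token already obtained by signing in to the user pool. The function name `startUpload`, the option names, the `uploads/` key prefix, and the part and queue sizes are all hypothetical.

```javascript
// Hypothetical helper. `AWS` is the AWS SDK for JavaScript v2 object;
// `opts.file` is a File from an <input type="file"> element.
function startUpload(AWS, opts) {
  const { region, userPoolId, identityPoolId, idToken, bucket, file, onProgress } = opts;

  // Exchange the user pool ID token for temporary AWS credentials
  // through the identity pool.
  AWS.config.region = region;
  AWS.config.credentials = new AWS.CognitoIdentityCredentials({
    IdentityPoolId: identityPoolId,
    Logins: {
      [`cognito-idp.${region}.amazonaws.com/${userPoolId}`]: idToken,
    },
  });

  // ManagedUpload splits the file into parts and uploads them in parallel.
  const upload = new AWS.S3.ManagedUpload({
    partSize: 25 * 1024 * 1024, // assumed value: 25 MiB per part
    queueSize: 4,               // assumed value: 4 parts in flight
    params: { Bucket: bucket, Key: `uploads/${file.name}`, Body: file },
  });

  // Report progress so the page can show in-progress uploads.
  upload.on("httpUploadProgress", (progress) => {
    if (onProgress) onProgress(progress.loaded, progress.total);
  });

  return upload; // call upload.promise() to wait for completion
}
```

A caller would typically invoke `startUpload(AWS, {...}).promise()` from the file input's change handler and update a progress bar from `onProgress`.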

We implemented the upload solution using the above architecture and tested it by uploading 200 GB of files; it worked seamlessly. Our customer was then able to upload their DB files without any trouble.

Login Page

Landing Page After Login

Completed and In-Progress Uploads

References for the code using AWS JavaScript SDK: