Will Webberley

This article was contributed by Will Webberley

Will is a computer scientist and is enthused by nearly all aspects of the technology domain. He is specifically interested in mobile and social computing and is currently a researcher in this area at Cardiff University.

Direct to S3 File Uploads in Python

Last Updated: 19 March 2014


Web applications often require the ability to allow users to upload files such as images, movies and archives. Amazon S3 is a popular and reliable storage option for these files.

This article demonstrates how to create a Python application that uploads files directly to S3 instead of via a web application, utilising S3’s Cross-Origin Resource Sharing (CORS) support. The article and companion repository target Python 2.7, but should also be largely compatible with Python 3.3 and above, except where noted below.

If you have questions about Python on Heroku, consider discussing it in the Python on Heroku forums. Both Heroku and community-based Python experts are available.

Uploading directly to S3

A complete example of the code discussed in this article is available for direct use in this Github repository.

The main advantage of direct uploading is that the load on your application’s dynos would be considerably reduced. Using server-side processes for receiving files and transferring to S3 can needlessly tie up your dynos and will mean that they will not be able to respond to simultaneous web requests as efficiently.

If your application relies on some form of file processing between the client’s computer and S3 (such as parsing Exif information or applying watermarks to images), then you may need to use extra dynos and consider the boto Python library for handling the S3 upload.

The application uses client-side JavaScript and Python for signing the requests. It will therefore be a suitable guide for developing applications for the Flask, Bottle and Django web frameworks. The upload is carried out asynchronously so that you can decide how to handle your application’s flow after the upload has completed (for example, a page redirect upon successful upload rather than a full page refresh).

An example simple account-editing scenario is used as a guide for completing the various steps required to accomplish the direct upload and to relate the application of this to a wider range of use-cases. More information on this scenario is provided later.

Overview

S3 comprises a set of buckets, each with a globally unique name, in which individual files (known as objects) and directories can be stored.

For uploading files to S3, you will need an Access Key ID and a Secret Access Key, which act as a username and password. The access key account will need to have sufficient access privileges to the target bucket in order for the upload to be successful.

Please see the S3 Article for more information on this, creating buckets and finding your Access Key ID and Secret Access Key.

The method described in this article involves the use of client-side JavaScript and server-side Python. In general, the completed image-upload process follows these steps:

  • A file is selected for upload by the user in their web browser;
  • JavaScript is then responsible for making a request to your web application on Heroku, which produces a temporary signature with which to sign the upload request;
  • The temporary signed request is returned to the browser in JSON format;
  • JavaScript then uploads the file directly to Amazon S3 using the signed request supplied by your Python application.

This guide includes information on how to implement the client-side and server-side code to form the complete system. After following the guide, you should have a working barebones system, allowing your users to upload files to S3. However, it is usually worth adding extra functionality to help improve the security of the system and to tailor it for your own particular uses. Pointers for this are mentioned in the appropriate parts of the guide.

Prerequisites

  • The Heroku Toolbelt has been installed;
  • A Heroku application has been created for the current project;
  • An AWS S3 bucket has been created.

Initial setup

Heroku setup

In order for your application to access the AWS credentials for signing upload requests, they will need to be added as configuration variables in Heroku:

$ heroku config:set AWS_ACCESS_KEY_ID=xxx AWS_SECRET_ACCESS_KEY=yyy
Adding config vars and restarting app... done, v21
    AWS_ACCESS_KEY_ID     => xxx
    AWS_SECRET_ACCESS_KEY => yyy

If you are testing locally before deployment, remember to add the credentials to your local machine’s environment, too.

In addition to the AWS access credentials, set your target S3 bucket’s name:

$ heroku config:set S3_BUCKET=zzz
Adding config vars and restarting app... done, v21
    S3_BUCKET     => zzz

Using config vars is preferable over configuration files for security reasons. Try to avoid placing passwords and access keys directly in your application’s code or in configuration files.
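As a minimal sketch of how the application might read these config vars at start-up, the following helper fails early if any of them is missing. The function name load_s3_config and the REQUIRED_VARS tuple are illustrative assumptions and are not part of the companion repository.

```python
import os

# The three config vars set with `heroku config:set` above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def load_s3_config(environ=None):
    """Return the S3 settings as a dict, or raise if any are missing.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_s3_config() once when the application starts surfaces a missing variable immediately, rather than as a failed signature on the first upload.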

S3 setup

You will now need to edit some of the permissions properties of the target S3 bucket so that the final request has sufficient privileges to write to the bucket. In a web-browser, sign in to the AWS console and select the S3 section. Select the appropriate bucket and click the ‘Properties’ tab. Select the Permissions section and three options are provided (Add more permissions, Edit bucket policy and Edit CORS configuration).

CORS (Cross-Origin Resource Sharing) will allow your application to access content in the S3 bucket. Each rule should specify a set of domains from which access to the bucket is granted and also the methods and headers permitted from those domains.

Locating the ‘Properties’ tab and CORS configuration editor

For this to work in your application, click ‘Add CORS Configuration’ and enter the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>yourdomain.com</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <AllowedMethod>POST</AllowedMethod>
        <AllowedMethod>PUT</AllowedMethod>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

Click ‘Save’ in the CORS window and then ‘Save’ again in the bucket’s ‘Properties’ tab.

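This tells S3 to accept GET, POST and PUT requests, with any headers, from the origin named in ‘AllowedOrigin’. Replace yourdomain.com with the origin of your own application, including the scheme (for example, https://yourdomain.com), since S3 matches this value against the Origin header sent by the browser. Setting ‘AllowedOrigin’ to * would allow access from any domain, which is convenient for testing but less secure.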

If you wish to use S3 credentials specifically for this application, then more keys can be generated in the AWS account pages. This provides further security, since you can designate a very specific set of requests that this set of keys is able to perform. If this is preferable to you, then you will also need to set up an IAM user in the Edit bucket policy option in your S3 bucket. There are various guides on AWS’s web pages detailing how this can be accomplished.

Direct uploading

The processes and steps required to accomplish a direct upload to S3 will be demonstrated through the use of a simple profile-editing scenario for the purposes of this article. This example will involve the user being permitted to select an avatar image to upload and enter some basic information to be stored as part of their account.

In this scenario, the following procedure will take place:

  • The user is presented with a web page, containing elements encouraging the user to choose an image to upload as their avatar and to enter a username and their own name.
  • An element is responsible for maintaining a preview of the chosen image by the user. By default, and if no image is chosen for upload, a default avatar image is used instead (making the image-upload effectively optional to the user in this scenario).
  • When a user selects an image to be uploaded, the upload to S3 is handled automatically and asynchronously with the process described earlier in this article. The image preview is then updated with the selected image once the upload is complete and successful.
  • The user is then free to move on to filling in the rest of the information.
  • The user then clicks the “submit” button, which posts the username, name and the URL of the uploaded image to the Python application to be checked and/or stored. If no image was uploaded by the user earlier the default avatar image URL is posted instead.

An example of what the simple finished product will consist of

Setting up the client-side code

This setup does not require any additional, non-standard Python libraries, but some scripts are necessary to complete the implementation on the client-side.

This article covers the use of the s3upload.js script. Obtain this script from the project’s repo (using Git or otherwise) and store it somewhere appropriate in your application’s static directory. This script currently depends on both the jQuery and Lo-Dash libraries. Inclusion of these in your application will be covered later on in this guide.

The HTML and JavaScript can now be created to handle the file selection, obtain the request and signature from your Python application, and then finally make the upload request.

Firstly, create a file called account.html in your application’s templates directory and populate the head and other necessary HTML tags appropriately for your application. To see the completed HTML file, please see the appropriate code in the companion repository.

In the body of this HTML file, include a file input and an element that will contain status updates on the upload progress. In addition to this, create a form to allow the user to enter their username and full name and a hidden input element to hold the URL of the chosen avatar image:

<input type="file" id="file" onchange="s3_upload();"/>
<p id="status">Please select a file</p>
<div id="preview"><img src="/static/default.png"  /></div>

<form method="POST" action="/submit_form/">
    <input type="hidden" id="avatar_url" name="avatar_url" value="/static/default.png" />
    <input type="text" name="username" placeholder="Username" /><br />
    <input type="text" name="full_name" placeholder="Full name" /><br /><br />
    <input type="submit" value="Update profile" />
</form>

The preview element initially holds a default avatar image (which would become the user’s avatar if a new image is not chosen), and the avatar_url input maintains the current URL of the user’s chosen avatar image. Both of these are updated by the JavaScript, discussed below, when the user selects a new avatar.

Thus when the user finally clicks the submit button, the URL of the avatar is submitted, along with the username and full name of the user, to your desired endpoint for server-side handling. The JavaScript method, s3_upload(), is called when a file is selected by the user. The creation and population of this method is covered below.

Next, include the three dependency scripts in your HTML file, account.html. You may need to adjust the src attribute for the file s3upload.js if you put this file in a directory other than /static:

<script type="text/javascript" src="http://code.jquery.com/jquery-1.9.1.js"></script>
<script type="text/javascript" src="https://raw.github.com/bestiejs/lodash/v1.1.1/dist/lodash.min.js"></script>
<script type="text/javascript" src="/static/s3upload.js"></script>

The ordering of the scripts is important as the dependencies need to be satisfied in this sequence. If you would rather host your own versions of jQuery and Lo-Dash, then adjust the src attribute accordingly.

Finally, in a <script> block, declare a JavaScript function, s3_upload(), in the same file again to process the file upload. This <script> block will need to exist below the inclusion of the three dependencies:

function s3_upload(){
    var s3upload = new S3Upload({
        file_dom_selector: 'file',
        s3_sign_put_url: '/sign_s3/',

        onProgress: function(percent, message) {
            $('#status').html('Upload progress: ' + percent + '% ' + message);
        },
        onFinishS3Put: function(url) {
            $('#status').html('Upload completed. Uploaded to: '+ url);
            $("#avatar_url").val(url);
            $("#preview").html('<img src="'+url+'" style="width:300px;" />');
        },
        onError: function(status) {
            $('#status').html('Upload error: ' + status);
        }
    });
}

This function creates a new instance of S3Upload, to which is passed the file input element, the URL from which to retrieve the signed request and three functions.

Initially, the function makes a request to the URL denoted by the s3_sign_put_url argument, passing the file name and mime type as GET parameters. The server-side code (covered in the next section) interprets the request and responds with a preview of the URL of the file to be uploaded to S3 and the signed request, which this function then uses to asynchronously upload the file to your bucket.

The function will post upload updates to the onProgress() function and, if the upload is successful, onFinishS3Put() is called and the URL returned by the Python application view is received as an argument. If, for any reason, the upload should fail, onError() will be called and the status parameter will describe the error.
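Although the browser performs the upload in JavaScript, it can help to see the shape of the final request in Python. The sketch below builds (but does not send) an equivalent PUT using Python 3’s standard library. It assumes that the request must carry the same Content-Type and x-amz-acl values that the server-side view includes in the string it signs; the function name build_put_request is hypothetical.

```python
import urllib.request  # Python 3 standard library


def build_put_request(signed_request, body, mime_type):
    """Build the PUT request that uploads `body` to the signed S3 URL.

    The request is only constructed here, not sent.
    """
    return urllib.request.Request(
        signed_request,
        data=body,
        method="PUT",
        headers={
            # Assumed to match the MIME type that was signed server-side.
            "Content-Type": mime_type,
            # Assumed to match the x-amz-acl header that was signed.
            "x-amz-acl": "public-read",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would perform the upload, which can be a convenient way to test a signed request outside the browser. If either header differs from the signed values, S3 is likely to reject the request with a 403 error.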

If you find that the page isn’t working as you intend after implementing the system, then consider using console.log() to record any errors that occur inside the onError() callback and use your browser’s error console to help diagnose the problem.

If successful, the preview div will now be updated with the user’s chosen avatar image, and the hidden input field will contain the URL for the image. Now, once the user has completed the rest of the form and clicked submit, all three pieces of information can be posted to the same endpoint.

It is good practice to inform the user of any prolonged activity in any form of application (web- or device-based) and to display updates on changes. Thus the status methods could be used, for example, to show a loading GIF to indicate that an upload is in progress, which can then be hidden when the upload has finished. Without this sort of information, users may suspect that the page has crashed, and could try to refresh the page or otherwise disrupt the upload process.

Setting up the server-side Python code

This section discusses the use of Python for generating a temporary signature with which the upload request can be signed. This temporary signature uses the account details (the AWS access key and secret access key) as a basis for the signature, but users will not have direct access to this information. Once the signature has expired, upload requests bearing it will no longer succeed.

As mentioned previously, this article covers the production of an application for the Flask framework, although the steps for other Python frameworks will be similar. Readers using Python 3 should consider the relevant information on Flask’s website before continuing.

To see the completed Python file, please see the appropriate code in the companion repository.

Start by creating your main application file, application.py, and set up your skeleton application appropriately:

from flask import Flask, render_template, request, redirect, url_for
from hashlib import sha1
import time, os, json, base64, hmac, urllib

app = Flask(__name__)

if __name__ == '__main__':
    port = int(os.environ.get('PORT', 5000))
    app.run(host='0.0.0.0', port=port)

The currently-unused import statements will be necessary later on.

Readers using Python 3 should import urllib.parse in place of urllib.

Next, in the same file, you will need to create the views responsible for returning the correct information back to the user’s browser when requests are made to various URLs. First, define a view for requests to /account/ that returns the page account.html, which contains the form for the user to complete:

@app.route("/account/")
def account():
    return render_template('account.html')

Please note that the views for the application will need to be placed between the app = Flask(__name__) and if __name__ == '__main__': lines in application.py.

Now create the view, in the same Python file, that is responsible for generating and returning the signature with which the client-side JavaScript can upload the image. This is the first request made by the client before attempting an upload to S3. This view responds to requests to /sign_s3/:

@app.route('/sign_s3/')
def sign_s3():
    AWS_ACCESS_KEY = os.environ.get('AWS_ACCESS_KEY_ID')
    AWS_SECRET_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
    S3_BUCKET = os.environ.get('S3_BUCKET')

    object_name = request.args.get('s3_object_name')
    mime_type = request.args.get('s3_object_type')

    expires = int(time.time()+10)
    amz_headers = "x-amz-acl:public-read"

    put_request = "PUT\n\n%s\n%d\n%s\n/%s/%s" % (mime_type, expires, amz_headers, S3_BUCKET, object_name)

    signature = base64.encodestring(hmac.new(AWS_SECRET_KEY, put_request, sha1).digest())
    signature = urllib.quote_plus(signature.strip())

    url = 'https://%s.s3.amazonaws.com/%s' % (S3_BUCKET, object_name)

    return json.dumps({
        'signed_request': '%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s' % (url, AWS_ACCESS_KEY, expires, signature),
        'url': url
    })

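Readers using Python 3 should use urllib.parse.quote_plus() to quote the signature. In Python 3, hmac.new() also expects bytes rather than str for the key and the message, and base64.encodestring() was removed in Python 3.9, so encode both strings first and use base64.b64encode() (or base64.encodebytes()) instead.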

This code performs the following steps:

  • The request is received to /sign_s3/ and the AWS keys and S3 bucket name are loaded from the environment.
  • The name and mime type of the object to be uploaded are extracted from the GET parameters of the request (this stage may differ in other frameworks). The parameters are provided by the JavaScript discussed in the previous section.
  • The expiry time of the signature is set and forms the basis of the temporary nature of the signature. As shown, this is best used as a function relative to the current UNIX time. In this example, the signature will expire 10 seconds after Python has executed that line of code.
  • The headers line tells S3 what access permissions to grant. In this case, the object will be publicly available for download.
  • Now the PUT request can be constructed from the object information, headers and expiry time.
  • The signature is generated as an HMAC-SHA1 digest of the PUT request string, keyed with the AWS secret key, and is then base64-encoded.
  • In addition, surrounding whitespace is stripped from the signature and special characters are escaped (using quote_plus) for safer transmission through HTTP.
  • The prospective URL of the object to be uploaded is produced as a combination of the S3 bucket name and the object name.
  • Finally, the signed request can be returned, along with the prospective URL, to the browser in JSON format.
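The steps above can also be sketched as a standalone function that is independent of Flask and runs on Python 3. This is an illustrative rewrite of the view’s logic rather than the companion repository’s code; the function name sign_put and its lifetime and now parameters are assumptions introduced here to make the signing step easier to test.

```python
import base64
import hmac
import time
from hashlib import sha1
from urllib.parse import quote_plus  # Python 3 location of quote_plus


def sign_put(access_key, secret_key, bucket, object_name, mime_type,
             lifetime=10, now=None):
    """Return the signed PUT URL and the prospective object URL."""
    now = time.time() if now is None else now
    expires = int(now + lifetime)

    # Same layout as the put_request string in the view above.
    string_to_sign = "PUT\n\n%s\n%d\nx-amz-acl:public-read\n/%s/%s" % (
        mime_type, expires, bucket, object_name)

    # HMAC-SHA1 keyed with the secret key; Python 3 requires bytes here.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), sha1).digest()

    # Base64-encode, then escape "+", "/" and "=" for use in a URL.
    signature = quote_plus(base64.b64encode(digest).decode("ascii"))

    url = "https://%s.s3.amazonaws.com/%s" % (bucket, object_name)
    signed = "%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s" % (
        url, access_key, expires, signature)
    return {"signed_request": signed, "url": url}
```

A view could then return json.dumps(sign_put(...)) using the values read from the environment and the request. Keeping the signing logic in a plain function like this means it can be checked with fixed inputs, without starting the web application.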

You may wish to assign another, customised name to the object instead of using the one that the file is already named with, which is useful for preventing accidental overwrites in the S3 bucket. This name could be related to the ID of the user’s account, for example. If not, you should provide some method for properly quoting the name in case there are spaces or other awkward characters present. In addition, this is the stage at which you could provide checks on the uploaded file in order to restrict access to certain file types. For example, a simple check could be implemented to allow only .png files to proceed beyond this point.
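One possible way to combine these suggestions is sketched below, written for Python 3. The helper make_object_name, the avatars/ prefix and the ALLOWED_EXTENSIONS set are hypothetical choices made for this example, and a check on the file extension alone is only a basic safeguard.

```python
import os
import uuid
from urllib.parse import quote  # Python 3 location of quote

# Assumption for this sketch: only PNG avatars are accepted.
ALLOWED_EXTENSIONS = {".png"}


def make_object_name(user_id, original_name):
    """Return a quoted, per-user object name, or None if the type is refused."""
    extension = os.path.splitext(original_name)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return None
    # A random component avoids overwriting an earlier upload by the same user.
    name = "avatars/%s/%s%s" % (user_id, uuid.uuid4().hex, extension)
    # quote() escapes spaces and other awkward characters but keeps "/".
    return quote(name)
```

In the signing view, the result of make_object_name() could replace the object name taken from the GET parameters, with a None result leading to an error response instead of a signature.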

It is sometimes possible for S3 to respond with 403 (forbidden) errors for requests which are signed by temporary signatures containing special characters. Therefore, it is important to appropriately quote the signature as demonstrated above.

Finally, in application.py, create the view responsible for receiving the account information after the user has uploaded an avatar, filled in the form, and clicked submit. Since this will be a POST request, this will also need to be defined as an ‘allowed access method’. This method will respond to requests to the URL /submit_form/:

@app.route("/submit_form/", methods=["POST"])
def submit_form():
    username = request.form["username"]
    full_name = request.form["full_name"]
    avatar_url = request.form["avatar_url"]
    update_account(username, full_name, avatar_url)
    return redirect(url_for('profile'))

In this example, an update_account() function has been called, but creation of this method is not covered in this article. In your application, you should provide some functionality, at this stage, to allow the app to store these account details in some form of database and correctly associate the information with the rest of the user’s account details.
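Purely as an illustration of what such functionality might look like, the sketch below checks the submitted avatar URL and stores the details with Python’s built-in sqlite3 module. The names is_allowed_avatar_url and store_account, the accounts table and its columns are all assumptions made for this example; a real application would use its own database and account model.

```python
import sqlite3

DEFAULT_AVATAR = "/static/default.png"  # default value of the hidden input


def is_allowed_avatar_url(url, bucket):
    """Accept only the default avatar or a URL inside the expected bucket."""
    prefix = "https://%s.s3.amazonaws.com/" % bucket
    return url == DEFAULT_AVATAR or url.startswith(prefix)


def store_account(conn, username, full_name, avatar_url):
    """Insert or update one account row using a parameterised query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "username TEXT PRIMARY KEY, full_name TEXT, avatar_url TEXT)")
    conn.execute(
        "INSERT OR REPLACE INTO accounts VALUES (?, ?, ?)",
        (username, full_name, avatar_url))
    conn.commit()
```

An update_account() implementation could call is_allowed_avatar_url() first, since the avatar URL arrives from the browser and should not be trusted blindly, and then pass the values to store_account() along with an open database connection.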

In addition, the URL for the profile page has not been defined in this article (or companion code). Ideally, for example, after updating the account, the user would be redirected back to their own profile so that they can see the updated information.

Running the app

Everything should now be in place to perform the direct uploads to S3. You will need a Procfile to run the application with foreman. See Getting Started with Python on Heroku for information on the Heroku toolbelt and using Foreman. Also remember to correctly set your environment variables on your own machine before running the application locally.

To test the upload, save any changes and use foreman to start the application:

$ foreman start
15:44:36 web.1  | started with pid 12417

Press Ctrl-C to return to the prompt. If your application is returning 500 errors (or other server-based issues), then start your server in debug mode and view the output in the terminal to help fix your problem. For example, in Flask:

...
app.debug = True
port = int(os.environ.get('PORT', 5000))
app.run(host='0.0.0.0', port=port)

If you are receiving 403 errors back from S3, then the most common reason is that there is an issue with your signature. As mentioned earlier, you should consider quoting the signature properly and removing any whitespace.
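When diagnosing such errors, it can help to inspect the signed request that your view returned. The following Python 3 sketch is a hypothetical debugging aid, not part of the companion code: it reports how long the signature has left and flags a signature that appears to contain an unquoted + character or stray whitespace.

```python
import time
from urllib.parse import urlparse, parse_qs


def describe_signed_request(signed_request, now=None):
    """Summarise the expiry and signature quoting of a signed S3 URL."""
    now = time.time() if now is None else now
    query = parse_qs(urlparse(signed_request).query)
    expires = int(query["Expires"][0])
    signature = query["Signature"][0]
    return {
        "seconds_left": expires - int(now),
        "expired": expires < now,
        # parse_qs decodes a raw "+" as a space, and base64 never contains
        # spaces, so a space here suggests the signature was not quoted.
        "suspicious_signature": " " in signature or signature != signature.strip(),
    }
```

Because the example signature is only valid for 10 seconds, a difference between your server’s clock and the actual time, or a slow first request, can be enough to make the request expire before the upload begins.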

Summary

This article covers uploading to Amazon S3 directly from the browser using Python to temporarily sign the upload request. Although the guide and companion code focus on the Flask framework, the idea should easily carry over to other Python applications.