Gratuitous development with AWS S3 and Paperclip

By Bartosz Żurkowski, 17 Aug 2015

In the past few years, online storage has become mainstream, providing highly scalable architectures, enhanced security and wide data accessibility. One of the most popular storage services is Amazon’s Simple Storage Service, popularly known as S3.

Amazon S3 is an extremely powerful service at the core of Amazon Web Services. However, outside of the production environment, S3 can be challenging to work with. It involves passing access keys around, provisioning user accounts, and maintaining a reliable network connection - not to mention it costs money.

Luckily, there is a tool that helps solve this problem. FakeS3 is a lightweight server that simulates the behaviour of the real S3. It responds to the same calls Amazon S3 responds to and stores uploaded files in your local filesystem, so no requests are made to Amazon’s service. Although the gem doesn’t support the full set of S3 commands, the implemented API is sufficient for most applications’ use cases.

In this article I’m going to present an approach to integrating AWS and FakeS3 with Paperclip, a popular file attachment library for Active Record. Together, Paperclip and S3 provide an effective file storage system that combines Paperclip’s useful core features (like validation management and image transformations) with the advantages of online storage. Although configuring these tools isn’t obvious and requires digging into detailed documentation as well as resolving many gem-specific issues, it’s worth spending some time on it to make development faster and more efficient.

What is our goal?

Integrating the described tools requires three steps:

  1. Launching the S3 fakeserver provided by the FakeS3 gem in the background.
  2. Configuring the AWS S3 client to delegate all requests to the launched fakeserver.
  3. Configuring Paperclip to use the fake S3 endpoint in the resource URLs it builds.

Installation

Let’s start by installing the required gems:

# Gemfile

gem "paperclip"
gem "aws-sdk", "~> 1.6"

gem "fakes3", group: [:development, :test]

Make sure to install a 1.x version of aws-sdk (the ~> 1.6 constraint above allows 1.6 and later, but not 2.0). Paperclip, which uses the SDK to manage storage in Amazon’s service, doesn’t work well with higher versions of this gem. This is due to significant changes in the SDK’s API introduced in version 2.0.

Also remember that the main goal of FakeS3 is to minimize runtime dependencies. It is a development tool for testing S3 calls in your code rather than a production server that duplicates S3 functionality. Therefore you should include the gem only in the development and test groups.

AWS configuration

The AWS SDK provides a dedicated helper method responsible for loading configuration. By default it loads configuration from config/aws.yml, extracts the parameters for the current environment and passes them to the AWS client. First, call the following method in an initializer:

# config/initializers/aws.rb

AWS::Rails.load_yaml_config

Now that the configuration file is being loaded properly, we can specify its contents:

#  config/aws.yml

development: &development
    access_key_id:       "abc"
    secret_access_key:   "abc"
    s3_endpoint:         "localhost"
    s3_port:             10001
    s3_force_path_style: true
    use_ssl:             false

test: *development

Let’s discuss all parameters one by one:

  • access_key_id, secret_access_key - AWS client credentials required to gain access to your Amazon account. The fake S3 server ignores them, hence the placeholder values in sandbox environments.
  • s3_endpoint, s3_port - S3 endpoint specification. We use these parameters to replace the real S3 endpoint with the fake endpoint launched by the FakeS3 gem, so all requests to Amazon’s service are now delegated to the local fakeserver.
  • s3_force_path_style - S3 accepts two styles of including the bucket’s name in a URL. You can place the bucket’s name domain-style (bucket.s3.amazonaws.com) or path-style (s3.amazonaws.com/bucket). To keep things simple and avoid the extra configuration needed to map the bucket’s subdomain to the loopback address, I prefer path-style over domain-style in the development environment.
  • use_ssl - controls whether the AWS SDK uses HTTPS instead of plain HTTP. We need to disable this option, because the FakeS3 gem doesn’t support the HTTPS requests that the AWS client performs by default.
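
To make the difference between the two URL styles concrete, here is a minimal sketch in plain Ruby. The s3_object_url helper is hypothetical and is not part of aws-sdk; it only shows how the settings above could combine into an object URL, and the avatars/1.png key is made up for the example.

```ruby
# Hypothetical helper for illustration only; it is not part of aws-sdk.
# Builds the URL of an object from the settings discussed above.
def s3_object_url(bucket:, key:, endpoint:, port:, use_ssl:, force_path_style:)
  scheme = use_ssl ? "https" : "http"

  if force_path_style
    # Path-style: the bucket is the first path segment.
    "#{scheme}://#{endpoint}:#{port}/#{bucket}/#{key}"
  else
    # Domain-style: the bucket becomes a subdomain of the endpoint.
    "#{scheme}://#{bucket}.#{endpoint}:#{port}/#{key}"
  end
end

# Values taken from the development section of config/aws.yml.
development = { endpoint: "localhost", port: 10001, use_ssl: false }

puts s3_object_url(bucket: "development", key: "avatars/1.png",
                   force_path_style: true, **development)
# http://localhost:10001/development/avatars/1.png

puts s3_object_url(bucket: "development", key: "avatars/1.png",
                   force_path_style: false, **development)
# http://development.localhost:10001/avatars/1.png
```

The domain-style result would need development.localhost to resolve to the loopback address, which is the extra setup that path-style avoids.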

Configuration for the production environment is pretty straightforward:

# config/aws.yml

production: &production
    access_key_id:     <%= ENV["AWS_ACCESS_KEY_ID"] %>
    secret_access_key: <%= ENV["AWS_SECRET_ACCESS_KEY"] %>

staging: *production

This time, however, we are dealing with the real S3 service, so you need to provide authentic AWS credentials.

Due to potential security risks, it’s good practice to keep secret values like access keys out of your version control system, e.g. by using environment variables. We’ll use ERB to inject their values into the configuration file.
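
The ERB step can be tried outside Rails with the standard library alone. The sketch below is an assumption about what the loading boils down to, not the SDK’s actual implementation: it renders the template through ERB, parses the result as YAML and picks one environment. The load_aws_settings helper and the AWS_TEMPLATE constant are hypothetical names introduced for this example.

```ruby
require "erb"
require "yaml"

# Same shape as the production section of config/aws.yml.
AWS_TEMPLATE = <<~'YAML'
  production: &production
    access_key_id:     <%= ENV["AWS_ACCESS_KEY_ID"] %>
    secret_access_key: <%= ENV["AWS_SECRET_ACCESS_KEY"] %>

  staging: *production
YAML

# Hypothetical helper: renders the ERB tags first, then parses the YAML.
# aliases: true is needed because staging reuses the production anchor.
def load_aws_settings(template, env_name)
  rendered = ERB.new(template).result
  YAML.safe_load(rendered, aliases: true).fetch(env_name)
end
```

With both environment variables exported, load_aws_settings(AWS_TEMPLATE, "staging") returns the same hash as the production section. If a variable is unset, the ERB tag renders as an empty string and the corresponding value becomes nil, so a missing secret shows up as a missing value rather than a parse error.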

Paperclip configuration

Now it’s time to face Paperclip and make it work nicely with the already configured S3 client. The main goal of Paperclip’s configuration is to obtain a storage path that locates resources hosted by the fakeserver:

localhost:10001/:bucket_name/:path

Again, let’s start with the development environment:

# config/paperclip.yml

development: &development
    storage:       :s3
    bucket:        "development"
    s3_host_name:  "localhost"
    url:           ":s3_alias_url"
    path:          ":class/:attachment/:id_partition/:style/:filename.:extension"
    s3_host_alias: "localhost:10001/development"

test: *development

  • storage - specifies the storage backend (the local filesystem by default). Since we’re using AWS S3, we need to change it to :s3.
  • bucket - name of the S3 bucket that will store your files. If the bucket doesn’t exist, Paperclip will attempt to create it.
  • url - when set to :s3_alias_url, Paperclip replaces the S3 bucket’s host name with the value of the :s3_host_alias parameter.
  • s3_host_alias - alias for the default S3 bucket host. Notice that the host, port and bucket name placement correspond to the configuration of the AWS client.
  • path - pattern for the keys under which files will be stored in the bucket. Keys must be unique within a bucket, like filenames. Because S3 has no real directories, you can use the / symbol in keys to simulate a directory structure.
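
To get a feel for the keys such a pattern produces, here is a small plain-Ruby sketch. It does not call Paperclip. The id_partition method reflects my understanding of how Paperclip’s :id_partition splits integer ids, which is an assumption worth checking against your Paperclip version. The object_key helper and the sample values are hypothetical, and file_name stands in for the final file name segment of the pattern.

```ruby
# Assumption: mirrors how Paperclip's :id_partition treats integer ids,
# zero-padding to nine digits and splitting them into groups of three.
def id_partition(id)
  format("%09d", id).scan(/\d{3}/).join("/")
end

# Hypothetical helper: joins the segments in the order used by the
# path pattern (:class/:attachment/:id_partition/:style/<file name>).
def object_key(klass:, attachment:, id:, style:, file_name:)
  [klass, attachment, id_partition(id), style, file_name].join("/")
end

key = object_key(klass: "users", attachment: "avatars", id: 1234,
                 style: "thumb", file_name: "photo.png")

puts key
# users/avatars/000/001/234/thumb/photo.png

# Prefixed with the host alias from config/paperclip.yml:
puts "localhost:10001/development/#{key}"
# localhost:10001/development/users/avatars/000/001/234/thumb/photo.png
```

The slashes are just characters in the key. The partitioned id keeps any single simulated directory from accumulating a very large number of entries, which matters here because FakeS3 stores the files on your local filesystem.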

Configuration for the production environment follows the same pattern:

# config/paperclip.yml

production: &production
    storage: :s3
    bucket:  <%= ENV["S3_BUCKET_NAME"] %>
    url:     ":s3_domain_url"
    path:    ":class/:attachment/:id_partition/:style/:filename.:extension"

staging: *production

As with the AWS credentials, the bucket name is best treated as a secret value and kept out of your code base. I recommend storing it in an environment variable.

Lastly, merge the configuration into Paperclip’s default options in an initializer:

# config/initializers/paperclip.rb

paperclip_defaults = Rails.application.config_for :paperclip
paperclip_defaults.symbolize_keys!

Paperclip::Attachment.default_options.merge! paperclip_defaults
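
The symbolize_keys! call matters because YAML yields string keys while option hashes like Paperclip’s defaults are keyed by symbols. The following sketch reproduces the idea without Rails or Paperclip. The DEFAULTS hash is a made-up stand-in for Paperclip’s real defaults, and paperclip_options is a hypothetical helper.

```ruby
require "yaml"

# Made-up stand-in for Paperclip's default options (symbol keys).
DEFAULTS = { storage: :filesystem, url: "/system/:filename" }.freeze

PAPERCLIP_YAML = <<~'YAML'
  development:
    storage: :s3
    bucket:  "development"
    url:     ":s3_alias_url"
YAML

# Hypothetical helper: loads one environment and merges it over the defaults.
# Symbol must be permitted because the unquoted :s3 is parsed as a Ruby symbol.
def paperclip_options(yaml_source, env_name, defaults)
  settings = YAML.safe_load(yaml_source, permitted_classes: [Symbol]).fetch(env_name)
  # Without this conversion, "storage" and :storage would coexist as separate
  # keys and the default :filesystem storage would stay in effect.
  defaults.merge(settings.transform_keys(&:to_sym))
end

options = paperclip_options(PAPERCLIP_YAML, "development", DEFAULTS)
puts options[:storage].inspect # :s3
puts options[:url]             # :s3_alias_url
```

Note that the quoted ":s3_alias_url" stays a string, while the unquoted :s3 becomes a symbol, which is why the two values are written differently in config/paperclip.yml.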

Running fakes3

Both the AWS and Paperclip configurations contain a reference to the local S3 fakeserver, which is expected to run at localhost:10001. Before working in development, launch the server with the following command (provided by the FakeS3 gem):

fakes3 -r public/system -p 10001

The parameters passed are:

  • root -r - root directory under which uploaded files will be stored. Consider excluding it from your VCS if you don’t want uploaded files stored in your repository.
  • port -p - port number on which the local server will run.
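
Since uploads will fail when the fakeserver isn’t running, a quick connectivity check can save some debugging time. The snippet below is a minimal sketch using only Ruby’s standard library. The fakes3_running? method is a hypothetical helper, and it only verifies that something accepts TCP connections on the port, not that it is actually FakeS3.

```ruby
require "socket"

# Hypothetical helper: returns true if a TCP connection to host:port succeeds
# within the timeout (in seconds), false otherwise.
def fakes3_running?(host = "localhost", port = 10001, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

unless fakes3_running?
  warn "Nothing is listening on localhost:10001. Did you start fakes3?"
end
```

A check like this could live in a development-only initializer or a test helper, so a missing server produces a clear message instead of a connection error deep inside an upload.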

If you’re using Foreman for process management in your application, it may be convenient to add the following entry to your Procfile:

# Procfile

fakes3: fakes3 -r ${FAKES3_STORAGE_PATH:-public/system} -p ${FAKES3_PORT:-10001}

This saves the time otherwise spent launching the fakeserver every time you need to develop S3-related features.

Conclusion

We’ve configured the AWS client to delegate all requests to the local fakeserver, set up Paperclip to use the fake S3 endpoint in the resource URLs it builds, and launched the fakeserver provided by the FakeS3 gem, which stores all files in the local filesystem.

As a result, we no longer depend on an Internet connection and we save money, which makes our development faster and more reliable.

About the author

Bartosz Żurkowski — Meticulous Astronaut

Bartosz pays attention to what he is doing. He needs a little time to become the life of the party, but once you know him better you appreciate his wide interests.