A staging environment is meant to track the production environment as closely as possible, so that your app can be fully tested under production conditions. Let’s dig into this.
It must be a perfect, isolated copy of production, including its data. Everything that happens in production should happen in staging.
Why have a staging environment?
A staging environment is the perfect place to battle-test your code under production conditions, with the advantage that you can crash it, hard. Very useful, especially when you handle a big code migration, such as a Rails version update, a change of background processing system, a refactoring, new features…
You’ll be able to test your deployment process. You’ll be able to run your ActiveRecord migrations against production data, to see how much time they would take, and how your app behaves in the meantime.
It can be a place where you prepare and test your software updates. If you want to upgrade your Ruby version from 1.9.3 to 2.1.3, you can see the full impact on your app. You can work out the steps needed to upgrade the production environment, and test them again and again until everything is smooth. The same goes for every piece of your stack: ElasticSearch… And beyond software updates, you can test what happens if you stop your ElasticSearch cluster, or kill a MongoDB replica set…
It also allows you to run a benchmark suite hitting your full stack, from Nginx and Passenger / Unicorn… down to MySQL / MongoDB, ElasticSearch…
In short, you deploy your branch to staging, and you’re more at peace when you merge it into master and deploy it to production.
Where to place it?
In an ideal world, you have a separate infrastructure for staging. But one or a couple of dedicated servers / VMs, with everything on them, is more than enough.
You can set up staging alongside production, but it’s less safe when every piece is shared between the two. Make sure not to overload your DB server with staging requests, which could kill the performance of production! And managing your software versions would be quite constrained if you plan to diverge for testing. It’s enough if you only plan to battle-test your code, but keep the limitation in mind.
The URL
You have to choose a URL. Here are some examples if your app domain is myapphq.com:
staging.myapphq.com
myapphq.net
myapphq.info
myapphq.com:4242
myapphq-staging.com
And if you use subdomains to handle multi-tenancy, considering customer1.myapphq.com:
customer1.staging.myapphq.com
customer1.myapphq.net
customer1.myapphq.info
customer1.myapphq.com:4242
customer1.myapphq-staging.com
Don’t forget about SSL. If you have SSL in production, you should have SSL in staging! You might need an extra SSL certificate, depending on the URLs you choose. Keep in mind that a wildcard certificate on *.domain.com works for bar.domain.com, but doesn’t work for foo.bar.domain.com; for that you need a wildcard certificate for *.bar.domain.com.
Let’s start
The first step is to duplicate config/environments/production.rb into config/environments/staging.rb and adapt it.
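As a starting point, such a file stays close to production and only changes what identifies staging. A minimal sketch, assuming a Rails 4-style app (the MyApp constant and the host are hypothetical):
# config/environments/staging.rb — a minimal sketch; start from your
# real production.rb and adjust only what has to differ in staging.
MyApp::Application.configure do
  # Mirror production behavior as closely as possible.
  config.cache_classes = true
  config.eager_load = true
  config.consider_all_requests_local = false
  config.action_controller.perform_caching = true

  # Staging-specific touches: a dedicated host for generated URLs.
  config.action_mailer.default_url_options = { host: "staging.myapphq.com" }
  config.log_level = :info
end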
Look into all your config/*.yml files to add a staging environment: database.yml… I strongly advise you to have different credentials and namespaces for staging. It can spare you an “I deleted some (or worse, all) production data” moment when working on staging. This is true for the database, Amazon S3… For Redis, consider using a different database (/1 instead of /0) and a namespace including the environment, for extra safety.
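A sketch of that Redis setup, assuming the redis and redis-namespace gems (the $redis global and the myapp prefix are just for illustration):
# config/initializers/redis.rb — a sketch, not a drop-in: adapt it to
# however your app builds its Redis connection.
require "redis"
require "redis/namespace"

# Staging gets database 1; every other environment keeps database 0.
db = Rails.env.staging? ? 1 : 0
$redis = Redis::Namespace.new(
  "myapp:#{Rails.env}",   # the namespace includes the environment
  redis: Redis.new(db: db)
)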
If you use Capistrano 3, multi-stage support is included. For Capistrano 2, there is a gem to add. Duplicate config/deploy/production.rb into config/deploy/staging.rb and adapt it. It may be the occasion to move stage-specific settings from config/deploy.rb to the stage files.
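For Capistrano 3, the stage file could look like this (a sketch: the server name, user and deploy path are hypothetical):
# config/deploy/staging.rb — a sketch for Capistrano 3 multi-stage.
set :stage, :staging
set :rails_env, "staging"
set :branch, ENV.fetch("BRANCH", "staging")
set :deploy_to, "/var/www/myapp_staging"

server "staging.myapphq.com", user: "deploy", roles: %w(app web db)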
Finally, search your code for Rails.env.production? and, more generally, for production.
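Most of those checks should treat staging like production; a hypothetical helper can make the intent explicit while you audit them:
# A hypothetical helper: staging usually wants production-like behavior.
# Audit each Rails.env.production? call and decide case by case.
def production_like?
  Rails.env.production? || Rails.env.staging?
end

# Before: send_metrics if Rails.env.production?
# After:  send_metrics if production_like?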
The data
The idea is to have the production data loaded into your staging environment.
Create a lib/tasks/dump.rake file:
require "erb"
require "yaml"

namespace :dump do
  desc "Fails if FILE doesn't exist"
  task :barrier do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    File.exists?(file) or raise "No file found (path given by FILE)"
  end

  desc "Retrieve the dump file"
  task :retrieve do
    remote = ENV["REMOTE"]
    raise "Need a REMOTE file" unless remote
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    # here you copy remote into file
    # via scp (preferred) or HTTP GET
    do_whatever_is_needed or raise "Can't retrieve #{remote}"
  end

  desc "Export the database"
  task :export do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    env = ENV["RAILS_ENV"]
    raise "Need a RAILS_ENV" unless env
    db_config = current_db_config(env)
    system "#{mysqldump(db_config)} | gzip -c > #{file}"
  end

  desc "Import a database"
  task :import => :barrier do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    env = ENV["RAILS_ENV"]
    raise "Need a RAILS_ENV" unless env
    raise "Import on production is forbidden" if env == "production"
    db_config = current_db_config(env)
    system "gzip -d -c #{file} | #{mysql(db_config)}"
  end

  def current_db_config(env)
    YAML::load(ERB.new(IO.read(File.join(File.dirname(__FILE__), "../../config/database.yml"))).result)[env]
  end

  def mysql(config)
    sql_cmd("mysql", config)
  end

  def mysqldump(config)
    sql_cmd("mysqldump", config) + " --add-drop-table --extended-insert=TRUE --disable-keys --complete-insert=FALSE --triggers=FALSE"
  end

  def sql_cmd(sql_command, config)
    "".tap do |cmd|
      cmd << sql_command
      cmd << " "
      cmd << "-u#{config["username"]} " if config["username"]
      cmd << "-p#{config["password"]} " if config["password"]
      cmd << "-h#{config["host"]} " if config["host"]
      cmd << "-P#{config["port"]} " if config["port"]
      cmd << "--default-character-set=utf8 "
      cmd << config["database"] if config["database"]
    end
  end
end
Here is how to use it:
# on production
RAILS_ENV=production FILE=/tmp/dump_production.sql.gz bin/rake dump:export
# on staging
RAILS_ENV=staging REMOTE=app.myapp.com:/tmp/dump_production.sql.gz FILE=/tmp/dump_production.sql.gz bin/rake dump:retrieve dump:barrier maintenance:enable db:drop db:create dump:import db:migrate maintenance:restart maintenance:disable
The export process is trivial: dump:export simply uses mysqldump to export the data.
The full import process is:
dump:retrieve gets the dump file from production (not necessary if on the same server, or if the dump is stored on a shared file system).
dump:barrier fails if there is no file to import. This avoids dropping the database if we can’t import right after.
maintenance:enable starts the maintenance mode.
db:drop drops the current staging database.
db:create creates a fresh, empty staging database.
dump:import imports the production data.
db:migrate runs the needed migrations. Your code deployed in staging may be ahead of the production code: since we just imported the production database, some migrations may not be applied yet.
maintenance:restart restarts the app.
maintenance:disable ends the maintenance mode.
You noticed the maintenance:enable and maintenance:disable tasks. I like to add them to prevent access to the staging app, because weird things can happen during the import: the data is not fully loaded yet, tables are missing… Here is the lib/tasks/maintenance.rake file:
namespace :maintenance do
  MAINTENANCE_FILE = Rails.root.join("public/system/maintenance.html")
  RESTART_FILE = Rails.root.join("tmp/restart")

  desc "Start the maintenance mode"
  task :enable => :environment do
    if !File.exists?(MAINTENANCE_FILE)
      dir = File.dirname(MAINTENANCE_FILE)
      system "mkdir -p #{dir} && echo \"Website is on maintenance. We'll be back in a few seconds...\" > #{MAINTENANCE_FILE}"
      Rails.logger.info("[MAINTENANCE] App is now DOWN")
    end
  end

  desc "Stop the maintenance mode"
  task :disable => :environment do
    if File.exists?(MAINTENANCE_FILE)
      if File.unlink(MAINTENANCE_FILE) == 1
        Rails.logger.info("[MAINTENANCE] App is now UP")
      end
    end
  end

  desc "Restart the application"
  task :restart => :environment do
    FileUtils.touch(RESTART_FILE)
    Rails.logger.info("[MAINTENANCE] App has restarted")
  end
end
And here is the part to add into the server block of your nginx.conf:
error_page 503 @maintenance;

location @maintenance {
  try_files /503.html /system/maintenance.html =503;
}

if (-f $document_root/system/maintenance.html) {
  return 503;
}
The maintenance part is pretty standard. Feel free to adapt it, or to use a dedicated gem (for Capistrano 3, for example).
There are other things to consider. If you have attachments, you have to take care of them too. If they are stored locally, you’ll have to export them from production and import them into staging. If they are stored on Amazon S3, you can likewise copy them every night from the production bucket to the staging bucket. Or you can set up a more advanced workflow: all PUT, DELETE… requests hit the staging bucket (and are allowed to fail, especially the DELETEs), but a GET first tries the staging bucket and, if nothing is found, falls back to getting the object from the production bucket. It spares you the copy step, and the extra cost of storage.
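Here is a sketch of that GET fallback, assuming the aws-sdk gem (v2 API); the bucket names and the fetch_attachment helper are hypothetical:
# A sketch of the read fallback; assumes AWS credentials and region are
# configured in the environment.
require "aws-sdk"

S3 = Aws::S3::Client.new

def fetch_attachment(key)
  # Try the staging bucket first: it holds anything written from staging…
  S3.get_object(bucket: "myapp-attachments-staging", key: key)
rescue Aws::S3::Errors::NoSuchKey
  # …and fall back to the production object when staging has no copy.
  S3.get_object(bucket: "myapp-attachments-production", key: key)
end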
Magic happens at night
Time to automate things: every night, staging gets a fresh start via a cron job.
I have come to like the whenever gem to handle the app’s cron jobs. Refer to the gem’s documentation for how to set up a multi-stage configuration (with Capistrano). Here is an example config:
set :production_dump_file, "/tmp/production_dump_daily.sql.gz"
set :remote_dump_file, "dumper@app.mydomain.com:/tmp/production_dump_daily.sql.gz"
set :staging_dump_file, "/tmp/production_dump_daily.sql.gz"

if environment == "production" || environment == "staging"
  # normal crons go here
end

if environment == "production"
  # Export daily dump
  every :day, at: "12:00am" do
    command "cd #{path} && #{environment_variable}=#{environment} FILE=#{production_dump_file} bin/rake dump:export"
  end
end

if environment == "staging"
  # Import daily dump
  every :day, at: "12:30am" do
    command "cd #{path} && #{environment_variable}=#{environment} REMOTE=#{remote_dump_file} FILE=#{staging_dump_file} bin/rake dump:retrieve dump:barrier maintenance:enable db:drop db:create dump:import db:migrate maintenance:restart maintenance:disable"
  end
end
The export runs on production at 12:00am, and the import happens at 12:30am. Adjust the import time according to the time taken by the export and the transfer, plus an extra margin; it depends on your amount of data. Don’t forget to run all this outside any heavy processing period (like a statistics processing task or a machine learning model update…).
The outside world
Mails
Let’s begin with your mailers. You can’t have your users receive mails from the staging environment. Imagine that you reply to something in staging and they get a notification about it.
If you don’t have SMTP servers, and you don’t want to use your mail provider’s services, create a dedicated Gmail account, and configure it in your config/environments/staging.rb file:
# Email configuration
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  address: "smtp.gmail.com",
  port: 587,
  domain: "gmail.com",
  authentication: "login",
  enable_starttls_auto: true,
  user_name: "my-app-staging@gmail.com",
  password: "my-password"
}
Now, let’s hijack the mailers. The purpose is to send all the mails to a single dedicated mail account. It could be the above Gmail account, a mailing list, or a private Google group that your developers subscribe to.
We’ll use the sanitize_email gem. Add it to your Gemfile and configure it in your config/environments/staging.rb file:
STAGING_EMAIL = "my-app-staging@gmail.com"

SanitizeEmail::Config.configure do |config|
  config[:sanitized_to] = STAGING_EMAIL.gsub("@", "+to@")
  config[:sanitized_cc] = STAGING_EMAIL.gsub("@", "+cc@")
  config[:sanitized_bcc] = STAGING_EMAIL.gsub("@", "+bcc@")
  config[:use_actual_email_prepended_to_subject] = true
  config[:use_actual_environment_prepended_to_subject] = true
  config[:use_actual_email_as_sanitized_user_name] = true
  config[:activation_proc] = Proc.new { Rails.env.staging? }
end
Note that this can also be applied to development, with or without a dedicated Gmail account, with the refinement that you can send the emails to the current git user: git_user = `git config user.email`.squish
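A sketch of that development variant, reusing the same gem configuration:
# config/environments/development.rb — a sketch: reroute all mail to
# whoever is currently working, based on their git configuration.
git_user = `git config user.email`.squish

SanitizeEmail::Config.configure do |config|
  config[:sanitized_to] = git_user
  config[:use_actual_email_prepended_to_subject] = true
  config[:activation_proc] = Proc.new { Rails.env.development? }
end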
Other stuff
You have to think about the interactions with the outside world: the external APIs, the various providers (including your subscription provider…). Especially regarding the related data that comes from production: you may have tokens, client_ids… Are they still valid? Can they be used? Should you use those APIs from staging at all (if you cancel a subscription from staging, uh oh or okay?)
For Social Connect (Facebook, Twitter, Google+, Github…), it’s okay: just add the callback URLs of your staging env in their developer consoles, alongside the production ones. But using their APIs must be examined more thoroughly. The stored tokens are valid, so if you publish on someone’s wall from staging, it will really happen. Do you need a different app_id? You have probably already answered all those kinds of questions for your development environment.
Final note
The staging environment is not just a tool, it should be a mandatory stop for your code before entering production. As a serious piece of your workflow and infrastructure, you have to monitor it, and have exceptions tracked and reported. Any event happening in the staging environment is an early warning of an issue in production. Better to ruin the staging env than to be sorry in prod.
And what about you? How did you set up such an environment?