A staging environment is meant to track the production environment as closely as possible, so that your app can be fully tested under production conditions. Let’s dig into this.
It must be a perfect, isolated copy of production, including its data. Everything that happens in production should happen in staging.
Why have a staging environment?
A staging environment is the perfect place to battle-test your code under production conditions, with the advantage that you can crash it, hard. Very useful, especially when you handle a big code migration, such as a Rails version update, a change of background processing system, a refactoring, new features…
You’ll be able to test your deployment process. You’ll be able to run your ActiveRecord migrations against production data, to see how much time they would take, and how your app behaves in the meantime.
It can be a place where you prepare and test your software updates. If you want to upgrade your Ruby version from 1.9.3 to 2.1.3, you can see the full impact on your app. You can work out the steps needed to upgrade the production environment, and test them again and again until everything is smooth. The same goes for every piece of your stack: ElasticSearch… And beyond software updates, you can test what happens if you stop your ElasticSearch cluster, or kill a MongoDB replica set…
It also allows you to run a benchmark suite hitting your full stack, from Nginx and Passenger / Unicorn… down to MySQL / MongoDB, ElasticSearch…
In short, you deploy your branch to staging, and you’re more at peace when you merge it into master and deploy it to production.
Where to place it?
In an ideal world, you have a separate infrastructure for staging. But one or a couple of dedicated servers / VMs, with everything on them, is more than enough.
You can set up staging alongside production, but it’s less safe when every piece is shared between the two. Make sure not to overload your DB server with staging requests, which could kill the performance of production! And managing your software versions would be quite constrained if you plan to diverge for testing. It’s enough if you only plan to battle-test your code, but keep the limitation in mind.
The URL
You have to choose a URL. Here are some examples if your app domain is myapphq.com:
staging.myapphq.com
myapphq.net
myapphq.info
myapphq.com:4242
myapphq-staging.com
And if you use subdomains to handle multi-tenancy, considering customer1.myapphq.com:
customer1.staging.myapphq.com
customer1.myapphq.net
customer1.myapphq.info
customer1.myapphq.com:4242
customer1.myapphq-staging.com
Don’t forget about SSL. If you have SSL in production, you should have SSL in staging! You might need an extra SSL certificate, depending on the URLs you choose. Keep in mind that a wildcard certificate on *.domain.com works for bar.domain.com, but doesn’t work for foo.bar.domain.com; for that you need a wildcard certificate for *.bar.domain.com.
Let’s start
The first step is to duplicate config/environments/production.rb into config/environments/staging.rb and adapt it.
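As a starting point, such a file stays close to production and only changes what identifies staging. A minimal sketch, assuming a Rails 4-style app (the MyApp constant and the host are hypothetical):
# config/environments/staging.rb — a minimal sketch; start from your
# real production.rb and adjust only what has to differ in staging.
MyApp::Application.configure do
  # Mirror production behavior as closely as possible.
  config.cache_classes = true
  config.eager_load = true
  config.consider_all_requests_local = false
  config.action_controller.perform_caching = true

  # Staging-specific touches: a dedicated host for generated URLs.
  config.action_mailer.default_url_options = { host: "staging.myapphq.com" }
  config.log_level = :info
end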
Look into all your config/*.yml files to add a staging environment: database.yml… I strongly advise you to have different credentials and namespaces for staging. It can spare you an “I deleted some (or worse, all) production data” moment when working on staging. This is true for the database, Amazon S3… For Redis, consider using a different database (/1 instead of /0) and a namespace including the environment, for extra safety.
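A sketch of that Redis setup, assuming the redis and redis-namespace gems (the $redis global and the myapp prefix are just for illustration):
# config/initializers/redis.rb — a sketch, not a drop-in: adapt it to
# however your app builds its Redis connection.
require "redis"
require "redis/namespace"

# Staging gets database 1; every other environment keeps database 0.
db = Rails.env.staging? ? 1 : 0
$redis = Redis::Namespace.new(
  "myapp:#{Rails.env}",   # the namespace includes the environment
  redis: Redis.new(db: db)
)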
If you use Capistrano 3, multi-stage support is included. For Capistrano 2, there is a gem to add. Duplicate config/deploy/production.rb into config/deploy/staging.rb and adapt it. It may be the occasion to move stage-specific settings from config/deploy.rb to the stage files.
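For Capistrano 3, the stage file could look like this (a sketch: the server name, user and deploy path are hypothetical):
# config/deploy/staging.rb — a sketch for Capistrano 3 multi-stage.
set :stage, :staging
set :rails_env, "staging"
set :branch, ENV.fetch("BRANCH", "staging")
set :deploy_to, "/var/www/myapp_staging"

server "staging.myapphq.com", user: "deploy", roles: %w(app web db)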
Finally, search your code for Rails.env.production? and, more generally, for production.
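Most of those checks should treat staging like production; a hypothetical helper can make the intent explicit while you audit them:
# A hypothetical helper: staging usually wants production-like behavior.
# Audit each Rails.env.production? call and decide case by case.
def production_like?
  Rails.env.production? || Rails.env.staging?
end

# Before: send_metrics if Rails.env.production?
# After:  send_metrics if production_like?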
The data
The idea is to have the production data loaded into your staging environment.
Create a lib/tasks/dump.rake file:
require "erb"
require "yaml"

namespace :dump do
  desc "Fails if FILE doesn't exist"
  task :barrier do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    File.exists?(file) or raise "No file found (path given by FILE)"
  end

  desc "Retrieve the dump file"
  task :retrieve do
    remote = ENV["REMOTE"]
    raise "Need a REMOTE file" unless remote
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    # here you copy remote into file
    # via scp (preferred) or HTTP GET
    do_whatever_is_needed or raise "Can't retrieve #{remote}"
  end

  desc "Export the database"
  task :export do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    env = ENV["RAILS_ENV"]
    raise "Need a RAILS_ENV" unless env
    db_config = current_db_config(env)
    system "#{mysqldump(db_config)} | gzip -c > #{file}"
  end

  desc "Import a database"
  task :import => :barrier do
    file = ENV["FILE"]
    raise "Need a FILE" unless file
    env = ENV["RAILS_ENV"]
    raise "Need a RAILS_ENV" unless env
    raise "Import on production is forbidden" if env == "production"
    db_config = current_db_config(env)
    system "gzip -d -c #{file} | #{mysql(db_config)}"
  end

  def current_db_config(env)
    YAML::load(ERB.new(IO.read(File.join(File.dirname(__FILE__), "../../config/database.yml"))).result)[env]
  end

  def mysql(config)
    sql_cmd("mysql", config)
  end

  def mysqldump(config)
    sql_cmd("mysqldump", config) + " --add-drop-table --extended-insert=TRUE --disable-keys --complete-insert=FALSE --triggers=FALSE"
  end

  def sql_cmd(sql_command, config)
    "".tap do |cmd|
      cmd << sql_command
      cmd << " "
      cmd << "-u#{config["username"]} " if config["username"]
      cmd << "-p#{config["password"]} " if config["password"]
      cmd << "-h#{config["host"]} " if config["host"]
      cmd << "-P#{config["port"]} " if config["port"]
      cmd << "--default-character-set=utf8 "
      cmd << config["database"] if config["database"]
    end
  end
end
Here is how to use it:
# on production
RAILS_ENV=production FILE=/tmp/dump_production.sql.gz bin/rake dump:export
# on staging
RAILS_ENV=staging REMOTE=app.myapp.com:/tmp/dump_production.sql.gz FILE=/tmp/dump_production.sql.gz bin/rake dump:retrieve dump:barrier maintenance:enable db:drop db:create dump:import db:migrate maintenance:restart maintenance:disable
The export process is trivial: dump:export simply uses mysqldump to export the data.
The full import process is:
dump:retrieve gets the dump file from production (not necessary if on the same server, or if the dump is stored on a shared file system).
dump:barrier fails if there is no file to import. This avoids dropping the database if we can’t import right after.
maintenance:enable starts the maintenance mode.
db:drop drops the current staging database.
db:create creates a fresh, empty staging database.
dump:import imports the production data.
db:migrate runs the needed migrations. Your code deployed in staging may be ahead of the production code: since we just imported the production database, some migrations may not be applied yet.
maintenance:restart restarts the app.
maintenance:disable ends the maintenance mode.
You noticed the maintenance:enable and maintenance:disable tasks. I like to add them to prevent access to the staging app, because weird things can happen during the import: the data is not fully loaded yet, tables are missing… Here is the lib/tasks/maintenance.rake file:
namespace :maintenance do
  MAINTENANCE_FILE = Rails.root.join("public/system/maintenance.html")
  RESTART_FILE = Rails.root.join("tmp/restart")

  desc "Start the maintenance mode"
  task :enable => :environment do
    if !File.exists?(MAINTENANCE_FILE)
      dir = File.dirname(MAINTENANCE_FILE)
      system "mkdir -p #{dir} && echo \"Website is on maintenance. We'll be back in a few seconds...\" > #{MAINTENANCE_FILE}"
      Rails.logger.info("[MAINTENANCE] App is now DOWN")
    end
  end

  desc "Stop the maintenance mode"
  task :disable => :environment do
    if File.exists?(MAINTENANCE_FILE)
      if File.unlink(MAINTENANCE_FILE) == 1
        Rails.logger.info("[MAINTENANCE] App is now UP")
      end
    end
  end

  desc "Restart the application"
  task :restart => :environment do
    FileUtils.touch(RESTART_FILE)
    Rails.logger.info("[MAINTENANCE] App has restarted")
  end
end
And here is the part to add into the server block of your nginx.conf:
error_page 503 @maintenance;

location @maintenance {
  try_files /503.html /system/maintenance.html =503;
}

if (-f $document_root/system/maintenance.html) {
  return 503;
}
The maintenance part is pretty standard. Feel free to adapt it, or to use a dedicated gem (for Capistrano 3, for example).
There are other things to consider. If you have attachments, you have to take care of them too. If they are stored locally, you’ll have to export them from production and import them into staging. If they are stored on Amazon S3, you can likewise copy them every night from the production bucket to the staging bucket. Or you can set up a more advanced workflow: all PUT, DELETE… requests hit the staging bucket (and are allowed to fail, especially the DELETEs), but a GET first tries the staging bucket and, if nothing is found, falls back to getting the object from the production bucket. It spares you the copy step, and the extra cost of storage.
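Here is a sketch of that GET fallback, assuming the aws-sdk gem (v2 API); the bucket names and the fetch_attachment helper are hypothetical:
# A sketch of the read fallback; assumes AWS credentials and region are
# configured in the environment.
require "aws-sdk"

S3 = Aws::S3::Client.new

def fetch_attachment(key)
  # Try the staging bucket first: it holds anything written from staging…
  S3.get_object(bucket: "myapp-attachments-staging", key: key)
rescue Aws::S3::Errors::NoSuchKey
  # …and fall back to the production object when staging has no copy.
  S3.get_object(bucket: "myapp-attachments-production", key: key)
end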
Magic happens at night
Time to automate things: every night, staging gets a fresh start via a cron job.
I have come to like the whenever gem to handle the app’s cron jobs. Refer to the gem’s documentation for how to set up a multi-stage configuration (with Capistrano). Here is an example config:
set :production_dump_file, "/tmp/production_dump_daily.sql.gz"
set :remote_dump_file, "dumper@app.mydomain.com:/tmp/production_dump_daily.sql.gz"
set :staging_dump_file, "/tmp/production_dump_daily.sql.gz"

if environment == "production" || environment == "staging"
  # normal crons go here
end

if environment == "production"
  # Export daily dump
  every :day, at: "12:00am" do
    command "cd #{path} && #{environment_variable}=#{environment} FILE=#{production_dump_file} bin/rake dump:export"
  end
end

if environment == "staging"
  # Import daily dump
  every :day, at: "12:30am" do
    command "cd #{path} && #{environment_variable}=#{environment} REMOTE=#{remote_dump_file} FILE=#{staging_dump_file} bin/rake dump:retrieve dump:barrier maintenance:enable db:drop db:create dump:import db:migrate maintenance:restart maintenance:disable"
  end
end
The export runs on production at 12:00am, and the import happens at 12:30am. Adjust the import time according to the time taken by the export and the transfer, plus an extra margin; it depends on your amount of data. Don’t forget to run all this outside any heavy processing period (like a statistics processing task or a machine learning model update…).
The outside world
Mails
Let’s begin with your mailers. You can’t have your users receive mails from the staging environment. Imagine that you reply to something in staging and they get a notification about it.
If you don’t have SMTP servers, and you don’t want to use your mail provider’s services, create a dedicated Gmail account, and configure it in your config/environments/staging.rb file:
# Email configuration
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  address: "smtp.gmail.com",
  port: 587,
  domain: "gmail.com",
  authentication: "login",
  enable_starttls_auto: true,
  user_name: "my-app-staging@gmail.com",
  password: "my-password"
}
Now, let’s hijack the mailers. The purpose is to send all the mails to a single dedicated mail account. It could be the above Gmail account, a mailing list, or a private Google group that your developers subscribe to.
We’ll use the sanitize_email gem. Add it to your Gemfile and configure it in your config/environments/staging.rb file:
STAGING_EMAIL = "my-app-staging@gmail.com"

SanitizeEmail::Config.configure do |config|
  config[:sanitized_to] = STAGING_EMAIL.gsub("@", "+to@")
  config[:sanitized_cc] = STAGING_EMAIL.gsub("@", "+cc@")
  config[:sanitized_bcc] = STAGING_EMAIL.gsub("@", "+bcc@")
  config[:use_actual_email_prepended_to_subject] = true
  config[:use_actual_environment_prepended_to_subject] = true
  config[:use_actual_email_as_sanitized_user_name] = true
  config[:activation_proc] = Proc.new { Rails.env.staging? }
end
Note that this can also be applied to development, with or without a dedicated Gmail account, with the refinement that you can send the emails to the current git user: git_user = `git config user.email`.squish
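A sketch of that development variant, reusing the same gem configuration:
# config/environments/development.rb — a sketch: reroute all mail to
# whoever is currently working, based on their git configuration.
git_user = `git config user.email`.squish

SanitizeEmail::Config.configure do |config|
  config[:sanitized_to] = git_user
  config[:use_actual_email_prepended_to_subject] = true
  config[:activation_proc] = Proc.new { Rails.env.development? }
end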
Other stuff
You have to think about the interactions with the outside world: the external APIs, the various providers (including your subscription provider…). Especially regarding the related data that comes from production: you may have tokens, client_ids… Are they still valid? Can they be used? Should you use those APIs from staging at all (if you cancel a subscription from staging, uh oh or okay?)
For Social Connect (Facebook, Twitter, Google+, Github…), it’s okay: just add the callback URLs of your staging env in their developer consoles, alongside the production ones. But using their APIs must be examined more thoroughly. The stored tokens are valid, so if you publish on someone’s wall from staging, it will really happen. Do you need a different app_id? You have probably already answered all those kinds of questions for your development environment.
Final note
The staging environment is not just a tool, it should be a mandatory stop for your code before entering production. As a serious piece of your workflow and infrastructure, you have to monitor it, and have exceptions tracked and reported. Any event happening in the staging environment is an early warning of an issue in production. Better to ruin the staging env than to be sorry in prod.
And what about you? How did you set up such an environment?