Destroy Your Development Environment
Many development environments require a fair amount of manual setup. It takes a while and is not the sort of thing you’d want to do every day. This type of setup has caused me a great deal of pain over the years. Teams I have worked with have put in varying degrees of effort to ensure their documentation is accurate. Some have even automated part or all of the setup. But things always drift. The scripts stop working. The documentation becomes inaccurate. And by the time a new developer comes on board, the whole setup needs quite a bit of updating.
I believe the root of the problem is how rarely you destroy and recreate your development environment. I propose that an ideal setup includes a script that not only sets up a new environment, but also destroys and recreates an existing one. Furthermore, for existing environments, it should take no more than a minute to run and should be run at least daily.
Why?
When you don’t destroy your development environment frequently, the following problems tend to happen:
- It feels acceptable to require manual steps to set up or update an environment. This makes onboarding for new developers hard and possibly confusing. It also makes it hard for existing developers to get a new computer.
- Confidence that setup scripts work and instructions are accurate becomes low since they probably haven’t been used in a while.
- Manually created state tends to creep into your data stores. The state of each machine drifts. You find yourself needing a certain dev machine to work on a certain task because it has the state you need. Developer onboarding becomes even more painful because of this.
- John makes a change that breaks Susan’s environment and either forgot to tell her or she forgot what John said. Susan spends the better part of a day tracking down what is wrong.
These costs affect all developers on a team daily. The compounding cost is huge.
Initial Cost
Destroying your development environment frequently is basically a forcing function that will slow you down initially in the following ways:
- You are forced to fully automate everything.
- You are forced to make the script that rebuilds your environment quite fast. If it is slow, someone will be annoyed by it pretty quickly and figure out a way to make it faster.
- You are forced to create important state as a part of your environment bootstrapping because otherwise it will be gone later that afternoon.
Benefits
These costs are far less than the costs of the pain points they address. If your development environment is fully automated such that you can (and do) destroy it daily, here is what you can expect:
- Whenever another developer makes a change that breaks your environment, you run the setup script before even bothering to check why it is broken. Unless there’s an actual bug, this solves your problem every time in less than a minute. In fact, you get in the habit of proactively running the script each time you begin work so you never run into problems in the first place.
- New developers can get coding and experimenting in their own local environment right away. They are happy and feel in control of their own learning and onboarding. The team is confident that the setup for this new developer will work because they all ran the setup very recently.
- You switch computers freely and painlessly when advantageous. You feel free to get a brand-new laptop with no worry about setting it up.
How To Do It
There are many good ways to set up your environment. Destroying your development environment frequently will naturally force you to work through your own unique problems and find a suitable solution. However, I find concrete examples helpful. Here’s how I put mine together.
Make It Self-Explanatory
These days, a good user interface doesn’t require you to read documentation. It is self-explanatory. In the world of software development, documentation is still necessary, but we should make things self-explanatory where we can. I find value in telling a new developer how to “turn on” the project as quickly as possible and letting them read more details later.
As a developer, I expect to find out how to get started by looking at the README file. Ideally, the first few lines of that README will point me to a script that makes everything turn on. It should install dependencies, apply the database schema, insert seed data, start services, and do whatever else is needed to go from blank slate to fully operational.
Automated Setup and Rebuild
I generally prefer to have a single script that works for setting up a new environment as well as for refreshing an existing one. The design goals for these two scenarios are different, so let’s examine them.
1. New Environment
In this scenario, we are getting the project running for the first time in a given environment. The only design goal is that it works every time. Don’t assume anything. Don’t assume they have Postgres installed. Don’t assume some directory exists with the right permissions. Don’t assume XCode is installed. Check everything and, whenever possible, take care of it automatically.
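As a rough sketch of what “check everything” can look like (these particular checks are hypothetical and assume macOS with Homebrew available; they are not from the Guides project):

#!/usr/bin/env bash
# Hypothetical dependency checks for a brand-new machine (illustrative only).

# Docker has to be installed by hand, so fail with a clear instruction.
if ! command -v docker > /dev/null 2>&1
then
  echo "Missing dependency: Docker. Install Docker Desktop, then re-run this script."
  exit 1
fi

# yarn can be installed automatically.
if ! command -v yarn > /dev/null 2>&1
then
  echo "yarn not found. Installing it via Homebrew..."
  brew install yarn
fi

Anything that can be resolved automatically should be; anything that can’t should fail with a message that tells the developer exactly what to do next.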
2. Existing Environment
In this scenario, a developer already has the project up and running and they want to refresh it. We have two design goals for this scenario:
- Destroy everything. For example, don’t just update the database schema. Completely destroy and re-create the database.
- Make it fast. In my experience ~1 minute is a good maximum time.
So far, making it fast enough has proven to be a reasonable goal. The slowest part of setup is usually downloading artifacts (NPM packages, Docker images, etc.). Docker images are cached automatically. NPM packages are cached well as long as you do it right. For example, yarn install --frozen-lockfile does little or no caching, so I use plain yarn in development instead, which is extremely good at caching.
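To make the distinction concrete, the split looks roughly like this (assuming yarn v1, where --frozen-lockfile is the CI-oriented flag):

# In CI: install exactly what the lockfile specifies, and fail if it is out of date.
yarn install --frozen-lockfile

# In development: plain yarn, which leans heavily on its local package cache.
yarn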
Configuration
I like my applications to read their configuration from the environment. For development, we allow an alternate way to provide configuration: putting it in a .env file in the root of the project and reading it with something like dotenv. This file is ignored by git so that sensitive values can be placed in there, and also because configuration can vary from environment to environment. However, if we don’t provide a standard development configuration, we have a few problems:
- At best, a new developer must manually set up their configuration. At worst, they can’t do it without help or cloning someone else’s configuration.
- Each developer’s environment tends to look a little different, leading to “works for me” issues.
- Adding or changing configuration keys requires each dev to update their config.
To solve these problems, we provide a standard development configuration file and allow you to override it if needed. We do this by creating a file called .env.example that contains sane defaults for development. .env.example is committed to git, so we don’t put sensitive values in it. We will set up our backend services locally so that no sensitive values should be needed, but if you end up needing some, you’ll have to leave them blank and have each developer fill them in. The setup script should symlink .env to this file so that everything just works. But it leaves the option open to delete the symlink and provide custom, possibly sensitive, config values.
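For illustration, a committed .env.example might contain nothing but safe local defaults (these keys are made up and not from the Guides project):

# .env.example -- committed to git; development defaults only, no secrets.
DATABASE_URL=postgres://root:password@localhost:5432/app_dev
PORT=3000
LOG_LEVEL=debug
# Genuinely sensitive values stay blank; a developer who needs them replaces
# the .env symlink with a real file and fills them in there.
SOME_PROVIDER_API_KEY=

The symlink step itself appears later in this article, in the check-dependencies snippet.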
Local Services
I have a strong preference for all backend services (databases, etc.) to be running locally. In fact, it should be local not only to the computer, but even to the project. I don’t want to share a database with some other project. This allows my scripts to freely destroy and recreate these services as needed. Docker is a great solution for this.
To illustrate the importance of this, let’s consider the ramifications of sharing a database with other projects. First of all, your setup script can’t create it: it can create the schema inside the database, but it can’t install the database service itself. That means each developer must install it manually. To create the schema, we also need a user with the right permissions, which the developer must set up manually as well. If those permissions need to change at a later date, this must be communicated to everyone running the environment, and they must all update the permissions by hand. Bummer. You also don’t get to control the version of the database being used, and if you want to change the version, again, you must tell everyone to do so manually.
We want all aspects of the environment to be completely within our control so that you can simply run the setup script and have a guaranteed, consistent state.
Ephemeral State
Any state should be considered ephemeral. Generally, we are talking about your database here. The situation I’m trying to avoid is one where a feature being developed requires the database to be in a certain state that was created manually. To make their environment fully work, a developer must then clone state from another developer’s machine or recreate it by hand. We can do better. If there is any common state we need in our data store or any other backend service, it should be bootstrapped automatically by our setup script.
Let’s stick with the database example because it is so common. Your setup script should perform four steps for your database (a sketch of these steps follows the list).
- Destroy the current database if it exists.
- Create the database from scratch (I like to use Docker).
- Initialize the schema via the same mechanism that you will use in production (something like Knex Migrations, Pyrseas, or perhaps just a plain SQL file).
- Insert seed data for development. For example, if you are making an e-commerce website, you’ll probably want to put some products in there at minimum.
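Sketched as a bash fragment, those four steps might look like this (it assumes Docker Compose, a plain-SQL schema, and a wait-for-postgres helper like the one shown later in this article; the file and variable names are illustrative, not the real Guides scripts):

# Recreate the development database from scratch (a sketch only).
docker-compose rm -sfv db                       # 1. destroy the current database, if it exists
docker-compose up -d db                         # 2. create it again from a clean image
./wait-for-postgres                             #    block until Postgres accepts connections
psql "$DEV_DATABASE_URL" -f schema.sql          # 3. apply the schema the same way production does
psql "$DEV_DATABASE_URL" -f dev-seed-data.sql   # 4. insert development seed data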
Destroy Regularly
You probably noticed that step one of the database setup was to destroy it. I highly value destroying and recreating your entire environment daily. It should be so easy and fast that nobody avoids doing it.
On the last two teams I’ve worked on, it has become a standard to do this as a first resort when something doesn’t work. There are so many reasons that a dev environment might be malfunctioning when nothing is actually wrong. Perhaps you pulled the latest code but forgot to install dependencies. Or maybe you have multiple repositories and you forgot to pull one of them. Maybe there was a schema change that you haven’t applied yet. Maybe a new backend service was added that you haven’t initialized. Before wasting time hunting down what is wrong, we always just run the script to recreate the entire environment. On both teams this script takes no more than a minute. If something is still broken, then you know there is a real problem, not just some environment issue.
We get two other advantages from destroying the environments regularly. First, we know that it works. Just like anything else, the longer it has been since you last ran it, the lower your confidence is that it will work. Since ours is run several times a day, when something goes wrong with it (which is rare) you know that the issue was introduced that very day. And since it is all tracked in git, you can simply check the recent commits for what broke it.
Second, this helps keep people out of the habit of manually creating state, or manually managing anything else about their environment. You will naturally update the seed file for populating your database instead of running a manual query, because you know that otherwise you’ll lose it all by tomorrow.
Mock External Services
I’m defining external services as any service you consume that is not completely in your control. This could be a third-party service or even a service provided by another team within your company.
There are really three options here:
- Run the service locally
- Consume a remote/shared service
- Mock the service
Option 1 is terribly problematic in my experience because it is heavy, slow, and unreliable. Not every team will make their service easy to run and update. You will end up spending time daily just keeping them all updated and running. You’ll be a second-class citizen as well: when the team makes breaking changes to their project, telling you will be an afterthought at best. On top of that, running all these services means running lots of processes and backend services, installing lots of dependencies, and so on. It makes setup and rebuild much slower. That’s not a great development experience.
Option 2 solves most of the problems from option 1. It is fast because there is no setup, and it is updated automatically by the team that owns it. The downsides are that it requires network connectivity and that it can be down. Neither of those conflicts with my stated design goals, so I consider option 2 reasonable.
Option 3 is my favorite. I tend to only consume a few endpoints from each API I use, so mocking them is trivial. They are lightweight and reliable. The obvious downside is that your mock could be out of sync with the real service. Keep this in mind and write tests that will catch such changes in your staging environment or similar.
A (Nearly) Full Example
At Pluralsight, I work on a project called Guides. You can check it out at http://www.pluralsight.com/guides. To make this all a bit more concrete, I’d like to share some of the actual code we use to manage our development environment.
README & Startup Script
The first thing you would find in our repository is a README that points you to a script called everything, which is a bash script that looks like this:
./check-dependencies \
&& ./update-sources \
&& ./recreate-infrastructure \
&& ./update-database-schema \
&& ./insert-seed-data \
&& ./restart-apps
The real work is broken up into several other scripts that we call. This way you can run individual scripts when it makes sense. For example, if you update the database schema, you can apply that quickly by running only ./update-database-schema.
Dependencies
check-dependencies looks for various dependencies and resolves them as best it can. In some cases it can resolve them automatically, such as this bit that symlinks your .env file to our default dev configuration, which we call .env.example.
function check_failure {
code=$?
if [[ "$code" != "0" ]]
then
echo "Command failed. Exiting."
exit $code
fi
}
if [ ! -f "../api/.env" ]
then
echo "No .env file found in api. Symlinking to .env.example"
ln -s ../api/.env.example ../api/.env
check_failure
fi
In some cases, we didn’t want to fully automate. In these cases, we check if the dependency is present and, if it is not, we tell you what we think you need to do. Here we check if XCode is installed and if not, we suggest how you might install it.
xcodebuild -version > /dev/null 2>&1
if [[ "$?" != "0" ]]
then
echo "Missing dependency: XCode"
echo "Suggested installation procedure: Install XCode via the App Store"
echo "Once XCode is installed, go to Preferences > Settings > open the Command Line Tools drop down and select version"
exit 1
fi
In update-sources we pull the latest code from git and install dependencies with yarn. The Guides repository is a monorepo, so we have to run yarn in several directories.
git pull
(cd ../shared && yarn)
(cd ../api && yarn)
(cd ../ui && yarn)
(cd ../content-tools-ui && yarn)
Executing Our Code
Since Guides is made up of several services, we like to have them all running at all times and auto-restart whenever we make changes. We use PM2 for this. We have a single PM2 configuration file (called an ecosystem file) that describes how all the services should be run so all we have to do is tell PM2 where to find this config file and run start. It looks like this:
{
"apps": [
{
"name": "api",
"watch": true,
"ignore_watch": ["logs"],
"cwd": "api",
"interpreter": "node",
"node_args": "--nolazy -r ts-node/register --inspect=12345",
"script": "src/server.ts"
},
{
"name": "ui",
"watch": false,
"cwd": "ui",
"script": "server.js",
"args": "-d"
},
{
"name": "toolsui",
"watch": false,
"cwd": "content-tools-ui",
"script": "server.js",
"args": "-d"
},
{
"name": "listeners",
"watch": true,
"cwd": "listeners",
"script": "start-all-listeners.js"
},
{
"name": "mocks",
"watch": true,
"cwd": "mocks",
"script": "server.js"
}
]
}
Mock Services
The last service you see in there is called mocks. To keep things running fast, locally, and reliably, we want to be in full control of the entire environment. For any service we depend on that is provided by another team or company, we create a mock version of that service, and we put those mocks in our mocks service.
What if we mock it wrong? Aren’t we creating “works for me” issues? To some degree, yes. Here’s our workflow. If something in a third-party service changes or breaks (or we just got it wrong in our mock), we find out when automated tests are run in our stage environment, which uses real services, not mocks. Once we discover the issue, if needed, we will temporarily point a dev environment at a real service instead of a mock to help diagnose. Once we figure it out, we adjust the mock if necessary.
Backend Services (Databases, etc.)
We use Docker Compose to bring all this stuff up. Here’s a portion of that file:
services:
  db:
    image: postgres:10-alpine
    volumes:
      - /var/lib/postgresql/data
    expose:
      - "5432"
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: password
      POSTGRES_DB: guides
  rabbit:
    image: rabbitmq:3-management
    ports:
      - "15672:15672"
      - "5672:5672"
All you need to bring up docker-compose services is the command docker-compose up -d. However, we created a wrapper script called recreate-infrastructure that does a few additional things for us.
./dc down \
&& ./dc up --remove-orphans -d \
&& ./wait-for-postgres \
&& echo "CREATE ROLE guides WITH LOGIN PASSWORD 'password'; ALTER SCHEMA public OWNER TO guides;" | ./dc exec -T db psql guides
Remember how I said destroying is important? Here we do that first with dc down. We also bake in the --remove-orphans flag here so that our machines don’t get bloated.
wait-for-postgres represents an important class of issues when working with Docker. When you start a container with docker run … or docker-compose up -d or similar, the script will finish when the containers have been created, but that does not mean that the services running inside the containers are ready. In our example here, we want to create a role called guides in our database. If we ran the create role command right after dc up, it would fail. So we introduced a script that blocks until Postgres is ready. We follow a similar pattern for some other services as well.
Here’s what our wait-for-postgres script looks like:
echo "Waiting for Postgres to be ready"
./dc exec -T db pg_isready
while [[ "$?" != "0" ]]
do
sleep 1
./dc exec -T db pg_isready
done
Before sleeping at all, it first checks whether Postgres is ready by executing pg_isready inside the Postgres container. pg_isready is a utility that ships with Postgres. This way there is minimal delay if Postgres is already up. If it is not ready, we loop until it is, sleeping for a second between attempts.
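The “similar pattern for some other services” could look like this for RabbitMQ (a sketch only; it assumes the rabbit service from the compose file above and uses rabbitmqctl status as the readiness probe):

echo "Waiting for RabbitMQ to be ready"
./dc exec -T rabbit rabbitmqctl status > /dev/null 2>&1
while [[ "$?" != "0" ]]
do
  sleep 1
  ./dc exec -T rabbit rabbitmqctl status > /dev/null 2>&1
done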
Inserting Seed Data
We have a directory of SQL files that create our seed data. A simple bash script executes them.
for f in dev-seed-data/*.sql; do
  cat "$f" | ./dc exec -T -e PGPASSWORD=password db psql -U guides guides
done
Some frameworks and ORMs provide fancier ways to do this. They are sometimes called fixtures.
Summary
I have shared what I thought were the most important bits to help you recreate a development environment similar to mine and to understand the values that motivated this setup. I couldn’t share everything about my environment and surely your environment will have its own unique challenges. Therefore, the most important takeaway I have for you is to make it a daily habit to destroy and re-create your development environments. This will force you to fully automate everything and know that it always works.