Notes on database migration
This post discusses database migration: it touches on different aspects, names a few ready-made solutions (Java, node.js, language-agnostic), links to useful resources, and outlines why I've chosen to roll my own DB migration facility for one of my current web application projects.
What you're going to read now is an excerpt from the project's README.md. In case you're eager to know because it's relevant to you: the app is going to be a SaaS hosted on Heroku, built on top of node.js and PostgreSQL.
Without saying much more, here are my notes on database migration.
(If you wonder why the excerpt is discussing the topic in this broader manner, part of my assignment is to train a less-experienced developer.)
Also called database change management or database upgrading (I'll use the terms interchangeably in the following), database migration is the process of managing changes to a database over the course of an application's lifecycle. What could change in a database? The database structure (i.e. the tables) and master data, but also indices, triggers and stored procedures could be added, changed or deleted over time.
A proper database change management facility allows you to apply those changes as cleanly as possible. Or to put it another way, it should prevent developers from messing up the database. There are a lot of online resources about what a database migration facility should look like, but here are some key requirements:
- can be run by one command
- figures out for itself which database upgrades are to be applied
- upgrade scripts should reside in the version control system (e.g. git) so that it's clear which database version a given application code revision relies on
- upgrade scripts should ideally be idempotent so that even if they are run multiple times, the changes only take effect once
- should indicate a clear order / dependency of upgrade scripts
- should leave the database in a consistent state if an upgrade script fails (PostgreSQL supports transactional DDL; not every DBMS does)
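To make the idempotency requirement concrete, here is what a hypothetical upgrade script could look like in PostgreSQL. The table and column names are made up for illustration, and note that `ADD COLUMN IF NOT EXISTS` requires PostgreSQL 9.6 or later:

```sql
-- Hypothetical script: 0001_create-users-table.sql
-- Idempotent: running it a second time leaves the database unchanged.
CREATE TABLE IF NOT EXISTS users (
    id         serial PRIMARY KEY,
    email      text NOT NULL UNIQUE,
    created_at timestamptz NOT NULL DEFAULT now()
);

ALTER TABLE users ADD COLUMN IF NOT EXISTS nickname text;
```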
Given those requirements, a lot of ready-made solutions exist either baked into a framework or standalone. Ruby on Rails has so-called Active Record Migrations. Liquibase and Flyway are popular solutions in the Java world. Counterparts in the node.js community are umzug and db-migrate. A language agnostic tool that caught my attention is Sqitch. H/T to Change management tools and techniques in the PostgreSQL wiki.
Having a good tool at your disposal is important, but it's not the only thing that matters. You will want to be conservative when transitioning a production database from one version to the other. I found Database migrations done right to be a good article to induce the right mindset. The author mentions the book Refactoring Databases if you want to go deeper.
Our DB migration solution with node.js, PostgreSQL and Heroku
Now, here comes our own situation. We're using node.js on the backend and host the whole thing on Heroku. We store our data in PostgreSQL. As said before, PostgreSQL supports transactional DDL. AFAIK this is not to be taken for granted.
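Transactional DDL means a schema change can take part in a transaction like any other statement. A quick illustration in a hypothetical psql session:

```sql
BEGIN;
CREATE TABLE foo (id integer PRIMARY KEY);  -- DDL inside a transaction
ROLLBACK;
-- In PostgreSQL the table is gone again after the rollback. In a DBMS
-- without transactional DDL, the CREATE TABLE would have been committed
-- implicitly and survived it.
```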
Despite the many solutions available in the node.js ecosystem and the ones that are language agnostic, I decided to roll our own little database migration facility. I've used home-grown tools in the past and they've worked quite well. Solutions like Sqitch are interesting, but Sqitch specifically introduces a lot of concepts that we might never need. One example is verification, where you create a companion script that solely verifies the result of the corresponding upgrade script (e.g. by querying the information schema). Another is dependencies between upgrades, which means the order of execution is determined by a dependency declaration rather than by a simple numbering scheme. Either way, Sqitch seems to be an interesting solution for larger applications in which all those nice features become relevant. I also considered umzug, because it was recommended in Database Migrations with Nodejs, written by Kostas Bariotis.
But the thing that gives me shivers is when I'm about to introduce a new dependency to the system and take a look at the open Issues tab on GitHub beforehand. Even though the majority of issues might be RTFM, it urges me to ask: "Can I trust this library not to mess with our database?" Answering that question would take me quite some time, and a potential outcome is that I can't.
To run the migration procedure, use `npm run db-upgrade`. `db-upgrade` is defined as a script in `package.json`. Upgrade scripts reside in `./db/upgrades/up`; `upgrade.js` is the script implementing the upgrade mechanics. When anything fails in the database during a run, the program rolls back all changes and halts with exit code 1. This is important because it is also executed in our Heroku release process. To be precise, it runs in the release phase. See the `Procfile` in the root directory for how simple it is to run something in the release phase. When Heroku sees that a command in the release phase exits with a non-zero exit code, it'll abort the deployment of the new release. In that case Heroku happily keeps running the old version of the application until you fix the problem. Speaking of fixing the problem, there might be non-code issues that prevented a success. In that case you can retry a release with the heroku-releases-retry CLI plugin.
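For illustration, a minimal Procfile with a release phase could look like the following. Only the `release:` line follows from the text above; the `web:` command is a placeholder:

```
release: npm run db-upgrade
web: node server.js
```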
A note on the naming of the `./db/upgrades/up` directory: if the app gets wildly successful, we might need some extra features for our DB change management, namely reverting an upgrade (Ruby on Rails has that baked in) and verifying one. The verification scripts would go into `./db/upgrades/verify`; I borrowed the naming from Sqitch.
To add a new database change, add a new SQL script to `./db/upgrades/up` conforming to the naming convention `/[a-z0-9-_]/i` (a regular expression), which translates to alphanumeric characters, dashes '-' and underscores '_'. The version number should be increased by one, of course, and what you end up with is something like `0012_create-foo-table.sql`. When `db-upgrade` runs successfully, it logs this change in a special `db_upgrade` table that is created by our tool itself. This way, `db-upgrade` can check whether a certain upgrade has already been applied in the past. However, writing idempotent scripts is still a good measure: it's an additional safety net, because our migration tool can have bugs as well.
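The bookkeeping just described can be sketched as a couple of pure functions: check filenames against the naming convention and work out which upgrades are still pending. This is not our actual upgrade.js; the function names are made up, and the four-digit version prefix is my reading of the 0012_create-foo-table.sql example:

```javascript
// Naming convention: a four-digit version, an underscore, then
// alphanumerics, dashes and underscores, ending in .sql.
const NAME_PATTERN = /^(\d{4})_[a-z0-9-_]+\.sql$/i;

// Given all filenames found in ./db/upgrades/up and the version numbers
// already recorded in the db_upgrade table, return the scripts that still
// have to be applied, sorted by version.
function pendingUpgrades(filenames, appliedVersions) {
  const applied = new Set(appliedVersions);
  return filenames
    .filter((name) => NAME_PATTERN.test(name))
    .map((name) => ({ name, version: Number(NAME_PATTERN.exec(name)[1]) }))
    .filter(({ version }) => !applied.has(version))
    .sort((a, b) => a.version - b.version)
    .map(({ name }) => name);
}
```

The real script would additionally wrap the execution of each pending file in a transaction and insert a row into `db_upgrade` on success.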