Control Your Database Schema

How do you start a project? Most likely create an empty repository for the source code. Why is version control then only an afterthought for our database schema? Particular lines of code only function properly with a particular state of schema, so it makes sense to have them side-by-side. You should be able to clone any revision and expect it to work without hassle.

In this post, I’ll use a simple web application to show how schema control fits in. I will also try to anticipate common issues and misconceptions. Relevant code is available on Github.

Right Tool for the Job

Many tools have emerged over the years, but I find that Liquibase comes out on the top. It has matured over time and kept true to a strong philosophy. Various integration options make getting started easy and it’s not hard to roll your own when the need arises.

Liquibase combines XML-declared schema alterations (changesets) into a recipe (changelog) that describes how your database should look like. On application startup it then goes over the changelog and runs necessary changesets. How to determine that a particular change has to be applied? Each has an id and an author that are kept in the database after a successful run. Why the combination of two? It might seem that name clashes are a non-issue, but I dare you to track them in a team or through multiple repository branches.

XML might seem archaic, but is a necessary evil to achieve vendor-independence. Swap the connection and you’ll have a fully functional in-memory database for tests. Also, any modern IDE with autocomplete makes it a bliss to write. If you’re not convinced, take a look at the SQL based variation at formatted SQL documentation.

Setup

Our example application is a REST service that persists an event per request and produces the same event as a response:

@Path("/")
public class WelcomeService {

  @GET
  @Produces("text/plain")
  public String welcome() {
    Event e = new Event();
    Ebean.save(e);
    return "Event{" + "id=" + e.id + '}';
  }

}

Event is represented as a small JPA entity:

@Entity
public class Event {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  public Long id;

  @Override
  public String toString() {
    return "Event{" + "id=" + id + '}';
  }

}

Now that we have the application code figured out, let’s describe what our database looks like:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
         http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">

  <changeSet id="added event table" author="mirko">
    <createTable tableName="Event">
      <column name="id" type="bigint" autoIncrement="true">
        <constraints primaryKey="true" nullable="false"/>
      </column>
    </createTable>
  </changeSet>

</databaseChangeLog>

This little snippet demonstrates a changelog with one changeset that adds a new table Event with a primary key.

Clone the repository and run the project to see it in action:

git clone https://github.com/mirkoadari/liquibase-example.git
cd liquibase-example
mvn jetty:run

And if we open our browser at http://localhost:8080, the application responds Event{id=1}.

Make Some Changes

So how would we go about extending the functionality? It might make sense to know when the event took place. Our code would look like this:

@Path("/")
public class WelcomeService {

  @GET
  @Produces("text/plain")
  public String welcome() {
    Event e = new Event();
    Ebean.save(e);
    Ebean.refresh(e); // refresh to get the generated createdAt
    return "Event{" + "id=" + e.id + ", createdAt=" + e.createdAt + '}';
  }

}
@Entity
public class Event {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  public Long id;

  @Column(insertable = false, updatable = false)
  @Temporal(TemporalType.TIMESTAMP)
  public Date createdAt;

  @Override
  public String toString() {
    return "Event{" + "id=" + id + ", createdAt=" + createdAt + '}';
  }

}

Schema also needs a revision, so we will define a changeset that adds a timestamp column with default value CURRENT_TIMESTAMP to the table:

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
         http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">

  <changeSet id="added event table" author="mirko">
    <createTable tableName="Event">
      <column name="id" type="bigint" autoIncrement="true">
        <constraints primaryKey="true" nullable="false"/>
      </column>
    </createTable>
  </changeSet>
  <changeSet id="added created_at timestamp to event" author="mirko">
    <addColumn tableName="Event">
      <column name="created_at" type="timestamp" defaultValueComputed="CURRENT_TIMESTAMP">
        <constraints nullable="false"/>
      </column>
    </addColumn>
  </changeSet>

</databaseChangeLog>

If we now merge the changes back to the main branch and run it again, the response is Event{id=2, createdAt=...}.

git merge event-created-at
mvn jetty:run

Automated Updates

So how did Liquibase know to update the schema? I provided my own implementation based on LiquibaseServletListener that takes a database connection and a changelog, and works through the changes.

public class DatabaseServletListener implements ServletContextListener {
  ...
  public void contextInitialized(ServletContextEvent sce) {
    String changeLog = sce.getServletContext().getInitParameter("liquibase.changelog");    
    ...
    JdbcDataSource dataSource = new JdbcDataSource();
    Connection connection = null;
      try {
        connection = dataSource.getConnection();
        ...
        Database database = DatabaseFactory
				.getInstance()
				.findCorrectDatabaseImplementation(new JdbcConnection(connection));
        Liquibase liquibase = new Liquibase(changeLog, ..., database);

        liquibase.update(null);
      }
      finally {
        if (connection != null)
          connection.close();
      }
     ...
  }
  ...
}

Even this small example demonstrates how easy customization is. But if you don’t feel so adventurous, many official and third-party integrations are already available. You could add updates to a continuous integration build, deploy scripts or maven project. Imagination really is the limit here. See the official documentation to learn more.

Context Matters

Sometimes you need certain updates to run in a specific environment. Liquibase provides contexts, that enable us to group changesets and run them only when we need to.

As an example, consider a test environment. QA team probably needs some data to play with:

<changeset id="1" author="anonymous" context="test">
  <loadData tableName="User" file="com/scalabilitycookbook/test/users.csv">
    <column name="id" type="NUMERIC"/>
    <column name="firstname" type="STRING"/>
    <column name="lastname" type="STRING"/>
    <column name="username" type="STRING"/>
  </loadData>
</changeset>
liquibase update --contexts=test

Note that changesets with no context are always run and you can specify more than one context.

Keep It Manageable

As your schema evolves, it becomes difficult to read and reason about. Common practice is to structure the schema into per-version changelogs and combine them with <include/>.

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog 
                        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd
                        http://www.liquibase.org/xml/ns/dbchangelog-ext 
                        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd">
  <include file="schema/1.0.xml"/>
  <include file="schema/1.1.xml"/>
  <include file="schema/2.0.xml"/>
</databaseChangeLog>

Compact changesets and meaningful ids also contribute to readability. So do your fellow developers a favor and don’t put ten tables into one and name it “uber-version”.

Schema Out of the Code

Unnecessary constraints, i.e.@Nullable, should be kept outside of the domain model. Duplication confuses the reader and introduces inconsistencies that lead to errors in the long run. What should be done instead is to turn on any schema-model verification your framework offers, i.e. Hibernate has the option hibernate.hbm2ddl.auto=validate and Play Framework jpa.ddl=validate. It’s easy to enable and catches problems early.

Legacy Pains

Schema version control sounds good and you’ll probably use it for your next project. What about the application you’re working on right now? Liquibase makes it easy to migrate by automatically generating a changelog from an existing database.

liquibase generateChangelog > schema/1.0.xml

We also add preconditions to generated changesets for compatibility with existing deployments.

<preConditions onFail="MARK_RAN">
  <not>
    <tableExists tableName="User"/>
  </not>
</preConditions>

Not a Backup Tool

With changes universally declared, Liquibase can figure out reverse statements and roll back to an earlier state.

liquibase rollbackCount 2

While this is often useful in development, you should never try it in production! Any database modification can leave your application in an inconsistent state at the best times and lose data at worst. How to safely update and roll back production databases is an interesting topic and deserves a follow-up post in the future.

When Declarative Does Not Cut It

There are cases when declarative nature might get in the way. Data conversion is one such example. Turns out, Liquibase allows you write custom SQL queries for updates and rollbacks. And you get to keep the automation.

<changeSet id="1" author="anonymous">
  <sql>insert into person (id, name) values (1, 'John')</sql>
  <rollback>
    <sql>delete from person where id = 1</sql>
  </rollback>
</changeSet>

Conclusion

Database schema is a crucial part of your application and as such belongs right next to the code. Available tools make it easy to integrate into any project. I hope this post did help you to get started and provides answers to any questions that might arise.

Advertisements

4 comments

  1. Really informative

  2. Good introduction, going to check it out!

  3. […] familiar enough with Liquibase to write the changelogs? See this blog post by ZeroTurnaround’s very own Mirko Adari, for more info and an usage example. If you prefer not […]

  4. […] can map your existing database with just a single command. Learn more about Liquibase from an overview I wrote and about LiveRebel enhancements […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s