Making yaml_db work with schema_plus foreign key constraints on PostgreSQL

Contrary to the received wisdom, I’ve decided I want referential integrity in my Rails database.  I ended up selecting schema_plus as the tool I’m using to facilitate implementing RI. This, of course, causes all sorts of potential headaches.  I’ve solved most of them by poking around on the web, but this latest one took some solving on my part.

Making yaml_db work with referential integrity ended up requiring monkey-patching both Rails itself and yaml_db, and it also involved some configuration changes for the foreign keys I configured using schema_plus. Because it touches all three, I’m writing this up in one place where I can reference it as I open issues against all three packages. In addition, this provides real relief for anyone who runs into these issues in the future. Also note that this code is not necessarily safe for non-Postgres databases.

This code was developed and tested using active_record 3.2.13, schema_plus 1.1.2, and yaml_db 0.2.3.

Simple implementation:

  • Change all your schema_plus foreign keys to use deferrable: true using a migration (hints on this later)
  • Add the following monkey-patches somewhere (I use config/initializers/application.rb):
# Patch to enable database_cleaner and schema_plus (i.e. foreign keys) to coexist
# Supposedly fixed in Rails 4.  I ended up modifying this as well and will open 
# See https://github.com/matthuhiggins/foreigner/issues/61
if Rails::VERSION::MAJOR < 4
  #Fix fixtures with foreign keys, fixed in Rails4
  class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
    def disable_referential_integrity #:nodoc:
      if supports_disable_referential_integrity? then
        execute(tables.collect { |name| "ALTER TABLE #{quote_table_name(name)} DISABLE TRIGGER USER" }.join(";"))
        transaction do
          execute("SET CONSTRAINTS ALL DEFERRED")
          yield
        end
      else
        yield
      end
    ensure
      if supports_disable_referential_integrity? then
        execute(tables.collect { |name| "ALTER TABLE #{quote_table_name(name)} ENABLE TRIGGER USER" }.join(";"))
      end
    end
  end
end

# Patch YamlDb to work with schema_plus
module SerializationHelper
  class Base
    def load(filename, truncate = true)
      disable_logger
      ActiveRecord::Base.connection.disable_referential_integrity do
        @loader.load(File.new(filename, "r"), truncate)
      end
      reenable_logger
    end
  end

  class Load
    def self.truncate_table(table)
      begin
        ActiveRecord::Base.connection.execute("SAVEPOINT before_truncation")
        ActiveRecord::Base.connection.execute("TRUNCATE #{SerializationHelper::Utils.quote_table(table)} CASCADE")
      rescue Exception
        ActiveRecord::Base.connection.execute("ROLLBACK TO SAVEPOINT before_truncation")
        ActiveRecord::Base.connection.execute("DELETE FROM #{SerializationHelper::Utils.quote_table(table)}")
      end
    end

    def self.load_records(table, column_names, records)
      if column_names.nil?
        return
      end
      columns = column_names.map{|cn| ActiveRecord::Base.connection.columns(table).detect{|c| c.name == cn}}
      quoted_column_names = column_names.map { |column| ActiveRecord::Base.connection.quote_column_name(column) }.join(',')
      quoted_table_name = SerializationHelper::Utils.quote_table(table)
      (records || []).each do |record|
        quoted_values = record.zip(columns).map{|c| ActiveRecord::Base.connection.quote(c.first, c.last)}.join(',')
        ActiveRecord::Base.connection.execute("INSERT INTO #{quoted_table_name} (#{quoted_column_names}) VALUES (#{quoted_values})")
      end
    end
  end

  class Dump
    def self.tables
      ActiveRecord::Base.connection.tables.reject { |table| ['schema_info', 'schema_migrations'].include?(table) }.sort
    end

    def self.dump_table(io, table)
      dump_table_columns(io, table)
      dump_table_records(io, table)
    end
  end
end

Explanation:

Temporarily disabling foreign keys in Postgres can be done two ways.  One is via ALTER TABLE table_name DISABLE TRIGGER ALL.  This, however, requires that the user executing the command have superuser privileges for Postgres, and as many have pointed out this is not necessarily desirable.

The other approach is to use deferrable foreign keys and then to use SET CONSTRAINTS ALL DEFERRED from within a transaction.  This latter approach does not require superuser privileges, and for that reason I believe it is preferable.  Also, I’m not sure if the referential integrity is verified when the triggers are re-enabled in the first approach, where as deferring the constraints does still validate them when the transaction commits.

With the above understanding in mind, I set out to figure out how to make all the pieces of the puzzle work with deferred foreign keys.

First I had to set all my foreign keys to deferrable.  This is not the default in schema_plus.  I ended up with code like this:

class MakeForeignKeyConstraintsDeferrable < ActiveRecord::Migration
  def up
    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: nil
    end

    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: { references: :abilities, deferrable: true }
    end
  end

  def down
    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: nil
    end

    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: { references: :abilities }
    end
  end
end

Note how I drop the foreign key using foreign_key: nil in one change_table before I reinstate it in the next – I’m not sure if this is 100% required, but I wanted to be absolutely certain. I used a copy of my schema.rb file to do before and after comparisons and to ensure that backing out the migration resulted in the same schema.rb file. To generate this file (with all 15 foreign keys) I copied the appropriate lines from schema.rb and then did some clever parallel editing in Sublime Text.

Next I had to make Rails work with SET CONSTRAINTS ALL DEFERRED. I had already run into some changes made in Rails 4 to make foreigner (another foreign key gem) work with fixture loading (I was actually trying to fix and issue with database_cleaner, which I use to do cleanup for JavaScript acceptance tests). The comment on the issue I looked at was https://github.com/matthuhiggins/foreigner/issues/61#issuecomment-16880422. So I already had that snippet in my monkey-patch file, and I realized that I could make changes there to get things working.

I needed to ensure that if referential integrity was being disabled that it (1) disabled the user triggers first (this has no effect on referential integrity, but we’re assuming that if you want referential integrity disabled you probably want user-defined triggers disabled as well), then (2) started a transaction, and then (3) deferred the constraints within the transaction before (4) yielding to the block. The monkey patch above does that!

Having done that, I needed to modify yaml_db to work with the referential integrity disabled! First I modified SerializationHelper::Base#load so that when it goes to do the loading it does that within the context of disabled_referential_integrity. Unfortunately, that didn’t prove sufficient. The problem is that TRUNCATE isn’t legal on a table that has delayed triggers pending! Also, TRUNCATE needed the CASCADE option. So I added CASCADE, but I was still having issues. It turns out that once you’re in a Postgres transaction and one statement bombs out, you can’t keep issuing additional statements until you rollback to somewhere. The answer to this question helped a lot! So I added the SAVEPOINT and ROLLBACK commands into the appropriate places. That way the code will try TRUNCATE first and then fall back to DELETE successfully. A better approach would probably be to TRUNCATE all of the tables first using CASCADE and then to start loading them (since TRUNCATE is faster than DELETE, and if we do all the TRUNCATE operations first there are no pending triggers to cause issues), but I didn’t want to completely rearchitect yaml_db.

At this point, I had a successful solution. But I wanted to do some testing, and that lead me into another few issues. It turns out that yaml_db doesn’t always dump the tables in the same order, so comparing the dump.yml files isn’t trivial. So I tweaked SerializationHelper::Dump.tables so that it would sort the list of tables. Then I discovered that yaml_db doesn’t dump empty tables, which in turn means it doesn’t empty those tables when loading! So I tweaked SerializationHelper::Dump.dump_table to remove the conditional that ignores empty tables. Then I ran into an issue where the load process crashed on the empty tables, so I modified SerializationHelper::Load.load_records to substitute an empty array if records is nil. Now I could successfully compare two dumps and I could be certain that records added to previously empty tables were getting wiped out!