Making yaml_db work with schema_plus foreign key constraints on PostgreSQL

Contrary to the received wisdom, I’ve decided I want referential integrity in my Rails database.  I ended up selecting schema_plus as the tool I’m using to facilitate implementing RI. This, of course, causes all sorts of potential headaches.  I’ve solved most of them by poking around on the web, but this latest one took some solving on my part.

Making yaml_db work with referential integrity ended up requiring monkey-patching both Rails itself and yaml_db, and it also involved some configuration changes for the foreign keys I configured using schema_plus. Because it touches all three, I’m writing this up in one place where I can reference it as I open issues against all three packages. In addition, this provides real relief for anyone who runs into these issues in the future. Also note that this code is not necessarily safe for non-Postgres databases.

This code was developed and tested using active_record 3.2.13, schema_plus 1.1.2, and yaml_db 0.2.3.
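For reference, in a Rails app the relevant Gemfile lines would look roughly like this (pinning the versions above; exact gem groupings will vary by project):

gem 'rails',       '3.2.13'
gem 'schema_plus', '1.1.2'
gem 'yaml_db',     '0.2.3'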

Simple implementation:

  • Change all your schema_plus foreign keys to use deferrable: true using a migration (hints on this later)
  • Add the following monkey-patches somewhere (I use config/initializers/application.rb):
# Patch to enable database_cleaner and schema_plus (i.e. foreign keys) to coexist
# Supposedly fixed in Rails 4.  I ended up modifying this as well and will open an issue.
# See https://github.com/matthuhiggins/foreigner/issues/61
if Rails::VERSION::MAJOR < 4
  #Fix fixtures with foreign keys, fixed in Rails4
  class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
    def disable_referential_integrity #:nodoc:
      if supports_disable_referential_integrity? then
        # Disable user triggers first, then defer all (deferrable) constraints inside a transaction
        execute(tables.collect { |name| "ALTER TABLE #{quote_table_name(name)} DISABLE TRIGGER USER" }.join(";"))
        transaction do
          execute("SET CONSTRAINTS ALL DEFERRED")
          yield
        end
      else
        yield
      end
    ensure
      if supports_disable_referential_integrity? then
        execute(tables.collect { |name| "ALTER TABLE #{quote_table_name(name)} ENABLE TRIGGER USER" }.join(";"))
      end
    end
  end
end

# Patch YamlDb to work with schema_plus
module SerializationHelper
  class Base
    def load(filename, truncate = true)
      disable_logger
      # Do the whole load with referential integrity disabled (using the patch above)
      ActiveRecord::Base.connection.disable_referential_integrity do
        @loader.load(File.new(filename, "r"), truncate)
      end
      reenable_logger
    end
  end

  class Load
    def self.truncate_table(table)
      begin
        ActiveRecord::Base.connection.execute("SAVEPOINT before_truncation")
        ActiveRecord::Base.connection.execute("TRUNCATE #{SerializationHelper::Utils.quote_table(table)} CASCADE")
      rescue Exception
        ActiveRecord::Base.connection.execute("ROLLBACK TO SAVEPOINT before_truncation")
        ActiveRecord::Base.connection.execute("DELETE FROM #{SerializationHelper::Utils.quote_table(table)}")
      end
    end

    def self.load_records(table, column_names, records)
      if column_names.nil?
        return
      end
      columns = column_names.map{|cn| ActiveRecord::Base.connection.columns(table).detect{|c| c.name == cn}}
      quoted_column_names = column_names.map { |column| ActiveRecord::Base.connection.quote_column_name(column) }.join(',')
      quoted_table_name = SerializationHelper::Utils.quote_table(table)
      (records || []).each do |record|
        quoted_values = record.zip(columns).map{|c| ActiveRecord::Base.connection.quote(c.first, c.last)}.join(',')
        ActiveRecord::Base.connection.execute("INSERT INTO #{quoted_table_name} (#{quoted_column_names}) VALUES (#{quoted_values})")
      end
    end
  end

  class Dump
    def self.tables
      # Sort the table list so two dumps of the same database are directly comparable
      ActiveRecord::Base.connection.tables.reject { |table| ['schema_info', 'schema_migrations'].include?(table) }.sort
    end

    def self.dump_table(io, table)
      # Unlike stock yaml_db, always dump the table (even when empty) so a
      # subsequent load also empties those tables
      dump_table_columns(io, table)
      dump_table_records(io, table)
    end
  end
end

Explanation:

Temporarily disabling foreign keys in Postgres can be done two ways.  One is via ALTER TABLE table_name DISABLE TRIGGER ALL.  This, however, requires that the user executing the command have superuser privileges for Postgres, and as many have pointed out this is not necessarily desirable.

The other approach is to use deferrable foreign keys and then to use SET CONSTRAINTS ALL DEFERRED from within a transaction.  This latter approach does not require superuser privileges, and for that reason I believe it is preferable.  Also, I’m not sure whether referential integrity is verified when the triggers are re-enabled in the first approach, whereas deferring the constraints does still validate them when the transaction commits.
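In ActiveRecord terms, the two approaches look roughly like this (the accounts table name is just an example):

conn = ActiveRecord::Base.connection

# Approach 1: requires superuser; turns off every trigger, including the FK triggers
conn.execute('ALTER TABLE accounts DISABLE TRIGGER ALL')
# ... load data ...
conn.execute('ALTER TABLE accounts ENABLE TRIGGER ALL')

# Approach 2: no superuser needed, but the constraints must be declared DEFERRABLE;
# the checks still run when the transaction commits
conn.transaction do
  conn.execute('SET CONSTRAINTS ALL DEFERRED')
  # ... load data in arbitrary order ...
end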

With the above understanding in mind, I set out to figure out how to make all the pieces of the puzzle work with deferred foreign keys.

First I had to set all my foreign keys to deferrable.  This is not the default in schema_plus.  I ended up with code like this:

class MakeForeignKeyConstraintsDeferrable < ActiveRecord::Migration
  def up
    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: nil
    end

    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: { references: :abilities, deferrable: true }
    end
  end

  def down
    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: nil
    end

    change_table :abilities do |t|
      t.change :parent_ability_id,  :integer, foreign_key: { references: :abilities }
    end
  end
end

Note how I drop the foreign key using foreign_key: nil in one change_table before I reinstate it in the next – I’m not sure this is 100% required, but I wanted to be absolutely certain. I used a copy of my schema.rb file to do before and after comparisons and to ensure that backing out the migration resulted in the same schema.rb file. To generate the full migration (covering all 15 foreign keys) I copied the appropriate lines from schema.rb and then did some clever parallel editing in Sublime Text.

Next I had to make Rails work with SET CONSTRAINTS ALL DEFERRED. I had already run into some changes made in Rails 4 to make foreigner (another foreign key gem) work with fixture loading (I was actually trying to fix an issue with database_cleaner, which I use to do cleanup for JavaScript acceptance tests). The comment on the issue I looked at was https://github.com/matthuhiggins/foreigner/issues/61#issuecomment-16880422. So I already had that snippet in my monkey-patch file, and I realized that I could make changes there to get things working.

I needed to ensure that when referential integrity was being disabled, the patch (1) disabled the user triggers first (this has no effect on referential integrity, but we’re assuming that if you want referential integrity disabled you probably want user-defined triggers disabled as well), then (2) started a transaction, then (3) deferred the constraints within the transaction, and finally (4) yielded to the block. The monkey patch above does exactly that!

Having done that, I needed to modify yaml_db to work with referential integrity disabled! First I modified SerializationHelper::Base#load so that it does the loading within the context of disable_referential_integrity. Unfortunately, that didn’t prove sufficient. The problem is that TRUNCATE isn’t legal on a table that has deferred trigger events pending! TRUNCATE also needed the CASCADE option. So I added CASCADE, but I was still having issues. It turns out that once you’re in a Postgres transaction and one statement bombs out, you can’t keep issuing additional statements until you roll back to somewhere. The answer to this question helped a lot! So I added the SAVEPOINT and ROLLBACK commands in the appropriate places. That way the code tries TRUNCATE first and then falls back to DELETE successfully. A better approach would probably be to TRUNCATE all of the tables first using CASCADE and then to start loading them (since TRUNCATE is faster than DELETE, and if we do all the TRUNCATE operations first there are no pending trigger events to cause issues), but I didn’t want to completely rearchitect yaml_db.
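If I were ever to rearchitect it, the idea would be roughly the following (just a sketch, reusing yaml_db’s helper names; I have not shipped this):

# Truncate every table with CASCADE before loading anything, so no deferred
# trigger events are pending when TRUNCATE runs
def truncate_all_tables(tables)
  tables.each do |table|
    ActiveRecord::Base.connection.execute(
      "TRUNCATE #{SerializationHelper::Utils.quote_table(table)} CASCADE"
    )
  end
end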

At this point, I had a successful solution. But I wanted to do some testing, and that led me into a few more issues. It turns out that yaml_db doesn’t always dump the tables in the same order, so comparing dump.yml files isn’t trivial. So I tweaked SerializationHelper::Dump.tables to sort the list of tables. Then I discovered that yaml_db doesn’t dump empty tables, which in turn means it doesn’t empty those tables when loading! So I tweaked SerializationHelper::Dump.dump_table to remove the conditional that ignores empty tables. Then I ran into an issue where the load process crashed on the empty tables, so I modified SerializationHelper::Load.load_records to substitute an empty array when records is nil. Now I could successfully compare two dumps, and I could be certain that records added to previously empty tables were getting wiped out!

RSpec, Mocha, and Shoulda

The latest release of shoulda-matchers, 1.4.2, now requires Mocha.  I’ve been using the RSpec stubbing library, but adding Mocha to the project breaks one little part of RSpec stubbing.  While Mocha uses stubs to stub a method and RSpec uses stub, both use unstub if you need to unstub a single method.  This collision causes problems if you’re using RSpec to create the stub, because the call to unstub resolves to Mocha’s implementation, which then promptly fails to remove the RSpec stub!

The solution is to use the RSpec alias unstub! wherever you need to remove an RSpec stub – since Mocha doesn’t implement unstub!, you can continue to happily use the RSpec stubbing library and still successfully unstub RSpec stubs.
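A minimal illustration of what this looks like in a spec (assuming rspec-mocks 2.x, where unstub! is still defined):

describe "RSpec stubbing alongside Mocha" do
  it "removes an RSpec stub with unstub!" do
    widget = Object.new
    widget.stub(:price).and_return(42)   # RSpec creates the stub
    widget.price.should == 42
    widget.unstub!(:price)               # RSpec-only alias, so Mocha's unstub is never invoked
  end
end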

Make declarative_authorization play nicely with fakefs during testing

It appears that the demand-loading of config/authorization_rules.rb has problems when fakefs is turned on. If the first test that triggers declarative_authorization checks happens to have fakefs turned on, none of the rules will get loaded. The odds of this are slim, but if you have RSpec randomizing test order, the odds increase when you rerun a single spec file.

The workaround is simple: force declarative_authorization to load the rules before any tests run by adding the following to your spec/spec_helper.rb file:

# This ensures authorization rules are loaded prior to tests that use FakeFS
Authorization::Engine.instance

Monkey patching I18n.t in Rails

During the Ruby Hangout on Nov 7th, I suggested that it would be cool if instead of writing t 'path.to.my.string', one could instead write t.path.to.my.string. Josh Szmajda made the mistake of agreeing, and so I decided this would be a good challenge for me to learn some more Ruby tricks.

First of all, I poked into I18n and made a good discovery – the translate method is not particularly useful without passing parameters. This meant that I could use the case where there’s an empty list of parameters to return a Funky Object(TM) that would do something cool with method_missing. Also, I decided that since :t is simply an alias for :translate, I could choose to write my own t that would defer to translate when passed args and do my own thing when it wasn’t. Basically, I could write something like:

require 'i18n'

module I18n
  class << self
    def t(*args)
      if (args.empty?)
        # Do something cool here!
      else
        translate(*args)
      end
    end
  end
end

I did some quick testing and discovered that it did indeed work. Now on to the tricky part. Here’s the first version:

require 'i18n'

module I18n
  class << self
    def t(*args)
      if (args.empty?)
        translator = Object.new
        translator.instance_variable_set(:@path, '')
        def translator.method_missing(method_id, *args)
          raise StandardError, "No args please!" unless args.empty?
          @path = "#{@path}.#{method_id.to_s}"
          answer = I18n.translate(@path)
          answer.respond_to?(:keys) ? self : answer
        end
        translator
      else
        translate(*args)
      end
    end
  end
end

This creates a new Object instance, sets the instance variable @path on it to an empty string, and then defines method_missing on that object to handle arbitrary methods. It looks them up using I18n.translate in whatever path is currently active. If the returned thing behaves like a Hash, then we presume we’re going to need to do this again, and so we just return self so the next method call can follow along. On the other hand, if we’ve gotten to something that doesn’t look like a Hash, then we just return that and we’re done.

The only problem with this approach is that there are a whole bunch of methods defined in Object that can’t be used as keys. For instance, if you have the key inspect in your locale file, you’re out of luck getting to it with this notation.
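To make the limitation concrete, here’s a hypothetical locale and the behavior I’m describing (the keys are made up):

# Given a config/locales/en.yml along these lines:
#   en:
#     greetings:
#       hello: "Hello, world!"
#     inspect: "You can never reach this key"
I18n.t.greetings.hello   # => "Hello, world!"
I18n.t.inspect           # Object#inspect answers first, so method_missing never fires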

So I decided to try to swap in BasicObject for Object. This took me a little longer to figure out since I still needed a way to call instance_variable_set. I finally came up with the following. It may not be the best way to handle this, but it’s what I stumbled upon.

module I18n
  class << self
    def t(*args)
      if (args.empty?)
        translator = BasicObject.new
        def translator.__instance_variable_set(sym, obj)
          ::Object.instance_variable_set(sym, obj)
        end
        translator.__instance_variable_set(:@path, '')
        def translator.method_missing(method_id, *args)
          raise StandardError, "No args please!" unless args.empty?
          @path = "#{@path}.#{method_id.to_s}"
          answer = I18n.translate(@path)
          answer.respond_to?(:keys) ? self : answer
        end
        translator
      else
        translate(*args)
      end
    end
  end
end

I add another singleton method that can call ::Object.instance_variable_set for me. I have no idea what this syntax is actually doing – I stumbled upon it by looking at the documentation on BasicObject in the Pickaxe Book. But it works! When I use I18n.t.hash_name in irb I get translation missing: en.hash_name.inspect – note the .inspect on the end, which shows up because the translator tries to look up the inspect message that irb sends to the result!

I’m not at all sure whether this whole exercise was a good idea from a production code standpoint, but it was fun!

Using factory_girl with declarative_authorization

I recently added declarative_authorization to a Rails app that uses FactoryGirl with RSpec. As Mark Needham pointed out around two years ago at http://www.markhneedham.com/blog/2010/09/12/ruby-factorygirl-declarative_authorization-random-thoughts/, this can cause problems because FactoryGirl can’t save records without the appropriate authorization. Mark’s suggestion was to use monkey patching to wrap around the create method, but I had some issues attempting to implement this. I suspect the internals of FactoryGirl have changed since Mark posted.

After extensive poking around and trial and error, I stumbled across http://robots.thoughtbot.com/post/23039827914/get-your-callbacks-on-with-factory-girl-3-3, which has a whole discussion of the callback support that was added in FactoryGirl 3.3.0 (which was less than six months old at the time of this posting). It first occurred to me to use the before(:create) and after(:create) callbacks to set Authorization.current_user, but after I read to the end of the post I realized that to_create was exactly what I needed.

All I had to do was create the file spec/factories.rb and add the following code to it:

require 'declarative_authorization/maintenance'
include Authorization::TestHelper

FactoryGirl.define do
  to_create do |instance|
    without_access_control { instance.save! }
  end
end
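With that in place, ordinary factories and create calls work unchanged. A hypothetical example (the factory and attribute are made up):

FactoryGirl.define do
  factory :account do
    name "Test Account"
  end
end

# In a spec:
account = FactoryGirl.create(:account)   # saved via without_access_control, no authorization error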

Voila! No monkey patching, and I only have to create one file and all my calls to FactoryGirl.create start working again.

Some Explorations in Functional Programming using Ruby

I’m taking a course in Ruby on Rails, and one of the suggested projects is to write a function that will return the class hierarchy for an object.  This is a very simple problem, but it got me to thinking.  The obvious solution is to use a loop.  But Ruby has a whole set of collection-based idioms using block syntax that seem to inherit from Smalltalk.  I wondered if I might be able to put those to use.

So I started thinking about this whole class of problems.  I wanted a way to express an enumerator where f(n) depends solely on the values of a fixed number of recent values.  For instance, in the case of returning the class hierarchy for an object, f(n) depends solely on f(n-1), whereas in the case of generating the Fibonacci sequence, f(n) depends upon f(n-1) and f(n-2).  One approach to this sort of problem is to use recursion, although in the case of the Fibonacci sequence, recursion can result in performance issues that can be resolved through the use of memoization.  I wanted to encode the recursive relationship in a non-recursive solution.
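For contrast, here’s the recursive approach I wanted to avoid, along with a memoized variant (both written purely for illustration):

def fib_recursive(n)
	n <= 2 ? 1 : fib_recursive(n - 1) + fib_recursive(n - 2)   # exponential time
end

def fib_memoized(n, cache = {1 => 1, 2 => 1})
	cache[n] ||= fib_memoized(n - 1, cache) + fib_memoized(n - 2, cache)
end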

So I came up with:

def enumerator_from_generator_simple(*history, &block)
	Enumerator.new do |yielder|
		history.each { |x| yielder.yield x }
		loop do
			history = history[1, history.length-1] + [block.call(*history)]
			yielder.yield history[-1]
		end
	end
end

This takes two things – a list of “seed” values for the history and a block. The block will be passed the history and is expected to generate the next value in the sequence. The number of history values to pass to the block is determined by the number of seed values passed initially. With this method in hand, we can then create an enumerator for the Fibonacci sequence and then extract the first 10 values (be very careful not to try to extract all the values – this is the enumerator for an infinite list):

foo = enumerator_from_generator_simple(1, 1) do
	|*history|
	history[0]+history[1]
end

p foo.first(10)

We can also then write the class hierarchy enumerator:

def get_superclasses(object)
	superclass_enum = enumerator_from_generator_simple(object.class) do |old_class|
		new_class = old_class.superclass
		raise StopIteration if new_class == nil
		new_class
	end
	superclass_enum.to_a
end

p get_superclasses(5)

Note that in this case, I defined the method by having it create a new enumerator using the class of the passed object as the history. I use raise StopIteration to halt the enumeration once we reach a class whose superclass is nil. Voila! No looping, just an expression of the recursive relationship.
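For reference, on a 1.9-era Ruby (where 5.class is Fixnum), I would expect that last line to print:

[Fixnum, Integer, Numeric, Object, BasicObject]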

My attention then turned to two different areas. The first was adding some performance improvements to the code. These may be a little excessive, but it struck me that the code listed above is a bit inefficient when dealing with some of the most common cases, such as a history of one element. So I wrote the following, which has hand-tweaked code for the most common cases (no elements, one element, and two elements):

def enumerator_from_generator(*history, &block)
	if history.length == 0 then
		Enumerator.new do |yielder|
			loop do
				yielder.yield block.call
			end
		end

	elsif history.length == 1 then
		history = history[0]
		Enumerator.new do |yielder|
			yielder.yield history
			loop do
				history = block.call(history)
				yielder.yield history
			end
		end

	elsif history.length == 2 then
		history_0 = history[0]
		history_1 = history[1]
		Enumerator.new do |yielder|
			yielder.yield history_0
			yielder.yield history_1
			loop do
				history_0, history_1 = history_1, block.call(history_0, history_1)
				yielder.yield history_1
			end
		end

	else
		Enumerator.new do |yielder|
			history.each { |x| yielder.yield x }
			loop do
				history = history[1, history.length-1] + [block.call(*history)]
				yielder.yield history[-1]
			end
		end

	end
end

With that in hand, I turned to some simple testing. As part of testing, I realized that the Fibonacci sequence is actually a special case of a whole family of Fibonacci sequences. We’ll call the “traditional” Fibonacci sequence Fib2, where Fib2(n) = Fib2(n-1) + Fib2(n-2). The next one in the family would be Fib3, where Fib3(n) = Fib3(n-1) + Fib3(n-2) + Fib3(n-3). And so forth and so on. If one uses history.inject(:+) in place of the explicit addition, it becomes simple to express a whole range of these. Finally, I start generating the enumerator using brace block syntax so I can cram everything on one line.

p ( enumerator_from_generator(1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1, 1, 1) { |*history| history.inject(:+) } ).first(20)

I just increase the number of elements passed initially and, voila, Fibx!

Then I got to thinking that all of this addition was a little excessive. In the case of two or three elements, it’s not bad. But what if I was computing Fib1000? The vast majority of the addition in each step was already done in the previous step. I realized that Fibx(n) = 2 * Fibx(n-1) - Fibx(n-1-x). That is to say, at each step the very oldest history element drops off and a new Fibx(n-1) gets added; but since Fibx(n-1) is equal to the whole sum without those changes, we can just double it and subtract the oldest element. Of course, then we need to keep track of x+1 elements in our history and we need to seed the list with an initial sum, but the following enumerates Fib5 just as well as the last example above:

p ( enumerator_from_generator(1, 1, 1, 1, 1, 5) { |*history| history[-1] * 2 - history[0] } ).first(20)

My final realization was that Fibx(n)/Fibx(n-1) has an asymptotic limit – specifically, if the limit as n goes to infinity of Fibx(n)/Fibx(n-1) is r, then r = 2 - 1 / r^x, which can be recast as r^(x+1) - 2 r^x + 1 = 0. For instance, in the case of Fib2, that is r^3 - 2 r^2 + 1 = 0. There are formulas for finding the roots of a cubic, but I’m lazy, so I’ll just take advantage of enumerator_from_generator to write a numeric solver!

p ( enumerator_from_generator(2.0) { |history| 2.0 - 1.0 / (history**2) } ).first(20)

Hey, that converges pretty quickly on the Golden Mean! Which means we’re getting the right answer, since it is well documented that the ratio of consecutive values in the Fibonacci sequence converges on the Golden Mean.

But what if I want to enumerate reasonable converged values for Fibx?

def fibx_ratio(x, iter)
	( ( enumerator_from_generator(2.0) { |history| 2.0 - 1.0 / (history**x) } ).first(iter) )[-1]
end

1.upto(10) { |x| p fibx_ratio(x, 100) }

And, as we expect, the ratio for Fibx approaches 2 as x approaches infinity. Which is to say that while Fib2 grows roughly like 1.618^n and Fib3 like 1.839^n, by the time we get to Fib9 and Fib10 there isn’t much difference between the growth rates of the two functions.

And that, I think, is probably the most I’m going to wring out of this specific suggested project!

RESTful Design and Linguistics

Since emerging from 16 years in a cave, I’ve discovered this design philosophy called REST.  It seems to be popular, although I’m also discovering that there’s a lot of contention around it, and a lot of the descriptions of it seem to miss some of the key points.

I’ve found a number of posts around the web helpful in trying to understand the philosophy.

Reading all this stuff got me thinking about REST and how to adapt the OO model that many developers carry around in their heads (nouns and verbs).  The CRUD model maps pretty well onto REST philosophy (although the mapping has some ambiguity in it depending upon the implementation details), but there are a lot of other verbs out there that we need to handle.  The quintessential verb is balance_transfer in a banking application.  The breakthrough came when I realized that verbs can be recast as creating a record of the verb, and the record of the verb is itself a noun.  Thus, instead of executing the balance_transfer action, instead one should create a balance_transfer record.  Creating that record may have side effects (such as updating the balances on the affected accounts).  That record may not even be persisted in the application (although I’d hope that it is in the case of a banking application – audit trails are important).  But from an API standpoint, one is creating a noun, and that is easy to handle in REST.  One way of thinking of this is that in a database with an audit trail that is 100% complete, the current data in the database is simply a persistent cache of the result of committing everything in the audit trail!  As a result, one doesn’t need CRUD at all – one only needs CR on the audit trail!  Every change to the database is simply the result of creating the appropriate record in the audit trail.
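As a concrete sketch of that idea (the names and routes here are hypothetical, and the model is deliberately simplified):

# config/routes.rb
resources :balance_transfers, only: [:create, :show]

# app/controllers/balance_transfers_controller.rb
class BalanceTransfersController < ApplicationController
  def create
    # We create the noun; updating the affected account balances is a side effect in the model
    @transfer = BalanceTransfer.create!(params[:balance_transfer])
    redirect_to @transfer
  end
end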

All of this collided with a dream I had several months ago.  In the dream I asked a friend what “uninstallable” meant – even in my dream, I’d realized that the word “uninstallable” is ambiguous. It can either mean “unable to be installed” or “able to be uninstalled”!  All excited, I ran to my computer and discovered that someone had written a paper on the subject - Hierarchical Morphological Structure and Ambiguity (Carl Vikner).  Not only does the paper discuss the ambiguity, but it also brings up a very interesting point. Certain verbs are reversible. For instance, one can unlock a door. One cannot, however, uneat an apple (eating is a state-change verb, but one cannot undo the state change – the entropy increase is too large) or unyawn (yawning is a non-state-change verb).

So we can divide verbs into three categories:

  • No state change (yawn, wag, etc.)
  • Reversible state change (install, tie, lock, etc.)
  • Irreversible state change (eat, burn, etc.)

In most databases, viewing a record falls into the “no state change” category, but in some databases it falls into one of the latter two categories.  For instance, in a medical records database that complies with HIPAA, viewing a record falls into the “irreversible state change” category because it creates an audit entry to enable investigations of unauthorized access.  In a mail reader application, viewing (or reading) an e-mail is a reversible state change – it marks the e-mail as read, but there is generally a “Mark as Unread” action.  Note that reversing the action of reading the e-mail has to be referred to by a circumlocution since “unread” is not a valid verb in English!
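In the HIPAA case, for instance, the act of viewing is itself a created noun, something like this hypothetical controller (all names invented):

class MedicalRecordsController < ApplicationController
  def show
    @record = MedicalRecord.find(params[:id])
    # Viewing is an irreversible state change: it creates an audit entry
    RecordAccess.create!(medical_record_id: @record.id, user_id: current_user.id, accessed_at: Time.now)
  end
end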

Which brings up a question.  If viewing has side-effects, should one use GET as the verb?  GET is supposed to be nullipotent, but if viewing creates an audit trail (which is non-idempotent) or marks the record as read (which is idempotent, but not nullipotent), then is GET the appropriate HTTP method?  After some serious thought on this, I’ve concluded the answer is no.  Imagine a browser that does aggressive pre-caching of linked pages.  It can do that because GET is nullipotent.  If, however, viewing records is non-nullipotent, then you want to ensure the browser does not do pre-caching of linked pages.  As a result, all of the links for viewing records should be transitioned to using POST.  This, unfortunately, makes development a lot uglier, but that’s the price one has to pay for ensuring adherence to the browser model.
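In Rails terms, that might look something like the following (the route and helper names are made up):

# config/routes.rb: expose record viewing via POST so browsers won't pre-fetch it
post '/medical_records/:id/view' => 'medical_records#show', as: :view_medical_record

# In a view, Rails UJS turns this link into a form submission
link_to 'View record', view_medical_record_path(record), method: :post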

The next question that occurred to me is how to handle reversible state changes.  For instance, I’m curious how banking systems handle reversing a deposit.  There are two ways of handling this – one is to delete the original deposit record (there might be an audit trail somewhere, but in the transaction log for your account the original deposit simply wouldn’t show).  The other would be to issue a subsequent “undo deposit” transaction.  The latter has the advantage of preserving the original transaction entries as they posted.  The former has the advantage of some degree of simplicity.  However, in both cases there are potential issues that need to be considered.  Imagine I have a balance of $1000 from Jan 1st through Mar 1st.  On Mar 1st, an invalid deposit of $999,000 gets made into my bank account.  On Apr 1st, I finally notice that my account balance seems a tad high and notify my credit union.  They promptly reverse the transaction using either model.  However, I still end up with a higher balance than I would have otherwise!  Why?  Because on Mar 31st, the bank computed interest.  In this hypothetical case, let’s assume I get 1% per quarter (wouldn’t that be nice).  As a result, I get $3,340 in interest on my average balance of $334,000 for the quarter.  If the invalid deposit had never been made, I would have received only $10 in interest.  I assume banking applications have ways of handling this – either all balance-dependent transactions following an invalidated transaction are modified, or if an adjustment transaction is inserted on Apr 1st, then adjustment transactions for all of the balance-dependent transactions are inserted as well.  Issues can be observed in the reverse case as well – imagine invalid withdrawals being made that result in negative-balance fees being applied to the account.  Reversing those invalid withdrawals also needs to clean up the negative-balance fees.