Some Explorations in Functional Programming using Ruby

I’m taking a course in Ruby on Rails, and one of the suggested projects is to write a function that will return the class hierarchy for an object.  This is a very simple problem, but it got me to thinking.  The obvious solution is to use a loop.  But Ruby has a whole set of collection-based idioms using block syntax that seem to inherit from Smalltalk.  I wondered if I might be able to put those to use.

So I started thinking about this whole class of problems.  I wanted a way to express an enumerator where f(n) depends only on a fixed number of recent values.  For instance, in the case of returning the class hierarchy for an object, f(n) depends solely on f(n-1), whereas in the case of generating the Fibonacci sequence, f(n) depends on both f(n-1) and f(n-2).  One approach to this sort of problem is recursion, although in the case of the Fibonacci sequence, naive recursion results in performance issues that must be resolved through memoization.  I wanted to encode the recursive relationship in a non-recursive solution.

So I came up with:

def enumerator_from_generator_simple(*history, &block)
	Enumerator.new do |yielder|
		# First yield the seed values themselves...
		history.each { |x| yielder.yield x }
		loop do
			# ...then repeatedly drop the oldest value, append the
			# block's result, and yield that new value.
			history = history[1, history.length-1] + [block.call(*history)]
			yielder.yield history[-1]
		end
	end
end

This takes two things – a list of “seed” values for the history and a block. The block will be passed the history and is expected to generate the next value in the sequence. The number of history values to pass to the block is determined by the number of seed values passed initially. With this method in hand, we can then create an enumerator for the Fibonacci sequence and then extract the first 10 values (be very careful not to try to extract all the values – this is the enumerator for an infinite list):

foo = enumerator_from_generator_simple(1, 1) do |*history|
	history[0] + history[1]
end

p foo.first(10)

We can also then write the class hierarchy enumerator:

def get_superclasses(object)
	superclass_enum = enumerator_from_generator_simple(object.class) do |old_class|
		new_class = old_class.superclass
		raise StopIteration if new_class == nil
		new_class
	end
	superclass_enum.to_a
end

p get_superclasses(5)

Note that in this case, I defined the method by having it create a new enumerator using the class of the passed object as the history. I raise StopIteration to halt the enumeration once we reach a class whose superclass is nil; Kernel#loop quietly rescues StopIteration, which ends the enumerator. Voila! No explicit looping, just an expression of the recursive relationship.

My attention then turned to two different areas. The first was adding some performance improvements to the code. These may be a little excessive, but it struck me that the code listed above is a bit inefficient when dealing with some of the most common cases, such as a history of one element. So I wrote the following, which has hand-tweaked code for the most common cases (no elements, one element, and two elements):

def enumerator_from_generator(*history, &block)
	if history.length == 0 then
		Enumerator.new do |yielder|
			loop do
				yielder.yield block.call
			end
		end

	elsif history.length == 1 then
		history = history[0]
		Enumerator.new do |yielder|
			yielder.yield history
			loop do
				history = block.call(history)
				yielder.yield history
			end
		end

	elsif history.length == 2 then
		history_0 = history[0]
		history_1 = history[1]
		Enumerator.new do |yielder|
			yielder.yield history_0
			yielder.yield history_1
			loop do
				history_0, history_1 = history_1, block.call(history_0, history_1)
				yielder.yield history_1
			end
		end

	else
		Enumerator.new do |yielder|
			history.each { |x| yielder.yield x }
			loop do
				history = history[1, history.length-1] + [block.call(*history)]
				yielder.yield history[-1]
			end
		end

	end
end

With that in hand, I turned to some simple testing. As part of testing, I realized that the Fibonacci sequence is actually a special case of a whole family of sequences. We’ll call the “traditional” Fibonacci sequence Fib2, where Fib2(n) = Fib2(n-1) + Fib2(n-2). The next one in the family would be Fib3, where Fib3(n) = Fib3(n-1) + Fib3(n-2) + Fib3(n-3). And so forth. If one uses history.inject(:+) in place of the explicit addition, it becomes simple to express the whole range of these. Finally, I switch to brace block syntax so I can cram everything onto one line.

p ( enumerator_from_generator(1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1, 1) { |*history| history.inject(:+) } ).first(20)
p ( enumerator_from_generator(1, 1, 1, 1, 1) { |*history| history.inject(:+) } ).first(20)

I just increase the number of elements passed initially and, voila, Fibx!

Then I got to thinking that all of this addition was a little excessive. In the case of two or three elements, it’s not bad. But what if I were computing Fib1000? The vast majority of the addition in each step was already done in the previous step. I realized that Fibx(n) = 2 * Fibx(n-1) – Fibx(n-1-x). That is to say, at each step the oldest history element drops off and a new Fibx(n-1) term is added. But since Fibx(n-1) is itself the sum of the x terms before it, we can just double it and subtract the oldest element. Of course, we then need to keep track of x+1 elements in our history, and we need to seed the list with an initial sum, but the following enumerates Fib5 just as well as the last example above:

p ( enumerator_from_generator(1, 1, 1, 1, 1, 5) { |*history| history[-1] * 2 - history[0] } ).first(20)
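To convince myself the algebra is right, here is a small standalone sanity check. It builds Fib5 directly from its definition rather than reusing the enumerator, so the names here are purely illustrative:

```ruby
# Build Fib5 directly: each term is the sum of the previous five terms.
fib5 = [1, 1, 1, 1, 1]
15.times { fib5 << fib5.last(5).inject(:+) }

# Verify the shortcut Fib5(n) = 2 * Fib5(n-1) - Fib5(n-6) at every index
# where all three terms exist.
(6...fib5.length).each do |n|
  raise "mismatch at #{n}" unless fib5[n] == 2 * fib5[n - 1] - fib5[n - 6]
end
```

If the recurrence were wrong anywhere in the first twenty terms, the raise would fire.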

My final realization was that Fibx(n)/Fibx(n-1) has an asymptotic limit – specifically, if the limit as n goes to infinity of Fibx(n)/Fibx(n-1) is r, then r = 2 – 1 / r^x, which can be recast as r^(x+1) – 2 r^x + 1 = 0. For instance, in the case of Fib2, that is r^3 – 2 r^2 + 1 = 0, which factors as (r – 1)(r^2 – r – 1) = 0, and r^2 – r – 1 = 0 is precisely the defining equation of the Golden Mean. There are formulas for solving cubics, but I’m lazy, so I’ll just take advantage of enumerator_from_generator to write a numeric solver!

p ( enumerator_from_generator(2.0) { |history| 2.0 - 1.0 / (history**2) } ).first(20)

Hey, that converges pretty quickly on the Golden Mean! Which means we’re getting the right answer, since it is well documented that the ratio of consecutive values in the Fibonacci sequence converges on the Golden Mean.
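As an independent sanity check (no enumerator required), iterating the map r -> 2 - 1/r**2 by hand lands on the same place as the closed form (1 + sqrt(5)) / 2:

```ruby
# Iterate the fixed-point map r -> 2 - 1.0 / r**2 starting from 2.0,
# exactly as the enumerator above does.
r = 2.0
20.times { r = 2.0 - 1.0 / r**2 }

# Compare against the closed form for the Golden Mean.
golden_mean = (1 + Math.sqrt(5)) / 2
raise "did not converge" unless (r - golden_mean).abs < 1e-6
```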

But what if I want to enumerate reasonable converged values for Fibx?

def fibx_ratio(x, iter)
	( ( enumerator_from_generator(2.0) { |history| 2.0 - 1.0 / (history**x) } ).first(iter) )[-1]
end

1.upto(10) { |x| p fibx_ratio(x, 100) }

And, as we expect, the ratio for Fibx approaches 2 as x approaches infinity. Which is to say that while Fib2 grows roughly like 1.618^n and Fib3 roughly like 1.839^n, by the time we get to Fib9 and Fib10, there isn’t much difference between the growth rates of the two functions.

And that, I think, is probably the most I’m going to wring out of this specific suggested project!


RESTful Design and Linguistics

Since emerging from 16 years in a cave, I’ve discovered this design philosophy called REST.  It seems to be popular, although I’m also discovering that there’s a lot of contention around it, and a lot of the descriptions of it seem to miss some of the key points.

I’ve found a number of posts around the web to be most helpful in trying to understand the philosophy.

Reading all this stuff got me thinking about REST and how to adapt the OO model that many developers carry around in their heads (nouns and verbs).  The CRUD model maps pretty well onto REST philosophy (although the mapping has some ambiguity in it depending upon the implementation details), but there are a lot of other verbs out there that we need to handle.  The quintessential verb is balance_transfer in a banking application.  The breakthrough came when I realized that verbs can be recast as creating a record of the verb, and the record of the verb is itself a noun.  Thus, instead of executing the balance_transfer action, one should create a balance_transfer record.  Creating that record may have side effects (such as updating the balances on the affected accounts).  That record may not even be persisted in the application (although I’d hope that it is in the case of a banking application – audit trails are important).  But from an API standpoint, one is creating a noun, and that is easy to handle in REST.

One way of thinking of this is that in a database with an audit trail that is 100% complete, the current data in the database is simply a persistent cache of the result of committing everything in the audit trail!  As a result, one doesn’t need CRUD at all – one only needs CR on the audit trail!  Every change to the database is simply the result of creating the appropriate record in the audit trail.
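To make the idea concrete, here is a minimal sketch in plain Ruby (the class and attribute names are hypothetical, not from any real banking system): creating the BalanceTransfer record is what executes the verb, and the audit trail only ever needs Create and Read.

```ruby
Account = Struct.new(:id, :balance)

# The verb "transfer" recast as a noun: constructing the record is what
# applies the side effects to the affected accounts.
class BalanceTransfer
  attr_reader :from, :to, :amount

  def initialize(from, to, amount)
    @from, @to, @amount = from, to, amount
    @from.balance -= amount
    @to.balance   += amount
  end
end

# An audit trail that supports only Create and Read -- no Update, no Delete.
class AuditTrail
  def initialize
    @records = []
  end

  def create(record)
    @records << record
    record
  end

  def read
    @records.dup
  end
end

checking = Account.new(1, 1000)
savings  = Account.new(2, 0)
trail    = AuditTrail.new
trail.create(BalanceTransfer.new(checking, savings, 250))
```

After the create, checking holds 750 and savings holds 250; the current balances are just a cache of the effect of replaying the trail.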

All of this collided with a dream I had several months ago.  In the dream I asked a friend what “uninstallable” meant – even in my dream, I’d realized that the word “uninstallable” is ambiguous. It can either mean “unable to be installed” or “able to be uninstalled”!  All excited, I ran to my computer and discovered that someone had written a paper on the subject – Hierarchical Morphological Structure and Ambiguity (Carl Vikner).  Not only does the paper discuss the ambiguity, but it also brings up a very interesting point: certain verbs are reversible. For instance, one can unlock a door. One cannot, however, uneat an apple (eating is a state-change verb, but one whose state change cannot be undone – the entropy increase is too large) or unyawn (yawning is a non-state-change verb).

So we can divide verbs into three categories:

  • No state change (yawn, wag, etc.)
  • Reversible state change (install, tie, lock, etc.)
  • Irreversible state change (eat, burn, etc.)

In most databases, viewing a record falls into the “no state change” category, but in some databases it falls into one of the latter two categories.  For instance, in a medical records database that complies with HIPAA, viewing a record falls into the “irreversible state change” category because it creates an audit entry to enable investigations of unauthorized access.  In a mail reader application, viewing (or reading) an e-mail is a reversible state change – it marks the e-mail as read, but there is generally a “Mark as Unread” action.  Note that reversing the action of reading the e-mail has to be referred to by a circumlocution, since “unread” is not a valid verb in English!

Which brings up a question.  If viewing has side-effects, should one use GET as the verb?  GET is supposed to be nullipotent, but if viewing creates an audit trail (which is non-idempotent) or marks the record as read (which is idempotent, but not nullipotent), then is GET the appropriate HTTP method?  After some serious thought on this, I’ve concluded the answer is no.  Imagine a browser that does aggressive pre-caching of linked pages.  It can do that because GET is nullipotent.  If, however, viewing records is non-nullipotent, then you want to ensure the browser does not do pre-caching of linked pages.  As a result, all of the links for viewing records should be transitioned to using POST.  This, unfortunately, makes development a lot uglier, but that’s the price one has to pay for ensuring adherence to the browser model.

The next question that occurred to me is how to handle reversible state changes.  For instance, I’m curious how banking systems handle reversing a deposit.  There are two ways of handling this – one is to delete the original deposit record (there might be an audit trail somewhere, but in the transaction log for your account the original deposit simply wouldn’t show).  The other would be to issue a subsequent “undo deposit” transaction.  The latter has the advantage of preserving the original transaction entries as they posted.  The former has the advantage of some degree of simplicity.  However, in both cases there are potential issues that need to be considered.

Imagine I have a balance of $1000 from Jan 1st through Mar 1st.  On Mar 1st, an invalid deposit of $999,000 gets made into my bank account.  On Apr 1st, I finally notice that my account balance seems a tad high and notify my credit union.  They promptly reverse the transaction using either model.  However, I still end up with a higher balance than I would have otherwise!  Why?  Because on Mar 31st, the bank computed interest.  In this hypothetical case, let’s assume I get 1% per quarter (wouldn’t that be nice).  As a result, I get $3,340 in interest on my average balance of $334,000 for the quarter.  If the invalid deposit had never been made, I would have received only $10 in interest.  I assume banking applications have ways of handling this – either all balance-dependent transactions following an invalidated transaction are modified, or if an adjustment transaction is inserted on Apr 1st, then adjustment transactions for all of the balance-dependent transactions are inserted as well.

Issues can be observed in the reverse case as well – imagine invalid withdrawals being made that result in negative-balance fees being applied to the account.  Reversing those invalid withdrawals also needs to clean up the negative-balance fees.
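The interest arithmetic in that example is easy to spell out (simplifying the dates to whole months, and assuming simple interest of 1% per quarter on the average monthly balance):

```ruby
# Monthly balances for the quarter: $1,000 in January and February, then
# $1,000,000 in March after the invalid $999,000 deposit.
balances = [1_000, 1_000, 1_000_000]
average  = balances.inject(:+) / balances.length   # => 334_000
interest = average * 0.01                          # => 3340.0

# Had the invalid deposit never happened, every month sits at $1,000:
honest_interest = 1_000 * 0.01                     # => 10.0
```

Reversing only the deposit leaves the account $3,330 richer than it should be, which is why the dependent interest transaction needs adjusting too.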

Hello world from Toby Ovod-Everett

I’m a software developer located in Anchorage, Alaska.  I have a strong interest in database front-end development.  While I find web front-ends to have some serious limitations from a UI perspective, I’ve been convinced that the dramatic simplification of deployment and updating outweigh those disadvantages.

For the last 16 years, I worked in automated workstation management, including building my own framework for deploying software under SMS and developing my own tool sets for package and install/update routine validation.  That was a 16-year detour from my primary interest, though, and I’m excited to be back on track.

Along the way I developed an OO framework for developing ASP web frontends for SQL Server databases using Perl.  The logic tier and most of the presentation tier were encapsulated in an object layer that implemented the database access, business logic, and a majority of the HTML generation and ASP interaction.  That was skinned by very thin ASP pages.  In order to facilitate rapid development and maintenance, a custom data dictionary was developed in a separate SQL Server database.  Perl scripts accessed the data dictionary to do code generation for the 400KB of T-SQL code that defined the database structure, as well as for the Win32::ASP::DB object layer and the ASP pages.  Almost all of the database-specific code was stored in the data dictionary and was injected during the code generation.  In addition, the code generators supported a variety of callbacks stored in the data dictionary that manipulated and extended the structure early in the code generation process, thus permitting repetitive entries to be automatically generated.

As part of that, Ned Konz and I co-developed Class::Prototyped for Perl.  Class::Prototyped is a module that supports fast prototype-based OO programming in Perl.  The semantics came straight out of the Self language, but we stuck to Perlish syntax as much as possible.  It has support for OO reflection and meta-programming, both on objects built using Class::Prototyped as well as arbitrary existing Perl classes.

My long-term interests include:

  • Database design
  • Front-end design
  • Meta-programming
  • DRY


However, I’ve come to the conclusion that I need to stop developing custom frameworks from scratch.  So I’ve thrown in my lot with the Ruby on Rails community, and if I find itches in RoR that need scratching, I’ll start writing and coding.