One of the great things about Ruby is that it is truly polymorphic. One of the great things about Rails is that it works so well with Ruby. So Rails should be the logical choice if you want to write a web application dealing with polymorphic data, right?
Suppose, for example, you want to have a table with two PayloadColumns, one containing a Ruby expression (text) and the other containing the results of evaluating that expression.
Rails provides a serialize directive that uses something like YAML to store objects of any type into a text field. This looks like it should be exactly what we want. But it’s not.
The decision of what objects to convert to_yaml (rather than store natively) is relegated to the connection object, while the unpacking is done in the base class. This results in a very inconsistent (and possibly SQL-flavor dependent) behaviour. Quick tests saving and restoring various objects showed:
Trying to think out what I wanted, I made the following list of desiderata:
I also made up a table of how I’d expect various values to be stored:
| Value | Stored as |
| "milk" | "milk" |
| :eggs | :eggs |
| 7 | 7 |
| "7" | "7" |
| 9..5 | 9..5 |
...and at some point I dimly began to realize that I was writing the same thing in both columns. This crystallized into what might (for some people) be a very elegant solution: the storage format is simply a text string which, if evaluated, would give you the object you want.
| [5, "cat", :monkey] |
| Student.find(45327) |
| Teacher.find_by_name('H. Knocks') |
| ...and so forth |
This could easily be implemented by adding a to_source method to all objects (defaulting to something that returns "YAML::load(#{self.to_yaml.inspect})" for cases where we don’t override it with something sweeter). To decode the data, we just have to eval it.
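A minimal sketch of that idea, for illustration only. The to_source name comes from the text above; which classes get an override, and the use of inspect for them, are assumptions. The YAML default only round-trips what YAML.load accepts, and newer Ruby versions restrict it to basic types.

```ruby
require 'yaml'

# Default: wrap the YAML dump in an expression that loads it again.
class Object
  def to_source
    "YAML.load(#{to_yaml.inspect})"
  end
end

# "Sweeter" overrides for values whose inspect output is already valid source.
# (Assumption: NaN and Infinity are not handled, since their inspect output
# is not a Ruby expression.)
[String, Symbol, Integer, Float, Range].each do |klass|
  klass.class_eval do
    def to_source
      inspect
    end
  end
end

class Array
  def to_source
    "[" + map { |element| element.to_source }.join(", ") + "]"
  end
end

# Decoding is a bare eval, which is exactly the weakness discussed next.
["milk", :eggs, 7, "7", 9..5, [5, "cat", :monkey]].each do |value|
  source = value.to_source
  puts "#{source} round-trips: #{eval(source) == value}"
end
```

Note that the string "7" and the integer 7 get different source text, so the type survives the trip through a text column.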
The problem with this method is that if anyone could manage to inject something funny into my database, they would now have a way to execute arbitrary Ruby code in my Rails application. While I can immediately think of three reasons why this wouldn’t be a problem, none of them are compelling enough to make the little voice stop chanting “you’ll be sorry”, so I moved on.
The problem of course isn’t with the storage format, it’s with blindly evaling things. So if we explicitly list the formats we are using, we can (for a little more work) have much more security.
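To make "explicitly list the formats" concrete, here is a small stand-alone sketch with no ActiveRecord involved. The module name ExplicitCodec, the set of supported types, and the string escaping scheme are all assumptions made for illustration. The point is only that decoding matches known patterns and never calls eval.

```ruby
# Hypothetical codec: every accepted format is listed, nothing is eval'ed.
module ExplicitCodec
  module_function

  def encode(value)
    case value
    when String then '"' + value.gsub(/[\\"]/) { |c| "\\" + c } + '"'
    when Symbol then ":#{value}"
    when Integer, Float then value.to_s
    when Range
      unless value.begin.is_a?(Integer) && value.end.is_a?(Integer)
        raise ArgumentError, "only integer ranges are supported"
      end
      "#{value.begin}#{value.exclude_end? ? '...' : '..'}#{value.end}"
    else
      raise ArgumentError, "unsupported type: #{value.class}"
    end
  end

  def decode(text)
    case text
    when /\A-?\d+\z/ then text.to_i
    when /\A-?\d+\.\d+(?:e[+-]?\d+)?\z/i then text.to_f
    when /\A(-?\d+)(\.{2,3})(-?\d+)\z/
      Range.new($1.to_i, $3.to_i, $2 == "...")
    when /\A:(.+)\z/m then $1.to_sym
    when /\A"(.*)"\z/m
      # Undo the backslash escaping applied by encode.
      $1.gsub(/\\(.)/m) { $1 }
    else
      raise ArgumentError, "unrecognized format: #{text.inspect}"
    end
  end
end

stored = ExplicitCodec.encode(9..5)
restored = ExplicitCodec.decode(stored)
```

Anything that does not match a listed pattern is rejected instead of executed, so a row containing a method call simply fails to decode.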
There is a module called ActiveRecord::Wrappings, the header of which says it is:
A plugin framework for wrapping attribute values before they go in and unwrapping them after they go out of the database. This was intended primarily for YAML wrapping of arrays and hashes, but this behavior is now native in the Base class. So for now this framework is laying dormant until a need pops up.
It looked ideal for what I wanted; a little code-walking suggested that all I needed to do to use it was implement a wrapper class that implemented the wrap(attribute) and unwrap(attribute) functions I wanted:
require 'active_record/wrappings'

module ActiveRecord
  module Wrappings #:nodoc:
    class AccurateWrapper < AbstractWrapper #:nodoc:
      # Encode a value as text before it is saved.
      def wrap(attribute)
        case attribute
        when String  then '"' + attribute + '"'
        when Numeric then attribute.to_s
        when ActiveRecord::Base
          "#{attribute.class}:#{attribute.id}"
        else attribute.to_yaml
        end
      end

      # Decode the stored text; each accepted format is listed explicitly.
      def unwrap(attribute)
        case attribute
        when /^\((.*)\)$/ then $1
        when /^"(.*)"$/   then $1
        when /^\d+\.\d+$/ then attribute.to_f
        when /^\d+$/      then attribute.to_i
        when /^([a-zA-Z_]+):(\d+)/
          # Look the class up by name instead of eval'ing stored text,
          # so a value such as "exit:1" cannot run code.
          Object.const_get($1).find($2.to_i)
        else
          begin
            YAML::load(attribute)
          rescue StandardError
            attribute
          end
        end
      end
    end

    module ClassMethods #:nodoc:
      # Wraps the attribute in polymorphic encoding
      def polymorphic_fields(*attributes)
        wrap_with(AccurateWrapper, attributes)
      end
    end
  end
end
Using which, I could write:
class Example < ActiveRecord::Base
  include ActiveRecord::Wrappings
  has_and_belongs_to_many :lessons
  polymorphic_fields :result
end
Of course, it didn’t work right off the bat. It appears the Wrappings module isn’t getting updated since it isn’t getting used. So (for my version of ActiveRecord, 1.11.1) I had to fix a few things:
My patches
--- /usr/local/lib/ruby/gems/1.8/gems/activerecord-1.11.1-mqr/lib/active_record/wrappings.rb 2005-03-29 20:41:03.000000000 -0600
+++ wrappings.rb 2005-08-11 20:25:02.000000000 -0600
@@ -5,9 +5,11 @@
   module Wrappings #:nodoc:
     module ClassMethods #:nodoc:
       def wrap_with(wrapper, *attributes)
-        [ attributes ].flat.each { |attribute| wrapper.wrap(attribute) }
+        [ attributes ].flatten.each { |attribute| wrapper.wrap(attribute,binding) }
       end
     end
+    def after_find
+    end
     def self.append_features(base)
       super
@@ -16,8 +18,9 @@
     class AbstractWrapper #:nodoc:
       def self.wrap(attribute, record_binding) #:nodoc:
-        %w( before_save after_save after_initialize ).each do |callback|
+        %w( before_save after_save after_initialize after_find ).each do |callback|
           eval "#{callback} #{name}.new('#{attribute}')", record_binding
         end
       end
@@ -47,6 +52,7 @@
       alias_method :before_save, :save_wrapped_attribute #:nodoc:
       alias_method :after_save, :load_wrapped_attribute #:nodoc:
+      alias_method :after_find, :load_wrapped_attribute #:nodoc:
       alias_method :after_initialize, :after_save #:nodoc:
       # Overwrite to implement the logic that'll take the regular attribute and wrap it.
This isn’t perfect (I need to handle dates better, and reals with an exponent, etc.) but it gives me a good base on which to build. I also don’t handle collections very well yet: an array of ActiveRecord objects, for example, will still be passed to YAML and consequently come back as dead clones. It should be easy to see how it could be extended as needed if any of these cases matter.
It also imposes a slight overhead due to the use of the after_find callback, unnoticeable in my application but possibly objectionable in a much larger application.
It has the strong advantage that the only footprint in my application is closely associated with the declaration of the polymorphic columns, making it easy to change to a better solution if anyone posts one.
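As one possible direction for the gaps mentioned above (dates and reals with an exponent), the pattern list could grow along these lines. This is a hedged sketch, not part of the wrapper: the constant names and the unwrap_scalar helper are hypothetical, and storing dates in ISO 8601 form (which is what Date#to_s produces) is an assumption.

```ruby
require 'date'

# Hypothetical patterns. Order matters: FLOAT_FORMAT also matches plain
# integers, so INTEGER_FORMAT has to be tried first.
INTEGER_FORMAT = /\A[+-]?\d+\z/
FLOAT_FORMAT   = /\A[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\z/
DATE_FORMAT    = /\A\d{4}-\d{2}-\d{2}\z/

def unwrap_scalar(text)
  case text
  when INTEGER_FORMAT then text.to_i
  when FLOAT_FORMAT   then text.to_f
  when DATE_FORMAT    then Date.iso8601(text) # raises on impossible dates
  else text
  end
end

avogadro = unwrap_scalar("6.02e23")
patched  = unwrap_scalar("2005-08-11")
```

The same when clauses could be added to unwrap, with a matching Date branch in wrap.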
I have created a webcrawler using Ruby and ActiveRecord, and in it I store parsed webpages, since I am not really interested in the raw HTML after having parsed it.
To store and load this, I started using YAML, and realized that what I got back was b0rken.
Then I tried Marshal, and realized that REXML sometimes uses singletons, which breaks Marshal.
So I ended up using something like:
require 'base64'
require 'rexml/document'

#
# Before we save, let the content_serialized field be defined
# by either a serialized version of @content or, if that doesn't work
# (due to it containing a singleton - REXML does that sometimes?),
# save it as pure XML instead
#
def before_save
  begin
    self[:content_serialized] = Base64.encode64(Marshal.dump(@content))
  rescue StandardError => e
    warn("error dumping #{url}: " + e.inspect + ", dumping pure XML instead")
    self[:content_serialized] = ""
    @content.write(self[:content_serialized])
  end
end

#
# After we have found a page, unserialize @content from content_serialized.
# If it breaks, assume that we made the decision in before_save to store pure XML instead
# and so load it as a new REXML::Document
#
def after_find
  begin
    @content = Marshal.load(Base64.decode64(self[:content_serialized]))
  rescue StandardError => e
    warn("error loading #{url}, loading as pure XML instead")
    @content = REXML::Document.new(self[:content_serialized])
  end
end
Perhaps this could be extended to a more generic case.
//MartinKihlgren, martin a t troja.ath.cx
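One way the more generic case might look, offered as a rough sketch rather than a tested plugin. The FallbackSerializer name and the "marshal:" / "raw:" prefixes are invented here. The prefix replaces the trick of waiting for Marshal.load to fail, and pack("m0") is the core-Ruby equivalent of strict Base64 encoding. The caller supplies the fallback as a block, for example writing a REXML document out as XML.

```ruby
# Hypothetical helper: try Marshal first, fall back to a caller-supplied
# text format when the object cannot be dumped (e.g. it has singleton methods).
module FallbackSerializer
  MARSHAL_PREFIX = "marshal:"
  RAW_PREFIX     = "raw:"

  # The block turns the object into text when Marshal refuses it.
  def self.dump(object)
    MARSHAL_PREFIX + [Marshal.dump(object)].pack("m0")
  rescue TypeError
    RAW_PREFIX + yield(object)
  end

  # The block rebuilds the object from the fallback text.
  # Caution: Marshal.load should only ever see data you stored yourself.
  def self.load(text)
    if text.start_with?(MARSHAL_PREFIX)
      Marshal.load(text[MARSHAL_PREFIX.size..-1].unpack("m0").first)
    elsif text.start_with?(RAW_PREFIX)
      yield(text[RAW_PREFIX.size..-1])
    else
      raise ArgumentError, "unknown storage format"
    end
  end
end

page = { "title" => "Home", "links" => 3 }
stored = FallbackSerializer.dump(page) { |object| object.to_s }
restored = FallbackSerializer.load(stored) { |raw| raw }
```

In the crawler above, the dump block would write @content out as XML and the load block would call REXML::Document.new on the raw text.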