Saturday, February 7, 2009

Investigating how Symbol to_proc works

One of the things I love about Ruby is how expressive it is and how, with open classes, it can be extended to become even more expressive. Since I started using Ruby I don't think I've written a single for or while loop, something I couldn't have imagined saying about any other language! Of course I do this by using iterators and writing code like


user_names = User.all.collect {|user| user.name}


I recently discovered I could write the same thing even more concisely (as long as I'm using Rails or Ruby 1.9)


user_names = User.all.collect(&:name)


I decided to investigate how this works.

First I found some good posts by Prag Dave, by Ryan Bates and at InfoQ. These helped but I still didn't understand it all, so I decided to dig further.

First I took a look at how Rails extends Symbol


unless :to_proc.respond_to?(:to_proc)
  class Symbol
    # Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
    #
    #   # The same as people.collect { |p| p.name }
    #   people.collect(&:name)
    #
    #   # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
    #   people.select(&:manager?).collect(&:salary)
    def to_proc
      Proc.new { |*args| args.shift.__send__(self, *args) }
    end
  end
end


So they defined the to_proc method on Symbol, which means that code gets called when we write &:name, because &:name magically gets transformed into :name.to_proc. I learned something but still needed to learn more to understand how it all works.
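
To make that transformation less magical, here is a small sketch that calls to_proc by hand. It assumes a Ruby where Symbol#to_proc is already available (Rails loaded, or Ruby 1.9 and later); the words array and the variable names are made up for the example.

```ruby
# Calling to_proc by hand, which is what &:upcase asks Ruby to do for us.
upcase_proc = :upcase.to_proc

upcase_proc.call("hello")                  # => "HELLO"

# Illustrative data, not from the original post.
words = ["alpha", "beta"]

explicit  = words.collect { |w| w.upcase } # the long form
shorthand = words.collect(&:upcase)        # the Symbol shorthand
by_hand   = words.collect(&upcase_proc)    # passing the proc ourselves

# All three produce ["ALPHA", "BETA"]
```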

Why does the & cause Ruby to call to_proc? I knew that an & on the last parameter of a method definition captures a provided block as a parameter, but this seems to be doing the reverse: passing an object as an argument but having it interpreted as a block. I tried a couple of experiments in irb


def was_block_given?
  block_given?
end

# As expected
was_block_given? {}
=> true

# Passing a proc is not the same as having a block
was_block_given? Proc.new{}
ArgumentError: wrong number of arguments (1 for 0)
  from (irb):195:in `was_block_given?'
  from (irb):195

# Prefixing the proc with an & makes it like a block
was_block_given? &Proc.new{}
=> true


It was not all as I expected but some reading through the PickAxe book led me to a better understanding. I found this paragraph in the Calling A Method section (page 115 in my copy)

If the last argument to a method is preceded by an ampersand, Ruby assumes that it is a Proc object.
It removes it from the parameter list, converts the Proc object into a block, and associates it with the method.
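
The passage talks about a Proc object, but if the object after the & is not already a Proc, Ruby first calls to_proc on it, which is the hook Symbol relies on. A minimal sketch of that, using a Doubler class invented purely for this example:

```ruby
# Hypothetical class, invented for this example: any object with a to_proc
# method can be passed with & and will arrive as a block.
class Doubler
  def to_proc
    Proc.new { |n| n * 2 }
  end
end

def was_block_given?
  block_given?
end

seen_as_block = was_block_given?(&Doubler.new)   # => true
doubled       = [1, 2, 3].collect(&Doubler.new)  # => [2, 4, 6]
```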


OK, so now I know why, when Ruby sees User.all.collect(&:name), it invokes the collect method with :name.to_proc as a block. Next, it was time to figure out why the code Rails put in the to_proc method worked. I took a look at the Rubinius implementation of Enumerable#collect


def collect
  ary = []
  if block_given?
    each { |o| ary << yield(o) }
  else
    each { |o| ary << o }
  end
  ary
end
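
The important part is the yield: collect does not care whether the block was written inline or came from a proc passed with &. Here is a stripped-down sketch of the same idea as a standalone method; my_collect is a made-up name, not anything from Rubinius, and it again assumes Symbol#to_proc is available.

```ruby
# Hypothetical stand-in for Enumerable#collect, written for this example.
# It yields each item to whatever block the caller supplied.
def my_collect(items)
  result = []
  items.each { |item| result << yield(item) }
  result
end

with_block  = my_collect([1, 2, 3]) { |n| n.to_s }  # => ["1", "2", "3"]
with_symbol = my_collect([1, 2, 3], &:to_s)         # => ["1", "2", "3"]
```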


Again I decided to experiment with irb to see what each part of the to_proc implementation was doing. First I redefined Symbol's to_proc with some puts calls so I could confirm what was going on.


class Symbol
  # Turns the symbol into a simple proc, which is especially useful for enumerations. Examples:
  #
  #   # The same as people.collect { |p| p.name }
  #   people.collect(&:name)
  #
  #   # The same as people.select { |p| p.manager? }.collect { |p| p.salary }
  #   people.select(&:manager?).collect(&:salary)
  def to_proc
    Proc.new do |*args|
      puts "to_proc args: #{args.inspect}"
      args_shift = args.shift
      puts "to_proc: #{args_shift.inspect}.__send__(#{self.inspect}, *#{args.inspect})"
      result = args_shift.__send__(self, *args)
      puts "to_proc result: #{result.inspect}"
      result
    end
  end
end

# Make the call and see what happens
[1].collect( &:to_s)
# to_proc args: [1]
# to_proc: 1.__send__(:to_s, *[])
# to_proc result: "1"
# => ["1"]


It's brute force but it tells us everything we need to know. As expected, collect yields to our proc/block with the element, which arrives in the variable-length argument list as [1]; the proc extracts the 1 and sends it the to_s method with no arguments, returning the string "1". At this point I thought I understood how it all works and decided to confirm by running a few more (more complicated) tests in irb


[1, 'hi', :b].collect( &:to_s)
# to_proc args: [1]
# to_proc: 1.__send__(:to_s, *[])
# to_proc result: "1"
# to_proc args: ["hi"]
# to_proc: "hi".__send__(:to_s, *[])
# to_proc result: "hi"
# to_proc args: [:b]
# to_proc: :b.__send__(:to_s, *[])
# to_proc result: "b"
# => ["1", "hi", "b"]

[1,2,3].inject(&:+)
# to_proc args: [1, 2]
# to_proc: 1.__send__(:+, *[2])
# to_proc result: 3
# to_proc args: [3, 3]
# to_proc: 3.__send__(:+, *[3])
# to_proc result: 6
# => 6

1.__send__(:+, *[2])
# => 3

:b.__send__(:to_s, *[])
# => "b"
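
The same behaviour can be reproduced without reopening Symbol at all. The sketch below uses a helper named send_proc, which is invented for this example, to build the kind of proc the Rails to_proc returns: the first yielded argument becomes the receiver and the rest become the method's arguments.

```ruby
# Hypothetical helper, invented for this example: builds the same kind of
# proc as the Rails Symbol#to_proc shown earlier, without touching Symbol.
def send_proc(method_name)
  Proc.new { |*args| args.shift.__send__(method_name, *args) }
end

plus = send_proc(:+)

sum_of_two = plus.call(1, 2)                           # => 3, i.e. 1.__send__(:+, 2)
total      = [1, 2, 3].inject(&plus)                   # => 6
strings    = [1, "hi", :b].collect(&send_proc(:to_s))  # => ["1", "hi", "b"]
```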


It all worked as expected, so I decided I knew as much as I needed to about this and called it a day.

So why did I bother figuring all this out and then writing it up? Mostly because I didn't know how it worked and thought there was some 'magic' going on. I could have continued using this feature without understanding it, but now that I do understand it, if the need ever arises for me to do some similar magic I know how to go about it. As for writing it up, I hope someone else may read this and find it useful, but either way I increased my own understanding through the act of writing.
