Ruby Under a Microscope - How Ruby borrowed a decades-old idea from Lisp
Blocks are one of the most commonly used and powerful features of Ruby: they allow you to pass a code snippet to Enumerable methods such as each or detect.
Closures in Ruby
Internally, Ruby represents each block using a C structure called rb_block_t. A block is a piece of Ruby code that Ruby has compiled into YARV bytecode instructions, so rb_block_t must contain a pointer to that code: the iseq pointer, which points to a snippet of YARV instructions. But a block is more than code, because blocks can also access values from the code surrounding them.
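To see that a block really is compiled into its own snippet of YARV instructions, you can ask the interpreter to disassemble code that contains a block. This sketch assumes CRuby (MRI), because RubyVM::InstructionSequence is an implementation-specific API; the exact listing differs between Ruby versions.

```ruby
# CRuby-only: RubyVM::InstructionSequence exposes the YARV bytecode.
# The single-quoted heredoc keeps the #{...} text from being interpolated here.
source = <<~'RUBY'
  str = "The quick brown fox"
  10.times do
    str2 = "jumps over the lazy dog."
    puts "#{str} #{str2}"
  end
RUBY

iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm

# The block gets its own instruction sequence, labeled "block in <compiled>".
puts listing
```

The top-level code and the block show up as two separate instruction sequences in the listing; the block's sequence is what its iseq pointer refers to.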
How Ruby calls a block
str = "The quick brown fox"
10.times do
  str2 = "jumps over the lazy dog."
  puts "#{str} #{str2}"
end
YARV stores the local variable str on its internal stack and tracks its location using EP (the environment pointer) in the current rb_control_frame_t structure. Before calling times, and so before running any iteration, Ruby creates and initializes a new rb_block_t structure. Ruby creates it at this point because the block is really just another argument to the times method. When creating the new block structure, Ruby copies the current value of EP into it; in other words, Ruby saves the location of the current stack frame in the new block.
Next, Ruby calls the times method on the Fixnum object 10 (Fixnum was merged into Integer in Ruby 2.4), and YARV pushes a new frame onto its internal stack. Now there are two stack frames: one for Fixnum#times and, below it, the original stack frame used by the top-level code. When times yields, Ruby pushes a third frame on top of the stack for the code inside the block to use. While creating this third frame, Ruby's internal yield code copies the EP from the block into the new frame. This way the code inside the block can access both its own local variables, such as str2, and the variables of the surrounding code, such as str.
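Because the block reaches str through the EP copied from the original frame, rather than through a private copy of str, the block sees, and can even update, the surrounding variables. A small sketch; the counter variable count is hypothetical and only added for illustration.

```ruby
str = "The quick brown fox"
count = 0 # hypothetical counter, not part of the original example

3.times do
  # str and count live in the top-level frame; the block reaches them via EP.
  str2 = "jumps over the lazy dog."
  count += 1
  puts "#{str} #{str2}"
end

# The assignment inside the block changed the outer variable, not a copy.
puts count # prints 3

# str2 belonged to the block's own frame, so it is not defined out here.
puts defined?(str2).inspect # prints nil
```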
EP is the basis of Ruby’s implementation of closures.
Sussman and Steele gave this definition of a closure in 1975: a combination of a lambda expression, that is, a function that takes a set of arguments, and an environment to be used when calling that lambda expression.
In Ruby, the iseq pointer plays the role of the lambda expression (a function or code snippet), and EP plays the role of the environment to be used when calling it: a pointer to the surrounding stack frame.
Side note on the meaning of "lambda" (from Wikipedia): in mathematical logic and computer science, the term is used for anonymous functions.
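Ruby exposes both halves of the closure pairing at the language level. In the sketch below, calling the proc runs its code, while Proc#binding gives access to the environment it captured. This only shows observable behavior, not the iseq and EP fields themselves; the method name make_greeter is hypothetical.

```ruby
# Hypothetical helper: returns a closure over the local variable str.
def make_greeter
  str = "The quick brown fox"
  # The block's code (iseq) plus its environment (EP) form the closure.
  proc { |animal| "#{str} jumps over the lazy #{animal}." }
end

greeter = make_greeter

# The code half: run it with an argument.
puts greeter.call("dog")

# The environment half: the captured local variable is still reachable.
env = greeter.binding
puts env.local_variable_get(:str) # prints The quick brown fox
```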
typedef struct rb_block_struct {
    VALUE self;
    VALUE klass;
    VALUE *ep;
    rb_iseq_t *iseq;
    VALUE proc;
} rb_block_t;
- self: the value the self pointer had when the block was first referred to. Ruby executes block code in the same object context as the code outside the block.
- klass: keeps track of the class of the current object.
- proc: Ruby uses this value when it creates a proc object from a block.
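The self and proc fields have effects you can observe from Ruby code. The sketch below uses a hypothetical Fox class to show that a block runs with the self of the code that wrote it, and that capturing a block with &block gives you a Proc object. The klass field has no direct Ruby-level view, so it is not shown.

```ruby
# Hypothetical class used only to illustrate the self and proc fields.
class Fox
  def initialize
    @color = "brown"
  end

  # The block runs with the same self as the method that wrote it.
  def self_inside_block
    [1].map { self }.first
  end

  # So instance variables of that object are visible inside the block.
  def describe(animal)
    [animal].map { |a| "The quick #{@color} fox jumps over the lazy #{a}." }.first
  end

  # Capturing a block with &block makes Ruby wrap it in a Proc object.
  def capture(&block)
    block
  end
end

fox = Fox.new
puts fox.self_inside_block.equal?(fox) # prints true
puts fox.describe("dog")
captured = fox.capture { 42 }
puts captured.class # prints Proc
puts captured.call  # prints 42
```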
The rb_control_frame_struct structure also contains all the values from rb_block_struct, as an optimization.
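Ruby takes about 71% longer to call a block than to run the equivalent code in a while loop. A block requires more work: Ruby must create a new rb_block_t structure and push a new stack frame for the block's code, while a while loop needs neither.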
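If you want to repeat the comparison yourself, a rough sketch using Ruby's standard Benchmark module might look like this. The method names and the iteration count are arbitrary choices, and the ratio you get will depend on your Ruby version, hardware, and whether a JIT such as YJIT is enabled, so do not expect to reproduce the 71% figure exactly.

```ruby
require "benchmark"

N = 1_000_000 # arbitrary iteration count

# Sum with a while loop: no block, no extra stack frame per iteration.
def sum_with_while(n)
  total = 0
  i = 0
  while i < n
    total += i
    i += 1
  end
  total
end

# Sum with times: a block is created once, and each yield pushes a new frame.
def sum_with_times(n)
  total = 0
  n.times { |i| total += i }
  total
end

while_time = Benchmark.realtime { sum_with_while(N) }
times_time = Benchmark.realtime { sum_with_times(N) }

puts format("while: %.3fs  times: %.3fs  ratio: %.2f",
            while_time, times_time, times_time / while_time)
```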
Lambdas and Procs: Treating Functions as First-Class Citizens
def message_function
  str = "The quick brown fox"
  lambda do |animal|
    puts "#{str} jumps over the lazy #{animal}."
  end
end
function_value = message_function
function_value.call('dog') # prints The quick brown fox jumps over the lazy dog.
Internally, Ruby saves data in two places: on the stack or on the heap. The stack is where Ruby saves local variables, return values, and arguments. Values on the stack are valid only as long as the method is running.
Ruby uses the heap to save information that you might need for a while after a particular method returns. Each value on the heap remains valid as long as there is a reference to it.
On the stack, Ruby saves only references to data: VALUE pointers. For simple values such as small integers, symbols, and constants like nil, true, and false, the VALUE itself contains the actual value. For all other data types, the VALUE is a pointer to a C structure containing the actual data, such as RObject, and Ruby stores these structures on the heap.
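You cannot see VALUE pointers from Ruby code, but object identity gives a hint. Values stored directly in the VALUE are always the very same object, while each newly allocated string gets its own structure on the heap. A sketch assuming CRuby, where small integers, symbols, nil, true, and false behave this way.

```ruby
# Immediate values: the VALUE itself holds the data, so equal values are identical.
puts 42.equal?(42)     # prints true
puts :fox.equal?(:fox) # prints true
puts nil.equal?(nil)   # prints true

# Heap objects: each String.new allocates a separate string structure on the heap.
a = String.new("fox")
b = String.new("fox")
puts a == b            # prints true  (same contents)
puts a.equal?(b)       # prints false (different heap objects)
```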
How Ruby creates a lambda
Ruby lets you treat functions, or code, as data values: you can save them in variables and pass them as arguments. Ruby implements this feature using blocks.
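A short sketch of treating code as data: a lambda stored in a variable, passed to a method as an ordinary argument, and turned back into a block with &. The names double and apply_twice are illustrative only.

```ruby
# A lambda is a value: it can be stored in a variable...
double = lambda { |x| x * 2 }

# ...passed to a method as an ordinary argument...
def apply_twice(fn, value)
  fn.call(fn.call(value))
end

puts apply_twice(double, 5) # prints 20

# ...or converted back into a block with & and handed to an Enumerable method.
p([1, 2, 3].map(&double))   # prints [2, 4, 6]
```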
When we call lambda, Ruby copies the entire contents of the current YARV stack frame into the heap, where the RString structure holding the value of str is also located. Along with this copy of the stack frame, Ruby creates two objects: an internal environment object, an rb_env_t structure that wraps the heap copy of the stack frame, and a Ruby proc object, an rb_proc_t structure, which is the actual return value of the lambda keyword.
A proc is a kind of Ruby object that wraps up a block: it contains an rb_block_t structure, including the iseq and EP pointers. Ruby sets this EP to point to the new heap copy of the stack frame.
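Because the EP inside the proc points at the heap copy of the frame, the lambda keeps working after the method that created it has returned. One consequence you can observe: two lambdas created during the same method call share that single environment, while a second call gets a fresh one. The method name make_counter is hypothetical.

```ruby
# Hypothetical helper returning two lambdas that close over the same variable.
def make_counter
  count = 0
  # Both lambdas are created in the same frame, so both refer to
  # the same heap copy of that frame, which holds count.
  increment = lambda { count += 1 }
  read = lambda { count }
  [increment, read]
end

increment, read = make_counter # make_counter has already returned here
increment.call
increment.call
puts read.call # prints 2

# A second call creates a new frame, and so a separate environment.
other_increment, other_read = make_counter
other_increment.call
puts other_read.call # prints 1
puts read.call       # still prints 2
```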
The rb_proc_t structure also contains an is_lambda flag, which tells us whether we created the proc with the lambda keyword or with proc.
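Ruby exposes this flag through Proc#lambda?. The sketch below also shows one behavior that depends on it: lambdas check their argument count strictly, while plain procs fill missing arguments with nil.

```ruby
made_with_lambda = lambda { |a, b| [a, b] }
made_with_proc = proc { |a, b| [a, b] }

# Proc#lambda? reports which keyword created the proc.
puts made_with_lambda.lambda? # prints true
puts made_with_proc.lambda?   # prints false

# A plain proc pads missing arguments with nil...
p made_with_proc.call(1)      # prints [1, nil]

# ...while a lambda raises an error for the wrong number of arguments.
begin
  made_with_lambda.call(1)
rescue ArgumentError => e
  puts "lambda: #{e.class}"   # prints lambda: ArgumentError
end
```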
The Proc object
Ruby represents procs using rb_proc_t structures. A proc is a Ruby object, so it contains the same information as other Ruby objects, including an RBasic structure. To represent instances of the Proc class, Ruby uses an RTypedData structure together with rb_proc_t; RTypedData is a wrapper around a C data structure.
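Since the rb_proc_t data is wrapped in an ordinary Ruby object, a proc behaves like any other object from Ruby's point of view: it has a class, responds to methods, can be stored in collections, and can even carry instance variables. The internal RTypedData wrapper itself is not visible from Ruby; this sketch only shows the object-level behavior, and the names @label and callbacks are illustrative.

```ruby
pr = proc { |animal| "jumps over the lazy #{animal}" }

puts pr.class              # prints Proc
puts pr.is_a?(Object)      # prints true
puts pr.respond_to?(:call) # prints true

# Like other objects, a proc can hold instance variables.
pr.instance_variable_set(:@label, "fox")
puts pr.instance_variable_get(:@label) # prints fox

# And it can be stored in a collection and called later.
callbacks = { greet: pr }
puts callbacks[:greet].call("dog") # prints jumps over the lazy dog
```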