sourcetagsandcodes.com

Ruby 2.1 In Detail

Posted 6 May 2014 by Mat Sadler

This article was originally posted on the globaldev blog; they have kindly allowed me to repost it here. If you’re looking for a Ruby job in London you should check out their jobs page.

Ruby 2.1 is the next significant version of Ruby, having been released on Christmas Day 2013, just 10 months after 2.0.0. It comes with a whole host of changes and improvements, and this post dives into the details of what’s new.

New versioning policy

With 2.1 Ruby moves to a new versioning scheme based on Semantic Versioning.

The scheme is MAJOR.MINOR.TEENY, so with 2.1.0 the major version is 2, the minor version is 1, and the teeny version is 0. The teeny version number takes over from the patchlevel for minor bug and security fixes. The minor version number will be used for new features that are largely backwards compatible, and major for incompatible changes that can’t be released as a minor.

This means rather than referring to, say, 1.9.3 in general and 1.9.3-p545 specifically it will be 2.1 in general and 2.1.1 specifically.

The plan is to release a new minor version every 12 months, so we can expect to see Ruby 2.2 on Christmas Day 2014.

Required keyword arguments

After being introduced in Ruby 2.0.0 keyword arguments get a small improvement in 2.1. Required keyword arguments allow you to omit the default value for a keyword argument in the method definition, and an error will be raised if they are not given when the method is called.

# length is required
def pad(num, length:, char: "0")
  num.to_s.rjust(length, char)
end

pad(42, length: 6)   #=> "000042"
pad(42)              #=> #<ArgumentError: missing keyword: length>

As you can see in the example above there are some cases where keyword arguments can really help disambiguate which argument is which, but there isn’t any sensible default. Now you don’t have to choose.
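As a further sketch, required and optional keywords can be mixed in one signature and passed in any order. The connect method and its parameters below are hypothetical, made up for illustration, and the exact wording of the error message may differ between Ruby versions.

```ruby
# Hypothetical method for illustration: host is required, the rest have defaults.
def connect(host:, port: 80, timeout: 5)
  "#{host}:#{port} (timeout #{timeout}s)"
end

connect(host: "example.com")               #=> "example.com:80 (timeout 5s)"
connect(port: 8080, host: "example.com")   #=> "example.com:8080 (timeout 5s)"

# Omitting the required keyword raises ArgumentError when the method is called.
begin
  connect(port: 8080)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```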

String#freeze optimisation

As strings in Ruby are mutable, any string literals must result in a new string each time they are evaluated, e.g.

def env
  "development"
end

# returns new String object on each call
env.object_id   #=> 70329318373020
env.object_id   #=> 70329318372900

This can be quite wasteful, creating and then garbage collecting a lot of objects. To allow you to avoid this, calling #freeze directly on a string literal is special-cased to look up the string in a table of frozen strings. This means the same string will be reused.

def env
  "development".freeze
end

# returns the same String object on each call
env.object_id   #=> 70365553080120
env.object_id   #=> 70365553080120

String literals used as keys in Hash literals are treated the same way, without the need to call #freeze.

a = {"name" => "Arthur"}
b = {"name" => "Ford"}

# same String object used as key in both hashes
a.keys.first.object_id   #=> 70253124073040
b.keys.first.object_id   #=> 70253124073040

During the development of 2.1 this feature started off as a syntax addition, with "string"f resulting in a frozen string. It was decided to switch to the technique of special casing the #freeze call on a literal as it allows for writing code that is backwards and forwards compatible, plus subjectively many people weren’t fond of the new syntax.
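One way to check the optimisation without comparing object ids by eye is Object#equal?, which tests object identity. In this sketch fresh_env is a hypothetical counterpart that allocates explicitly, so the comparison doesn’t depend on how plain literals are treated by your Ruby version.

```ruby
def env
  "development".freeze
end

def fresh_env
  String.new("development")   # explicitly allocates a new String on each call
end

env.equal?(env)               #=> true, the same frozen object every call
fresh_env.equal?(fresh_env)   #=> false, a new object every call
env.frozen?                   #=> true
```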

def returns the method name as a Symbol

The result of defining a method is no longer nil; instead it’s a symbol of the method’s name. The canonical example of this is making a single method private.

class Client
  def initialize(host, port)
    # ...
  end

  private def do_request(method, path, body, **headers)
    # ...
  end

  def get(path, **headers)
    do_request(:get, path, nil, **headers)
  end
end

It also makes for a nice way of adding method decorators. Here’s an example using Module#prepend to wrap before/after calls around a method.

module Around
  def around(method)
    prepend(Module.new do
      define_method(method) do |*args, &block|
        send(:"before_#{method}") if respond_to?(:"before_#{method}", true)
        result = super(*args, &block)
        send(:"after_#{method}") if respond_to?(:"after_#{method}", true)
        result
      end
    end)
    method
  end
end

class Example
  extend Around

  around def call
    puts "call"
  end

  def before_call
    puts "before"
  end

  def after_call
    puts "after"
  end
end

Example.new.call

outputs

before
call
after

The define_method and define_singleton_method methods have also been updated to return symbols rather than their proc arguments.
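The return values can be inspected directly by capturing them. The Greeter class and its constants in this small sketch are hypothetical and exist only to hold the results.

```ruby
class Greeter
  # def evaluates to the name of the method it defines
  DEF_RESULT = def hello
    "hello"
  end

  # define_method now does the same
  DEFINE_METHOD_RESULT = define_method(:goodbye) { "goodbye" }

  # which is what lets private take a def directly
  private def secret
    "secret"
  end
end

Greeter::DEF_RESULT                        #=> :hello
Greeter::DEFINE_METHOD_RESULT              #=> :goodbye
Greeter.private_method_defined?(:secret)   #=> true
```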

Rational and Complex literals

Integer (1) and Float (1.0) literals are a given; now we have Rational (1r) and Complex (1i) literals too.

These work really nicely with Ruby’s casting mechanism for mathematical operations, such that a rational number like one third – 1/3 in mathematical notation – can be written 1/3r in Ruby. 3i produces the complex number 0+3i, which means complex numbers can be written in standard mathematical notation: 2+3i produces the complex number 2+3i!
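A few worked values may help here. This is only an illustrative sketch; the variable names are arbitrary.

```ruby
third = 1/3r          # Rational (1/3)
third + 2/3r == 1     #=> true, exact arithmetic with no rounding

0.1r * 3 == 0.3r      #=> true
0.1 * 3 == 0.3        #=> false, binary floating point rounding

z = 2+3i              # Complex (2+3i)
z.real                #=> 2
z.imaginary           #=> 3
z * (2-3i) == 13      #=> true, (2+3i)(2-3i) = 4 + 9
```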

Array/Enumerable #to_h

The many classes that got a #to_h method in Ruby 2.0.0 are now joined by Array and any other class including Enumerable.

[[:id, 42], [:name, "Arthur"]].to_h      #=> {:id=>42, :name=>"Arthur"}

require "set"
Set[[:id, 42], [:name, "Arthur"]].to_h   #=> {:id=>42, :name=>"Arthur"}

This will come in handy with all those Enumerable methods on Hash that return an Array:

headers = {"Content-Length" => 42, "Content-Type" => "text/html"}
headers.map {|k, v| [k.downcase, v]}.to_h
#=> {"content-length" => 42, "content-type" => "text/html"}
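Two more sketches of where #to_h tends to be useful: pairing up parallel arrays with zip, and turning an enumerator of pairs into a lookup table. The data here is made up for illustration. Elements that aren’t two-element arrays are rejected with a TypeError.

```ruby
keys   = [:id, :name]
values = [42, "Arthur"]
record = keys.zip(values).to_h            # id => 42, name => "Arthur"

# each_with_index without a block returns an Enumerator of [element, index]
index_of = %w[a b c].each_with_index.to_h
index_of["c"]                             #=> 2

begin
  [1, 2].to_h
rescue TypeError => e
  puts "not pairs: #{e.message}"
end
```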

Fine grained method cache

Prior to 2.1 Ruby used a global method cache, which would be invalidated for all classes whenever a method was defined, a module included, an object extended with a module, etc. anywhere in your code. This made some classes – such as OpenStruct – and some techniques – such as exception tagging – unusable for performance reasons.

This is no longer an issue: Ruby 2.1 uses a method cache based on the class hierarchy, invalidating the cache for only the class in question and any subclasses.

A method has been added to the RubyVM class to return some debugging information on the status of the method cache.

class Foo
end

RubyVM.stat   #=> {:global_method_state=>133, :global_constant_state=>820, :class_serial=>5689}

# setting constant increments :global_constant_state

Foo::Bar = "bar"

RubyVM.stat(:global_constant_state)   #=> 821

# defining instance method increments :class_serial

class Foo
  def foo
  end
end

RubyVM.stat(:class_serial)            #=> 5690

# defining global method increments :global_method_state

def foo
end

RubyVM.stat(:global_method_state)     #=> 134

Exception

Exceptions now have a #cause method that will return the causing exception. The causing exception will automatically be set when you rescue one exception and raise another.

require "socket"

module MyProject
  Error = Class.new(StandardError)
  NotFoundError = Class.new(Error)
  ConnectionError = Class.new(Error)

  def self.get(path)
    response = do_get(path)
    raise NotFoundError, "#{path} not found" if response.code == "404"
    response.body
  rescue Errno::ECONNREFUSED, SocketError => e
    raise ConnectionError
  end
end

begin
  MyProject.get("/example")
rescue MyProject::Error => e
  e         #=> #<MyProject::ConnectionError: MyProject::ConnectionError>
  e.cause   #=> #<Errno::ECONNREFUSED: Connection refused - connect(2) for "example.com" port 80>
end

Currently the causing error isn’t output anywhere, and rescue won’t pay attention to the cause, but just having the cause automatically set should be a great help while debugging.

Exceptions also get the #backtrace_locations method that was curiously missing from 2.0.0. This returns Thread::Backtrace::Location objects rather than strings, giving easier access to the details of the backtrace.
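Since the cause isn’t printed anywhere by default, a small helper that walks the chain can be handy while debugging. Both cause_chain and parse_port below are hypothetical helpers written for this sketch, not part of Ruby.

```ruby
# Collect an exception and all of its causes, outermost first.
def cause_chain(error)
  chain = []
  while error
    chain << error
    error = error.cause
  end
  chain
end

# Hypothetical parser that wraps the low-level error in a more descriptive one.
def parse_port(text)
  Integer(text)
rescue ArgumentError
  raise RuntimeError, "invalid port: #{text.inspect}"
end

begin
  parse_port("http")
rescue RuntimeError => e
  cause_chain(e).each { |err| puts "#{err.class}: #{err.message}" }

  # backtrace_locations gives objects with accessors rather than strings
  location = e.backtrace_locations.first
  puts "raised in #{location.label} at line #{location.lineno}"
end
```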

Generational Garbage Collection

Ruby 2.1 introduces a generational garbage collector, which divides all objects into young and old generations. During the marking phase a regular GC run will only look at the young generation, with the old being marked less frequently. Sweeping is done with the same lazy sweeping system introduced in 1.9.3. An object is promoted to the old generation when it survives a young generation run.

If you have objects in the old generation referring to objects in the young generation, but you’re only looking at the young generation, it may seem like an object doesn’t have any references, and you might incorrectly GC an in-use object. Write barriers prevent this by adding old generation objects to a ‘remember set’ when they are modified to refer to a young generation object (e.g. old_array.push(young_string)). This ‘remember set’ is then taken into account when marking the young generation.

Most generational garbage collectors need these write barriers on all objects, but with the many 3rd party C extensions available for Ruby this isn’t possible, so a workaround was devised whereby objects that aren’t write barrier protected (“shady” objects) won’t ever be promoted to the old generation. This isn’t ideal as you won’t get the full benefit of the generational GC, but it does maximise backwards compatibility.

While the marking phase is now a lot faster, the write barriers do add some overhead, and any performance gains are very dependent on what exactly your code is doing.

GC

The GC.start method gets two new keyword arguments, full_mark and immediate_sweep. Both of these default to true.

With full_mark set to true both generations are marked; false will only mark the young generation. With immediate_sweep set to true a full ‘stop the world’ sweep will be performed; false will perform a lazy sweep, deferred until it’s required and sweeping only the minimum required.

GC.start # trigger a full GC run
GC.start(full_mark: false) # only collect young generation
GC.start(immediate_sweep: false) # mark only
GC.start(full_mark: false, immediate_sweep: false) # minor GC

The GC.stress debugging option can now be set to an integer flag to control which part of the garbage collector to stress.

GC.stress = true # full GC at every opportunity
GC.stress = 1    # minor marking at every opportunity
GC.stress = 2    # lazy sweep at every opportunity

The output of GC.stat has been updated to include some more details, and the method itself now takes a key argument to return just the value for that key, rather than building and returning the full hash.

GC.stat                    #=> {:count=>6, ... }
GC.stat(:major_gc_count)   #=> 2
GC.stat(:minor_gc_count)   #=> 4

GC also gets a new method latest_gc_info which returns information about the most recent garbage collection run.

GC.latest_gc_info   #=> {:major_by=>:oldgen, :gc_by=>:newobj, :have_finalizer=>false, :immediate_sweep=>false}
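Putting these together, a rough way to watch the collector from inside a script might look like the sketch below. The exact counts and the contents of the info hash vary between runs and Ruby versions, so no particular output is assumed.

```ruby
before = GC.stat(:count)

GC.start(full_mark: false, immediate_sweep: false)   # ask for a minor, lazily swept run
GC.start                                             # full mark and immediate sweep

after = GC.stat(:count)

puts "GC runs since the first reading: #{after - before}"
puts "major: #{GC.stat(:major_gc_count)}, minor: #{GC.stat(:minor_gc_count)}"
puts "latest run: #{GC.latest_gc_info.inspect}"
```

Asking for a single key avoids building the full hash, which matters if you sample these numbers frequently.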

GC tuning environment variables

Ruby will pay attention to a whole bunch of new environment variables now when it’s started up, that can be used to tune the behaviour of the garbage collector.

RUBY_GC_HEAP_INIT_SLOTS

This was available before as RUBY_HEAP_MIN_SLOTS. It sets the initial allocation slots, and defaults to 10000.

RUBY_GC_HEAP_FREE_SLOTS

This was also available before, as RUBY_FREE_MIN. It sets the minimum number of slots that should be available after GC. New slots will be allocated if GC hasn’t freed up enough. Defaults to 4096.

RUBY_GC_HEAP_GROWTH_FACTOR

Grows the number of allocated slots by the given factor. (next slots number) = (current slots number) * (this factor). The default is 1.8.

RUBY_GC_HEAP_GROWTH_MAX_SLOTS

The maximum number of slots that will be allocated at one time. The default is 0, which means no maximum.

RUBY_GC_MALLOC_LIMIT

This one isn’t new, but it’s worth covering. It is the amount of memory that can be allocated without triggering garbage collection. It defaults to 16 * 1024 * 1024 (16MB).

RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR

The rate at which the malloc_limit grows, the default is 1.4.

RUBY_GC_MALLOC_LIMIT_MAX

The maximum the malloc_limit can reach. Default 32 * 1024 * 1024 (32MB).

RUBY_GC_OLDMALLOC_LIMIT

The amount the old generation can increase by before triggering a full GC. Default is 16 * 1024 * 1024 (16MB).

RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR

The rate at which the old_malloc_limit grows. Default 1.2.

RUBY_GC_OLDMALLOC_LIMIT_MAX

The maximum the old_malloc_limit can reach. Default 128 * 1024 * 1024 (128MB).

ObjectSpace tools to track down memory leaks

Ruby 2.1 adds some more tools to help track down when you’re keeping references to old/large objects and not letting the garbage collector claim them.

We now get a collection of methods to trace object allocations and report on them.

require "objspace"

module Example
  class User
    def initialize(first_name, last_name)
      @first_name, @last_name = first_name, last_name
    end

    def name
      "#{@first_name} #{@last_name}"
    end
  end
end

ObjectSpace.trace_object_allocations do
  obj = Example::User.new("Arthur", "Dent").name
  ObjectSpace.allocation_sourcefile(obj)   #=> "example.rb"
  ObjectSpace.allocation_sourceline(obj)   #=> 10
  ObjectSpace.allocation_class_path(obj)   #=> "Example::User"
  ObjectSpace.allocation_method_id(obj)    #=> :name
  ObjectSpace.allocation_generation(obj)   #=> 6
end

The number returned by allocation_generation is the number of garbage collections that had been run when the object was created. So if this is a small number then the object was created early in the lifetime of the application.

There’s also trace_object_allocations_start and trace_object_allocations_stop as alternatives to trace_object_allocations with a block, and trace_object_allocations_clear to clear recorded allocation data.
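As a sketch of the start/stop style, the hypothetical helper below (allocation_site is not part of ObjectSpace) turns tracing on around a block and reports where the object returned by the block was allocated.

```ruby
require "objspace"

# Hypothetical helper: run the block with allocation tracing enabled and
# return [file, line] for the object the block returns.
def allocation_site
  ObjectSpace.trace_object_allocations_start
  object = yield
  [ObjectSpace.allocation_sourcefile(object),
   ObjectSpace.allocation_sourceline(object)]
ensure
  ObjectSpace.trace_object_allocations_stop
  ObjectSpace.trace_object_allocations_clear
end

file, line = allocation_site { Object.new }
puts "allocated at #{file}:#{line}"
```

The lookups happen while tracing is still active, and the recorded data is cleared afterwards so it doesn’t accumulate.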

Further to this it’s possible to output this information and a little more to a file or string as JSON for further analysis or visualisation.

require "objspace"

ObjectSpace.trace_object_allocations do
  puts ObjectSpace.dump(["foo"].freeze)
end

outputs

{
    "address": "0x007fd122123f40",
    "class": "0x007fd121072098",
    "embedded": true,
    "file": "example.rb",
    "flags": {
        "wb_protected": true
    },
    "frozen": true,
    "generation": 6,
    "length": 1,
    "line": 4,
    "references": [
        "0x007fd122123f68"
    ],
    "type": "ARRAY"
}

You can also use ObjectSpace.dump_all to dump the entire heap.

require "objspace"
ObjectSpace.trace_object_allocations_start
# do things ...
ObjectSpace.dump_all(output: File.open("heap.json", "w"))

Both these methods can be used without activating object allocation tracing, but you’ll get less detail in the output.
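Because the output is JSON it can be read straight back in with the json library. A minimal sketch, here without allocation tracing, so only the basic fields are relied on:

```ruby
require "objspace"
require "json"

info = JSON.parse(ObjectSpace.dump(["foo"].freeze))

info["type"]     #=> "ARRAY"
info["length"]   #=> 1
info["frozen"]   #=> true
```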

Finally there’s ObjectSpace.reachable_objects_from_root which works similarly to ObjectSpace.reachable_objects_from but takes no argument and works from the root instead. There is one slight quirk to this method in that it returns a hash that has been put into ‘compare by identity’ mode, so you need the exact same string objects that it uses for keys to get anything out of it. Fortunately there is a workaround.

require "objspace"

reachable = ObjectSpace.reachable_objects_from_root
reachable = {}.merge(reachable) # workaround compare_by_identity
reachable["symbols"]   #=> ["freeze", "inspect", "intern", ...

Refinements

Refinements are no longer experimental and won’t generate a warning. They also get a couple of small tweaks to make them more usable.

Along with the top level #using to activate refinements in a file, there is now a Module#using method to activate refinements in a module. However, the effect of ‘using’ a refinement is still lexical; it won’t be active when reopening a module definition.

module NumberQuery
  refine String do
    def number?
      match(/\A(0|-?[1-9][0-9]*)\z/) ? true : false
    end
  end
end

module Example
  using NumberQuery
  "42".number?   #=> true
end

module Example
  "42".number?   #=> #<NoMethodError: undefined method `number?' for "42":String>
end

Refinement definitions are now inherited with Module#include, meaning you can group together a bunch of refinements defined in separate modules to just one, and activate them all with a single using.

module BlankQuery
  refine Object do
    def blank?
      respond_to?(:empty?) ? empty? : false
    end
  end

  refine String do
    def blank?
      strip.length == 0
    end
  end

  refine NilClass do
    def blank?
      true
    end
  end
end

module NumberQuery
  refine Object do
    def number?
      false
    end
  end

  refine String do
    def number?
      match(/\A(0|-?[1-9][0-9]*)\z/) ? true : false
    end
  end

  refine Numeric do
    def number?
      true
    end
  end
end

module Support
  include BlankQuery
  include NumberQuery
end

class User
  using Support
  # ...
  
  def points=(obj)
    raise "points can't be blank" if obj.blank?
    raise "points must be a number" unless obj.number?
    @points = obj
  end
end

String#scrub

String#scrub has been added to Ruby 2.1 to help deal with strings that have ended up with invalid bytes in them.

# create a string that can't be sensibly printed

# 'latin 1' encoded string with accented character
string = "öops".encode("ISO-8859-1")
# misrepresented as UTF-8
string.force_encoding("UTF-8")
# and mixed with a UTF-8 character
string = "¡#{string}!"

You wouldn’t ever create a string like this deliberately (or at least I hope not), but it’s not uncommon for a string that has been through a number of systems to get mangled like this.

Presented with just the end result it’s pretty much impossible to untangle it all, but we can at least get rid of the characters that are now invalid.

# replace with 'replacement character'
string.scrub        #=> "¡�ops!"
# delete
string.scrub("")    #=> "¡ops!"
# replace with chosen character
string.scrub("?")   #=> "¡?ops!"
# yield to a block for custom replacement
# (in this case the invalid bytes as hex)
string.scrub {|bytes| "<#{bytes.unpack("H*").join}>"}   #=> "¡<f6>ops!"

The same result can also be achieved by calling #encode with the current encoding and invalid: :replace as arguments:

string.encode("UTF-8", invalid: :replace)                 #=> "¡�ops!"
string.encode("UTF-8", invalid: :replace, replace: "?")   #=> "¡?ops!"
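In practice you might only want to scrub strings that actually need it. The clean_utf8 helper below is a hypothetical sketch of that idea, using String#valid_encoding? as the guard.

```ruby
# Hypothetical helper: return text that is safe to print or store as UTF-8,
# leaving strings that are already valid untouched.
def clean_utf8(text, replacement = "?")
  return text if text.valid_encoding?
  text.scrub(replacement)
end

raw = "caf\xE9 au lait"        # a stray 'latin 1' byte in a UTF-8 string
raw.valid_encoding?            #=> false
clean_utf8(raw)                #=> "caf? au lait"
clean_utf8("café au lait")     #=> "café au lait"
```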

Bignum/Rational performance improvements

Bignum and Rational now use the GNU Multiple Precision Arithmetic Library (GMP) to improve performance.

$SAFE level 4 removed

Setting $SAFE = 4 was intended to put Ruby in a ‘sandbox’ type mode and allow execution of untrusted code. However it wasn’t terribly effective, required a lot of code scattered all over Ruby, and was almost never used, so it has been removed.

$SAFE = 4   #=> #<ArgumentError: $SAFE=4 is obsolete>

clock_gettime

Ruby now has access to the system’s clock_gettime() function through Process.clock_gettime, which allows easy access to a number of different time values. It must be called with a clock id as the first argument:

Process.clock_gettime(Process::CLOCK_REALTIME)   #=> 1391705719.906066

Supplying Process::CLOCK_REALTIME will give you a unix timestamp as the return value. This will match Time.now.to_f, but as it skips creating a Time instance it’s a little bit quicker.

Another use for Process.clock_gettime is to get access to a monotonic clock, that is, a clock that always moves forwards regardless of adjustments to the system clock. This is perfect for critical timing or benchmarking.

However the monotonic clock value only makes sense when compared to another as the starting reference point is arbitrary.

start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
sleep 1
Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time   #=> 1.0051147330086678

Another clock useful for benchmarking is CLOCK_PROCESS_CPUTIME_ID. This works similarly to the monotonic clock in that it always advances, and only makes sense when compared against another CPU time reading, but it only advances while the CPU is doing work for the process.

start_time = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
sleep 1
Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - start_time   #=> 0.005225999999999981

These three clocks, realtime, monotonic, and cpu, should always be available. Depending on your system you may have access to other clocks, check the documentation for the others that might be available.

To check if any of these clocks are supported you can check for the presence of the constant storing its clock id.

Process.const_defined?(:CLOCK_PROCESS_CPUTIME_ID)   #=> true
Process.const_defined?(:CLOCK_THREAD_CPUTIME_ID)    #=> false

There is also a Process.clock_getres method available that can be used to discover the resolution provided by a specific clock.
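For benchmarking, the monotonic clock can be wrapped in a small helper. The measure method below is a hypothetical sketch rather than anything provided by Ruby, and no particular timing is assumed.

```ruby
# Hypothetical helper: time a block using the monotonic clock and
# return the block's result along with the elapsed seconds.
def measure
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

sum, seconds = measure { (1..100_000).reduce(:+) }
puts "summed to #{sum} in #{seconds} seconds"

# An optional second argument selects the unit of the returned value,
# e.g. whole milliseconds as an Integer.
Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
```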

RubyGems updated

The included version of RubyGems has been updated to 2.2. Gemfile.lock support has been added to its basic Gemfile support, working towards the goal of merging all Bundler features into RubyGems.

The --file (or -g) option to gem install no longer requires a file name for the dependency file; it will auto-detect the Gemfile. A gem install will also generate a Gemfile.lock if one is not present, and respect the versions it specifies if it exists.

$ ls
Gemfile
$ gem install -g
Fetching: going_postal-0.1.4.gem (100%)
Installing going_postal (0.1.4)
$ ls
Gemfile
Gemfile.lock

You can see the full list of changes in the RubyGems History File.

Deprecated Rake features removed

The bundled Rake has been updated to version 10.1.0, which removes a bunch of deprecated features. Older versions of Rake have warned about these features for quite a while so hopefully you won’t encounter any compatibility problems.

See the full Rake release notes for versions 10.0.3 and 10.1.0 for more details.

RDoc template update

The included version of RDoc is now at 4.1, which brings a nice update to the default template with some accessibility improvements. See the RDoc History file for the full set of changes.

Process title

A new method Process.setproctitle has been added to set the process title without assigning to $0. A corresponding method Process.argv0 has also been added to retrieve the original value of $0 even if it has been assigned to.

Say you had some code in a background processing worker that looked like the following

data.each_with_index do |datum, i|
  Process.setproctitle("#{Process.argv0} - job #{i} of #{data.length}")
  process(datum)
end

you’d see something like the following if you were to run ps

$ ps
  PID TTY           TIME CMD
  339 ttys000    0:00.23 -bash
 7321 ttys000    0:00.06 background.rb - job 10 of 30

Frozen Symbols

Symbols now join integers and floating point numbers in being frozen.

:foo.frozen?                               #=> true
:foo.instance_variable_set(:@bar, "baz")   #=> #<RuntimeError: can't modify frozen Symbol>

This change was made to set things up for garbage collection of symbols in a future version of Ruby.

Fixed eval scope leak

Previously, when private, protected, public, or module_function was used without arguments in a string evaluated with eval, instance_eval, or module_eval, the method visibility scope would leak out to the calling scope, such that foo in the following example would be private.

class Foo
  eval "private"
  
  def foo
    "foo"
  end
end

This is fixed in 2.1, so foo would be public in this example.

#untrusted? is now an alias of #tainted?

Ruby previously had two sets of methods for marking/checking objects as untrusted, the first set, #tainted?, #taint, and #untaint, and the second #untrusted?, #untrust, and #trust. These behaved the same, but set separate flags, so an object could be untrusted, but not tainted.

These methods have been unified to set/get a single flag, with #tainted? etc being the preferred names and #untrusted? etc generating warnings.

string = "foo"
string.untrust
string.tainted?   #=> true

generates the warning

example.rb:2: warning: untrust is deprecated and its behavior is same as taint

return in lambda now always returns from lambda

Lambdas differ from Procs/blocks in that using return in a lambda returns from the lambda, not the enclosing method. However, there was an exception to this: when a lambda was passed to a method with & and called with yield, return would return from the enclosing method. This exception has now been removed.

def call_with_yield
  yield
end

def test
  call_with_yield(&lambda {return "hello from lambda"})
  "hello from method"
end

test   #=> "hello from method"

The example above would have returned "hello from lambda" under Ruby <= 2.0.0.

Get interface addresses

It is now possible to get details of the system’s network interfaces with Socket.getifaddrs. This returns an array of Socket::Ifaddr objects.

require "socket"

info = Socket.getifaddrs.find do |ifaddr|
  (ifaddr.flags & Socket::IFF_BROADCAST).nonzero? &&
    ifaddr.addr.afamily == Socket::AF_INET
end

info.addr.ip_address   #=> "10.0.1.2"

Named capture support in StringScanner

StringScanner#[] now accepts symbols as arguments, and will return the corresponding named capture from the last match.

require "strscan"

def parse_ini(string)
  scanner = StringScanner.new(string)
  current_section = data = {}

  until scanner.eos?
    scanner.skip(/\s+/)
    if scanner.scan(/;/)
      scanner.skip_until(/[\r\n]+/)
    elsif scanner.scan(/\[(?<name>[^\]]+)\]/)
      current_section = current_section[scanner[:name]] = {}
    elsif scanner.scan(/(?<key>[^=]+)=(?<value>.*)/)
      current_section[scanner[:key]] = scanner[:value]
    end
  end

  data
end
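Stripped down to the feature itself, named capture access looks like the following sketch (the input string is made up):

```ruby
require "strscan"

scanner = StringScanner.new("port = 8080")
scanner.scan(/(?<key>\w+)\s*=\s*(?<value>\d+)/)

scanner[:key]     #=> "port"
scanner[:value]   #=> "8080"
```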

YAML.safe_load

YAML (well, Psych, the underlying yaml implementation) has had a safe_load method added. By default only the following classes can be deserialised: TrueClass, FalseClass, NilClass, Numeric, String, Array, and Hash. To deserialise other classes that you know will be safe you can pass a whitelist as an argument.

If a disallowed class is found Psych::DisallowedClass will be raised; this can also be referenced as YAML::DisallowedClass.

require "yaml"
YAML.safe_load(":foo: 1")             #=> #<Psych::DisallowedClass: Tried to load unspecified class: Symbol>
YAML.safe_load(":foo: 1", [Symbol])   #=> {:foo=>1}
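When parsing input you don’t control, you’ll probably want to rescue the error rather than let it propagate. The load_untrusted helper below is a hypothetical sketch of that; it relies only on the default safe list.

```ruby
require "yaml"

# Hypothetical helper: parse untrusted YAML, returning nil when the
# document asks for a class outside the default safe list.
def load_untrusted(yaml)
  YAML.safe_load(yaml)
rescue Psych::DisallowedClass => e
  warn "rejected: #{e.message}"
  nil
end

load_untrusted("name: Arthur\nage: 42")   #=> {"name"=>"Arthur", "age"=>42}
load_untrusted(":foo: 1")                 #=> nil, Symbol is not allowed by default
```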

Resolv one-shot MDNS and LOC record support

Ruby’s Resolv DNS library gets basic support for one-shot multicast DNS lookups. It doesn’t support continuous queries, and can’t do service discovery, but it’s still a pretty neat new feature (check out the dnssd gem for full DNS Service Discovery support).

require "resolv"

resolver = Resolv::MDNS.new
resolver.getaddress("example.local")   #=> #<Resolv::IPv4 10.0.1.2>

Combined with the resolv-replace library this allows you to use mDNS names with most Ruby networking libraries.

require "resolv-replace"
require "net/http"

Resolv::DefaultResolver.replace_resolvers([Resolv::Hosts.new, Resolv::MDNS.new])
Net::HTTP.get_response(URI.parse("http://example.local"))   #=> #<Net::HTTPOK 200 OK readbody=true>

Resolv also gains the ability to query DNS LOC records.

require "resolv"

dns = Resolv::DNS.new

# find.me.uk has LOC records for all UK postcodes
resource = dns.getresource("W1A1AA.find.me.uk", Resolv::DNS::Resource::IN::LOC)

resource.latitude    #=> #<Resolv::LOC::Coord 51 31 6.827 N>
resource.longitude   #=> #<Resolv::LOC::Coord 0 8 37.585 W>

And the final change for Resolv: it’s now possible to get back the full DNS message with Resolv::DNS#fetch_resource.

require "resolv"

dns = Resolv::DNS.new
dns.fetch_resource("example.com", Resolv::DNS::Resource::IN::A) do |reply, reply_name|
  reply        #=> #<Resolv::DNS::Message:0x007f88192e2cc0 @id=55405, @qr=1, @opcode=0, @aa=0, @tc=0, @rd=1, @ra=1, @rcode=0, @question=[[#<Resolv::DNS::Name: example.com.>, Resolv::DNS::Resource::IN::A]], @answer=[[#<Resolv::DNS::Name: example.com.>, 79148, #<Resolv::DNS::Resource::IN::A:0x007f88192e1c80 @address=#<Resolv::IPv4 93.184.216.119>, @ttl=79148>]], @authority=[], @additional=[]>
  reply_name   #=> #<Resolv::DNS::Name: example.com.>
end

Improved Socket error messages

Errors from sockets have been improved to include the socket address in the message.

require "socket"

TCPSocket.new("localhost", 8080)   #=> #<Errno::ECONNREFUSED: Connection refused - connect(2) for "localhost" port 8080>

Hash#shift much faster

The performance of Hash#shift has been massively improved and this, coupled with Hash being insertion ordered since Ruby 1.9, makes it practical to implement a simple least recently used cache.

class LRUCache
  def initialize(size)
    @size, @hash = size, {}
  end

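  def [](key)
    # on a hit, reinsert the entry so it becomes the most recently used;
    # a miss returns nil without adding a nil entry to the cache
    @hash[key] = @hash.delete(key) if @hash.key?(key)
  end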

  def []=(key, value)
    @hash.delete(key)
    @hash[key] = value
    @hash.shift if @hash.size > @size
  end
end
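The mechanism the cache relies on can be seen with a plain Hash. In this sketch, deleting and reinserting a key moves it to the end of the insertion order, so the first entry is always the least recently used one and Hash#shift evicts it.

```ruby
cache = {a: 1, b: 2, c: 3}

# "touch" :a by deleting and reinserting it
cache[:a] = cache.delete(:a)
cache.keys    #=> [:b, :c, :a]

# the first entry is now the least recently used
cache.shift   #=> [:b, 2]
cache.keys    #=> [:c, :a]
```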

Queue, SizedQueue, and ConditionVariable performance improvements

Queue, SizedQueue, and ConditionVariable have been sped up by implementing them in C rather than Ruby.

Timeout internal exception can’t be rescued

It is no longer possible to rescue the exception used internally by Timeout to abort the block it’s given. This is mostly an internal implementation detail that’s nothing to worry about; the Timeout::Error raised externally when the timeout is reached is unchanged and can be rescued as normal.

require "timeout"

begin
  Timeout.timeout(1) do
    begin
      sleep 2
    rescue Exception
      # no longer swallows the timeout exception
    end
  end
rescue StandardError => e
  e   #=> #<Timeout::Error: execution expired>
end

Set

Set gains #intersect? and #disjoint? methods. #intersect? returns true if the receiver and the argument have at least one value in common, and false otherwise. #disjoint? is the opposite and returns true if the sets have no elements in common, false otherwise.

require "set"

a = Set[1,2,3]
b = Set[3,4,5]
c = Set[4,5,6]

a.intersect?(b)   #=> true
b.intersect?(c)   #=> true
a.intersect?(c)   #=> false

a.disjoint?(b)   #=> false
b.disjoint?(c)   #=> false
a.disjoint?(c)   #=> true

Another minor change to Set, #to_set called on a set will simply return self, rather than a copy.

require "set"

set = Set["foo", "bar", "baz"]
set.object_id          #=> 70286489985620
set.to_set.object_id   #=> 70286489985620

Easier streaming responses with WEBrick

The WEBrick HTTP response body can now be set to anything responding to #read and #readpartial. Previously it had to be an instance of IO or a String. The example below implements a class that wraps an enumerator, and then uses this to stream out a response of the current time every second for 10 seconds.

require "webrick"

class EnumeratorIOAdapter
  def initialize(enum)
    @enum, @buffer, @more = enum, "", true
  end

  def read(length=nil, out_buffer="")
    return nil unless @more
    until (length && @buffer.length >= length) || !fill_buffer; end
    if length
      part = @buffer.slice!(0, length)
    else
      part, @buffer = @buffer, ""
    end
    out_buffer.replace(part)
  end

  def readpartial(length, out_buffer="")
    raise EOFError if @buffer.empty? && !fill_buffer
    out_buffer.replace(@buffer.slice!(0, length))
  end

  private
  def fill_buffer
    @buffer << @enum.next
  rescue StopIteration
    @more = false
  end
end

server = WEBrick::HTTPServer.new(Port: 8080)

server.mount_proc "/" do |request, response|
  enum = Enumerator.new do |yielder|
    10.times do
      sleep 1
      yielder << "#{Time.now}\r\n"
    end
  end

  response.chunked = true
  response.body = EnumeratorIOAdapter.new(enum)
end

trap(:INT) {server.shutdown}
server.start

Numeric#step

The #step method on Numeric can now accept the keyword arguments by: and to: rather than positional arguments. The to: argument is optional, and if omitted it will result in an infinite sequence. If using positional arguments you can pass nil as the first argument to get the same behaviour.

0.step(by: 5, to: 20) do |i|
  puts i
end

outputs:

0
5
10
15
20

0.step(by: 3) do |i|
  puts i
end

0.step(nil, 3) do |i|
  puts i
end

would both output

0
3
6
9
12
... and so on
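Without a block #step returns an enumerator, so an infinite sequence can be cut off with #first rather than a break. A short sketch:

```ruby
0.step(by: 3).first(5)             #=> [0, 3, 6, 9, 12]
1.0.step(by: 0.5, to: 3.0).to_a    #=> [1.0, 1.5, 2.0, 2.5, 3.0]
10.step(by: -5, to: 0).to_a        #=> [10, 5, 0]
```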

IO

The IO#seek method now accepts :CUR, :END, and :SET as symbols, along with the old flags named by the constants IO::SEEK_CUR, IO::SEEK_END, and IO::SEEK_SET.

New are IO::SEEK_DATA and IO::SEEK_HOLE (or :DATA and :HOLE) for its second argument. When these are supplied, the first argument is the offset to start searching from: the file position moves to the next region of data, or the next hole, at or after that offset.

f = File.new("example.txt")

# sets the offset to the start of the next data at or after byte 8
f.seek(8, IO::SEEK_DATA)

# sets the offset to the start of the next hole at or after byte 32
f.seek(32, IO::SEEK_HOLE)

These may not be supported on all platforms; you can check with IO.const_defined?(:SEEK_DATA) and IO.const_defined?(:SEEK_HOLE).
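The symbol forms of the older flags work everywhere, so they are easy to try out. Here is a small sketch using a temporary file (the file name prefix and variable names are just for illustration):

```ruby
require "tempfile"

from_start = from_end = from_current = nil

Tempfile.create("seek_example") do |f|
  f.write("hello world")
  f.flush

  f.seek(6, :SET)            # absolute position, same as IO::SEEK_SET
  from_start = f.read        #=> "world"

  f.seek(-5, :END)           # relative to the end of the file
  from_end = f.read(3)       #=> "wor"

  f.seek(-3, :CUR)           # relative to the current position
  from_current = f.read(3)   #=> "wor"
end
```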

Use IO _nonblock without raising exceptions

IO#read_nonblock and IO#write_nonblock now each get an exception keyword argument. When set to false (default is true) this causes the methods to return a symbol (:wait_readable or :wait_writable) when the operation would block, rather than raise an exception.

require "socket"

io = TCPSocket.new("www.example.com", 80)

message = "GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"
loop do
  IO.select(nil, [io])
  result = io.write_nonblock(message, exception: false)
  break unless result == :wait_writable
end
end

response = ""
loop do
  IO.select([io])
  result = io.read_nonblock(32, exception: false)
  break unless result
  next if result == :wait_readable
  response << result
end

puts response.lines.first
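If you want to see the return values without touching the network, a pipe is enough. This is only a sketch of the would-block case, not a replacement for the select loop above:

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so a read would block
empty_read = reader.read_nonblock(16, exception: false)     #=> :wait_readable

# The pipe has room, so the write succeeds and returns the number of bytes written
written = writer.write_nonblock("ping", exception: false)   #=> 4

# Now there is data waiting to be read
data = reader.read_nonblock(16, exception: false)           #=> "ping"

reader.close
writer.close
```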

IO ignores internal encoding if external encoding is ASCII-8BIT

If you set default internal and external encodings Ruby will transcode from the external encoding to the internal. The exception to this is when the external encoding is set to ASCII-8BIT (aka binary), where no transcoding takes place.

The same exception should be made if the encodings were supplied to an IO method as an argument, but there was a bug, and the transcoding would take place. This has now been fixed.

File.read("example.txt", encoding: "ascii-8bit:utf-8").encoding   #=> #<Encoding:ASCII-8BIT>

#include and #prepend now public

Affecting Module and Class, the #include and #prepend methods are now public.

module NumberQuery
  def number?
    match(/\A(0|-?[1-9][0-9]*)\z/) ? true : false
  end
end

String.include(NumberQuery)

"123".number?   #=> true

require "bigdecimal"

module FloatingPointFormat
  def to_s(format="F")
    super
  end
end

BigDecimal.prepend(FloatingPointFormat)

decimal = BigDecimal("1.23")
decimal.to_s   #=> "1.23" # rather than "0.123E1"

Module/Class #singleton_class?

Module and Class gain a #singleton_class? method that, predictably, returns whether or not the receiver is a singleton class.

class Example
  singleton_class?     #=> false
  class << self
    singleton_class?   #=> true
  end
end

Module#ancestors more consistent

#ancestors called on a singleton class now includes singleton classes in the returned array. This makes the behaviour more consistent between being called on regular classes and singleton classes. It also clears up an irregularity where singleton classes would show up, but only if a module had been prepended (not included) into the singleton class.

Object.ancestors.include?(Object)                                   #=> true
Object.singleton_class.ancestors.include?(Object.singleton_class)   #=> true
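To make the ordering concrete, here is a small sketch with a module extended onto a class (Loud and Example are made-up names):

```ruby
module Loud; end
class Example; end

Example.extend(Loud)   # same as including Loud into Example's singleton class

singleton = Example.singleton_class
chain = singleton.ancestors.first(2)   #=> [#<Class:Example>, Loud]
```

The singleton class itself comes first, followed by the extended module, just as a regular class comes before its included modules.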

Object#singleton_method

Similar to #method and #instance_method, but will return only singleton methods.

class Example
  def self.test
  end

  def test2
  end
end

# returns class method
Example.singleton_method(:test)    #=> #<Method: Example.test>
# doesn't return instance method
Example.singleton_method(:test2)   #=> #<NameError: undefined singleton method `test2' for `Example'>
# doesn't return inherited class method
Example.singleton_method(:name)    #=> #<NameError: undefined singleton method `name' for `Example'>

example = Object.new

def example.test
end

example.singleton_method(:test)   #=> #<Method: #<Object:0x007fc54997a610>.test>
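As the lookup raises NameError when nothing is found, you may want to wrap it if a missing method is an expected case. A rough sketch, where singleton_method_or_nil is a hypothetical helper of my own rather than anything Ruby provides:

```ruby
config = Object.new

def config.verbose?
  true
end

# Hypothetical helper: returns the Method object, or nil if there is no
# singleton method with that name on the object
def singleton_method_or_nil(object, name)
  object.singleton_method(name)
rescue NameError
  nil
end

found = singleton_method_or_nil(config, :verbose?)
missing = singleton_method_or_nil(config, :to_s)   # to_s is inherited, not a singleton method

found.call   #=> true
missing      #=> nil
```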

Method#original_name

Method and UnboundMethod gain an #original_name method to return the un-aliased name.

class Example
  def foo
    "foo"
  end
  alias bar foo
end

example = Example.new
example.method(:foo).original_name            #=> :foo
example.method(:bar).original_name            #=> :foo
Example.instance_method(:bar).original_name   #=> :foo

Mutex#owned?

Mutex#owned? is no longer experimental, and there’s not much more to say about that.
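For completeness, a quick sketch of how it differs from #locked?: #owned? only answers true for the thread that is holding the lock.

```ruby
mutex = Mutex.new

before = mutex.owned?                         #=> false
inside = mutex.synchronize { mutex.owned? }   #=> true

mutex.lock
# Another thread can see the mutex is locked, but it does not own it
seen_from_other_thread = Thread.new { [mutex.locked?, mutex.owned?] }.value   #=> [true, false]
mutex.unlock
```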

Hash#reject

Calling Hash#reject on a subclass of Hash will issue a warning. In Ruby 2.2 #reject called on a subclass of Hash will return a new Hash instance, rather than an instance of the subclass, so in preparation for that potentially breaking change there is a warning.

class MyHash < Hash
end

example = MyHash.new
example[:a] = 1
example[:b] = 2

example.reject {|k,v| v > 1}.class   #=> MyHash

Generates the following warning.

example.rb:8: warning: copying unguaranteed attributes: {:a=>1, :b=>2}
example.rb:8: warning: following atributes will not be copied in the future version:
example.rb:8: warning:   subclass: MyHash

Ruby 2.1.1 accidentally included the full change, returning Hash in the example above and not generating a warning. This was reverted in 2.1.2.
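If you rely on getting the subclass back, one way to sidestep the change is to copy the hash first and delete from the copy. This is just a sketch of one possible workaround, not something the Ruby release notes prescribe:

```ruby
class MyHash < Hash
end

example = MyHash.new
example[:a] = 1
example[:b] = 2

# dup keeps the subclass, and delete_if modifies (and returns) the copy
kept = example.dup.delete_if { |_key, value| value > 1 }

kept.class     #=> MyHash
kept.keys      #=> [:a]
example.size   #=> 2, the original is untouched
```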

Vector#cross_product

The Vector class gains a cross_product instance method.

require "matrix"

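# this is the result under 2.1.0, which has the sign flipped; the mathematical
# cross product of these two vectors is Vector[0, 0, 1]
Vector[1, 0, 0].cross_product(Vector[0, 1, 0])   #=> Vector[0, 0, -1]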

Fixnum/Bignum #bit_length

Calling #bit_length on an integer will return the number of digits it takes to represent that number in binary.

128.bit_length                   #=> 8
32768.bit_length                 #=> 16
2147483648.bit_length            #=> 32
4611686018427387904.bit_length   #=> 63
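The sign bit is not counted, which gives slightly surprising answers for zero and negative numbers. A few more cases, plus a hypothetical bytes_needed helper (my own name) showing one use for the method:

```ruby
edge_cases = [0, 255, 256, -1, -256, -257].map(&:bit_length)
#=> [0, 8, 9, 0, 8, 9]

# Hypothetical helper: bytes needed to store a non-negative integer unsigned
def bytes_needed(number)
  (number.bit_length + 7) / 8
end

bytes_needed(255)         #=> 1
bytes_needed(256)         #=> 2
bytes_needed(2**64 - 1)   #=> 8
```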

pack/unpack Native Endian long long

Array#pack and String#unpack gain the ability to work with native endian long longs with the Q_/Q! and q_/q! directives.

# output may differ depending on the endianness of your system
unsigned_long_long_max = [2**64 - 1].pack("Q!")   #=> "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"
signed_long_long_min = [-2**63].pack("q!")        #=> "\x00\x00\x00\x00\x00\x00\x00\x80"
signed_long_long_max = [2**63 - 1].pack("q!")     #=> "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x7F"

# unpack returns an array, with one element per directive
unsigned_long_long_max.unpack("Q!")   #=> [18446744073709551615]
signed_long_long_min.unpack("q!")     #=> [-9223372036854775808]
signed_long_long_max.unpack("q!")     #=> [9223372036854775807]
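If you need the bytes to look the same on every machine, the existing < and > modifiers pin the byte order instead. A short sketch comparing them with the native form:

```ruby
# "q<" is a little endian 64-bit integer, "q>" is big endian,
# and "q!" is a long long in whatever byte order the platform uses
little = [1].pack("q<").bytes   #=> [1, 0, 0, 0, 0, 0, 0, 0]
big    = [1].pack("q>").bytes   #=> [0, 0, 0, 0, 0, 0, 0, 1]
native = [1].pack("q!").bytes

byte_order = native == little ? :little_endian : :big_endian

# Packing and unpacking with the same directive round-trips the value
round_trip = [-2**63].pack("q!").unpack("q!")   #=> [-9223372036854775808]
```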

Dir glob returns composed characters

The HFS Plus filesystem on Mac OS X uses the UTF8-MAC encoding for filenames, with decomposed characters, e.g. é is represented with e and U+0301, rather than just U+00E9 (with some exceptions). Dir.glob and Dir[] now normalise this back to UTF8 encoded strings with composed characters.

File.write("composed_e\u{301}xample.txt", "")
File.write("precomposed_\u{e9}xample.txt", "")

puts Dir["*"].map(&:dump)

"composed_\u{e9}xample.txt"
"example.rb"
"precomposed_\u{e9}xample.txt"

Better type coercion for Numeric#quo

Numeric#quo now calls #to_r on the receiver, which should allow for better behaviour when implementing your own Numeric subclasses. It also means TypeError rather than ArgumentError will be raised if the receiver can’t be converted. TypeError is not a subclass of ArgumentError (both inherit directly from StandardError), so any code that rescues ArgumentError around a call to #quo will need to rescue TypeError as well.

Binding#local_variable_get/_set/_defined?

Binding gets methods to get/set local variables. This can come in handy if you really want to use a keyword argument that clashes with a reserved word:

def primes(begin: 2, end: 1000)
  [binding.local_variable_get(:begin), 2].max.upto(binding.local_variable_get(:end)).each_with_object([]) do |i, array|
    array << i unless (2...i).any? {|j| (i % j).zero?}
  end
end

primes(end: 10)   #=> [2, 3, 5, 7]

Or if you want to use a Hash to populate local variables in a Binding, say for evaluating a template:

def make_binding(hash)
  b = TOPLEVEL_BINDING.dup
  hash.each {|k,v| b.local_variable_set(k, v)}
  b
end

require "erb"

cover = %Q{<h1><%= title %></h1>\n<h2 class="big friendly"><%= subtitle %></h2>}
locals = {:title => "Hitchhiker's Guide to the Galaxy", :subtitle => "Don't Panic"}

ERB.new(cover).result(make_binding(locals))   #=> "<h1>Hitchhiker's Guide to the Galaxy</h1>\n<h2 class=\"big friendly\">Don't Panic</h2>"
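The third method, #local_variable_defined?, isn't shown above. Here is a small sketch of it, which also shows the difference between setting an existing variable and a new one (binding_demo is a made-up method name):

```ruby
def binding_demo
  count = 1
  b = binding

  had_count = b.local_variable_defined?(:count)   #=> true
  had_extra = b.local_variable_defined?(:extra)   #=> false

  b.local_variable_set(:count, 2)   # existing local: the method body sees the change
  b.local_variable_set(:extra, 3)   # new name: it only exists inside the Binding

  [had_count, had_extra, count, b.local_variable_get(:extra)]
end

binding_demo   #=> [true, false, 2, 3]
```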

CGI class methods now available from CGI::Util module

CGI has a few handy utility class methods for escaping URL and HTML strings. These have been moved to the CGI::Util module, which can be included into other classes or the main scope for scripts.

require "cgi/util"

CGI.escape("hello world!")   #=> "hello+world%21"

include CGI::Util

escape("hello world!")       #=> "hello+world%21"

Digest::Class.file passes arguments to initialiser

The various Digest classes have a shortcut method for producing the digest for a given file. This method has been updated to pass any extra arguments after the filename to the implementation’s initialiser. So rather than:

require "digest"
Digest::SHA2.new(512).hexdigest(File.read("example.txt"))   #=> "f7fbba..."

It’s possible to do:

require "digest"
Digest::SHA2.file("example.txt", 512).hexdigest             #=> "f7fbba..."
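To check that the two forms really are equivalent, here is a self-contained sketch that writes a temporary file and compares the results (the 384 bit length and the file contents are arbitrary choices of mine):

```ruby
require "digest"
require "tempfile"

one_shot = from_file = nil

Tempfile.create("digest_example") do |f|
  f.write("hello world")
  f.flush

  # Reads the whole file into memory, then hashes it
  one_shot = Digest::SHA2.new(384).hexdigest(File.read(f.path))

  # Extra arguments after the filename go to the initialiser
  from_file = Digest::SHA2.file(f.path, 384).hexdigest
end

one_shot == from_file   #=> true
from_file.length        #=> 96 hex characters for a 384 bit digest
```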

Net::SMTP#rset

It is now possible to abort an SMTP transaction by sending the RSET command with Net::SMTP#rset.

require "net/smtp"

smtp = Net::SMTP.start("some.smtp.server")
notification = "Hi %s,\n ..."

users.each do |user|
  begin
    smtp.mailfrom("sender@example.com")   # placeholder sender address
    smtp.rcptto(user.email)
    smtp.data(sprintf(notification, user.name))
  rescue
    smtp.rset
  end
end

smtp.finish

open-uri supports repeated headers

open-uri allows Kernel#open to open resources with a URI, and will extend the return value with OpenURI::Meta. This gains a new #metas method to return the header values as arrays, for the case when a header has been used multiple times, e.g. set-cookie.

require "open-uri"

f = open("http://google.com")
f.meta["set-cookie"].class     #=> String
f.metas["set-cookie"].class    #=> Array
f.metas["set-cookie"].length   #=> 2

Write to files with Pathname

Pathname gains #write and #binwrite methods to write to files.

require "pathname"

path = Pathname.new("test.txt").expand_path(__dir__)
path.write("foo")
path.write("bar", 3) # offset
path.write("baz", mode: "a") # append
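Here is the same sequence run against a throwaway directory, so you can see what ends up in the file (the directory comes from Dir.mktmpdir, which is my addition):

```ruby
require "pathname"
require "tmpdir"

contents = nil

Dir.mktmpdir do |dir|
  path = Pathname.new(dir).join("test.txt")

  path.write("foo")               # creates the file
  path.write("bar", 3)            # writes at byte offset 3 without truncating
  path.write("baz", mode: "a")    # appends to the end

  contents = path.read
end

contents   #=> "foobarbaz"
```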

Tempfile.create

Tempfile now has a create method similar to new. Rather than returning a Tempfile instance that uses a finaliser to clean up the file when the object is garbage collected, it yields a plain File object to a block and cleans up the file at the end of the block.

require "tempfile"

path = nil
Tempfile.create("example") do |f|
  f                 #=> #<File:/tmp/example20140428-16851-15kf046>
  path = f.path
end
File.exist?(path)   #=> false
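Tempfile.create can also be called without a block, in which case it simply returns the open File and removing it becomes your job. A sketch of that usage:

```ruby
require "tempfile"

file = Tempfile.create("example")   # no block: returns an open File
begin
  file.write("scratch data")
  file.flush
  size_on_disk = File.size(file.path)   #=> 12
ensure
  file.close
  File.unlink(file.path)   # nothing else will remove the file for us
end

exists_after = File.exist?(file.path)   #=> false
```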

Rinda multicast support

The Rinda Ring classes are now able to listen on/connect to multicast addresses.

Here’s an example of using Rinda to create an extremely simple service registry listening on the multicast address 239.0.0.1

require "rinda/ring"
require "rinda/tuplespace"

DRb.start_service

tuple_space = Rinda::TupleSpace.new
server = Rinda::RingServer.new(tuple_space, ["239.0.0.1"])

DRb.thread.join

To have a service register itself:

require "rinda/ring"

DRb.start_service
ring_finger = Rinda::RingFinger.new(["239.0.0.1"])
tuple_space = ring_finger.lookup_ring_any

tuple_space.write([:message_service, "localhost", 8080])

# start messaging service on localhost:8080

And discover the address of a service:

require "rinda/ring"

DRb.start_service
ring_finger = Rinda::RingFinger.new(["239.0.0.1"])
tuple_space = ring_finger.lookup_ring_any

_, host, port = tuple_space.read([:message_service, String, Fixnum])

# connect to messaging service

I had some issues with the tuple_space = ring_finger.lookup_ring_any line causing a segfault, and had to use the following in its place:

tuple_space = nil
ring_finger.lookup_ring(0.01) {|x| break tuple_space = x}

Easy setting of extra HTTP options for XMLRPC

XMLRPC::Client#http returns the Net::HTTP instance being used by the client to allow minor configuration options that don’t have an accessor on the client to be set.

require "xmlrpc/client"

client = XMLRPC::Client.new("example.com")
client.http.keep_alive_timeout = 30 # keep connection open for longer
# use client ...

URI.encode_/decode_www_form updated to match WHATWG standard

URI.encode_www_form and URI.decode_www_form have been updated to match the WHATWG standard.

URI.decode_www_form no longer treats ; as a separator. & is now the only default separator, but there is a new separator: keyword argument if you need to change it.

require "uri"
URI.decode_www_form("foo=1;bar=2", separator: ";")   #=> [["foo", "1"], ["bar", "2"]]
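For contrast, this is what happens to the same string when the separator is left at its default. The semicolon is treated as ordinary data, so it ends up inside the first value:

```ruby
require "uri"

# With the default & separator the whole string is a single key/value pair
with_default = URI.decode_www_form("foo=1;bar=2")     #=> [["foo", "1;bar=2"]]

# The & form is split as usual
with_ampersand = URI.decode_www_form("foo=1&bar=2")   #=> [["foo", "1"], ["bar", "2"]]
```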

URI.decode_www_form can also now successfully decode the output of URI.encode_www_form when a value is nil.

require "uri"

string = URI.encode_www_form(foo: 1, bar: nil, baz: 3)   #=> "foo=1&bar&baz=3"
URI.decode_www_form("foo=1&bar&baz=3")                   #=> [["foo", "1"], ["bar", ""], ["baz", "3"]]

RbConfig::SIZEOF

RbConfig::SIZEOF has been added to provide the size of C types.

require "rbconfig/sizeof"

# sizes are in bytes and vary by platform (these are typical of a 64-bit Unix-like system)
RbConfig::SIZEOF["short"]   #=> 2
RbConfig::SIZEOF["int"]     #=> 4
RbConfig::SIZEOF["long"]    #=> 8
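One place this comes in handy is alongside the native size pack directives, where the width of the output depends on the platform. A small sketch that checks the two agree (the pairing of type names to directives is my own table):

```ruby
require "rbconfig/sizeof"

# C type name => the pack directive for its native size
native_directives = {
  "short"     => "s!",
  "int"       => "i!",
  "long"      => "l!",
  "long long" => "q!"
}

sizes_match = native_directives.all? do |c_type, directive|
  [0].pack(directive).bytesize == RbConfig::SIZEOF[c_type]
end

sizes_match   #=> true
```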

Can set facility with Syslog::Logger

Syslog::Logger, the Logger-compatible interface to Syslog, gets the ability to set the facility.

require "syslog/logger"

facility = Syslog::LOG_LOCAL0
logger = Syslog::Logger.new("MyApp", facility)

logger.debug("test")

CSV.foreach with no block returns working enumerator

CSV.foreach called without a block argument returns an enumerator, however this has for a long time resulted in an IOError when it was actually used. This has now been fixed.

require "csv"

enum = CSV.foreach("example.csv")

enum.next   #=> ["1", "foo"]
enum.next   #=> ["2", "bar"]
enum.next   #=> ["3", "baz"]

OpenSSL bignum

OpenSSL::BN.new now accepts integers as well as strings.

require "openssl"

OpenSSL::BN.new(4_611_686_018_427_387_904)   #=> #<OpenSSL::BN:0x007fce7a0c56e8>

Enumerator.new size argument fixed to accept any callable object

Enumerator.new takes a size argument which can either be an integer, or an object responding to #call. Under 2.0.0 only integers and Procs would work, despite what the documentation said. This has now been fixed.

require "thread"

queue = Queue.new
enum = Enumerator.new(queue.method(:size)) do |yielder|
  loop {yielder << queue.pop}
end
queue << "foo"
enum.size   #=> 1

curses library removed

curses has been removed from the standard library and is now available as a gem.

TSort class methods

TSort can be useful for determining an order to complete tasks from a list of dependencies. However it’s a bit of a hassle to use, having to implement a class, include TSort, and implement #tsort_each_node and #tsort_each_child.

But now TSort is a little easier to use with, say, a hash. The same methods that are available as instance methods are now available on the module itself, taking two callable objects, one to take the place of #tsort_each_node and the second #tsort_each_child.

require "tsort"

camping_steps = {
  "sleep" => ["food", "tent"],
  "tent" => ["camping site", "canvas"],
  "canvas" => ["tent poles"],
  "tent poles" => ["camping site"],
  "food" => ["fish", "fire"],
  "fire" => ["firewood", "matches", "camping site"],
  "fish" => ["stream", "fishing rod"]
}

all_nodes = camping_steps.to_a.flatten
each_node = all_nodes.method(:each)
each_child = -> step, &b {camping_steps.fetch(step, []).each(&b)}
puts TSort.tsort(each_node, each_child)

Outputs:

stream
fishing rod
fish
firewood
matches
camping site
fire
food
tent poles
canvas
tent
sleep
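One thing to watch for is circular dependencies, which make TSort.tsort raise TSort::Cyclic. Here is a sketch of handling that, using a hypothetical build_order helper of my own and small numeric graphs to keep it short:

```ruby
require "tsort"

# Hypothetical helper: returns the nodes in dependency order, or a message
# string if the graph contains a cycle
def build_order(graph)
  each_node = graph.method(:each_key)
  each_child = ->(node, &block) { graph.fetch(node, []).each(&block) }
  TSort.tsort(each_node, each_child)
rescue TSort::Cyclic => error
  "cannot order: #{error.message}"
end

ordered = build_order({1 => [2, 3], 2 => [4], 3 => [2, 4], 4 => []})   #=> [4, 2, 3, 1]

# 2 depends on 3 and 3 depends on 2, so no valid order exists
cyclic = build_order({1 => [2], 2 => [3, 4], 3 => [2], 4 => []})
```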

TCP Fast Open

Ruby 2.1 has added support for TCP Fast Open if it is available on your system. It’s possible to check whether it is available by checking for the existence of the Socket::TCP_FASTOPEN and Socket::MSG_FASTOPEN constants.

Server:

require "socket"

unless Socket.const_defined?(:TCP_FASTOPEN)
  abort "TCP Fast Open not supported on this system"
end

server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.setsockopt(Socket::SOL_TCP, Socket::TCP_FASTOPEN, 5)
addrinfo = Addrinfo.new(Socket.sockaddr_in(3000, "localhost"))
server.bind(addrinfo)
server.listen(1)

socket = server.accept
socket.write(socket.readline)

Client:

require "socket"

unless Socket.const_defined?(:MSG_FASTOPEN)
  abort "TCP Fast Open not supported on this system"
end

socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
socket.send("foo\n", Socket::MSG_FASTOPEN, Socket.sockaddr_in(3000, "localhost"))
puts socket.readline
socket.close

And that’s it…

Please let me know if there’s anything missing or incorrect here.