sourcetagsandcodes.com

Ruby Tips Part 5

Posted 22 April 2014 by Mat Sadler

This article was originally posted on the globaldev blog; they have kindly allowed me to repost it here. If you’re looking for a Ruby job in London you should check out their jobs page.

This is part 5 of a 5-part series on Ruby tips and tricks gleaned from the Global Personals team’s pull requests over the last two years. Part 1 covers blocks and ranges, part 2 deals with destructuring and type conversions, part 3 talks about exceptions and modules, and part 4 gives an overview of debugging, project layout, and documentation.

This post is a collection of unrelated tips that didn’t make it into the earlier parts.

Alternative String Syntax

Along with single and double quotes, Ruby has another method of delimiting strings. %q{example} works just like single quotes, and %Q{example} allows interpolation just like double quotes. While I like to use braces, almost anything can be used for the delimiters, e.g. %q|example|. {}, [], (), and <> are treated specially, in that they can appear in the string without being escaped, as long as they are a matched pair. This can come in handy when you want to use interpolation and double quotes in a string:

logger.info(%Q{User "#{user.name}" logged in})

This style of quoting isn’t limited to strings: %r{example} produces a regexp just like /example/. This is especially useful when matching file paths, as the forward slashes don’t need escaping:

if path =~ %r{^/assets/mobile/img}
  ...
end

There’s also %s{example} for symbols, %w{example1 example2} for an array of whitespace delimited strings, and %x{which ruby} that works like backticks to execute an external command, e.g. `which ruby`. As of Ruby 2.0 there’s %i{example1 example2} that generates an array of whitespace delimited symbols.
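As a quick sketch of the literals described above, here they are side by side. The values are made up for illustration, and the %i line assumes Ruby 2.0 or later.

```ruby
# %w builds an array of strings, %i (Ruby 2.0+) an array of symbols
words   = %w{apple banana cherry}   #=> ["apple", "banana", "cherry"]
symbols = %i{apple banana cherry}   #=> [:apple, :banana, :cherry]

# %s builds a single symbol, handy when it contains a space
symbol = %s{hello world}            #=> :"hello world"

# a matched pair of the delimiter can appear unescaped
nested = %q{a {nested} pair}        #=> "a {nested} pair"

# %Q interpolates, and double quotes need no escaping
name = "Arthur"
greeting = %Q{Hello "#{name}"}      #=> "Hello \"Arthur\""
```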

Finding Files

When you’re opening another file that’s part of your project, a template or some config, it’s often best to use the path relative to the current file. You may see this done like

config_path = File.expand_path("config.yml", File.dirname(__FILE__))

You can shorten this a little by taking advantage of the way File.expand_path works, traversing up the file path with ..

config_path = File.expand_path("../config.yml", __FILE__)

If you’re using Ruby 2.0 it’s even simpler

config_path = File.expand_path("config.yml", __dir__)

When it comes to parsing a config file it’s fairly common to see

config = YAML.load(File.open(config_path))

The YAML library actually has a built in method that does both these operations in one

config = YAML.load_file(config_path)

Avoid eval with const_get

The number one culprit for unnecessary uses of eval seems to be resolving a string as a constant. Fortunately this is fairly easy to do right.

string = "Enumerator::Lazy"
klass = string.split("::").inject(Object) {|base, str| base.const_get(str)}

Ruby 2.0 makes this even easier, as Object.const_get understands namespaces.

klass = Object.const_get(string)

Hash Default Proc

Ruby’s Hash has the concept of a ‘default proc’. This is a Proc object that’s called when a key can’t be found, given the hash itself and the key as arguments, so you can set/return a default value for that key. It can be set with the #default_proc= method, or by supplying a block to Hash.new.

One case where this is particularly useful is grouping objects together

users_by_city = Hash.new {|hash, key| hash[key] = []}
users.each {|user| users_by_city[user.city] << user}
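Here is a small sketch of both ways of setting a default proc. The calls counter and the hash names are made up for this example; the counter is only there to show that the block runs once per missing key.

```ruby
# memoise a calculation: the block only runs when the key is missing
calls = 0
squares = Hash.new do |hash, key|
  calls += 1
  hash[key] = key * key
end

squares[4]   #=> 16, computed by the default proc
squares[4]   #=> 16, read straight from the hash
calls        #=> 1

# the same behaviour can be added to an existing hash
counts = {}
counts.default_proc = proc {|hash, key| hash[key] = 0}
%w{a b a}.each {|letter| counts[letter] += 1}
counts       #=> {"a"=>2, "b"=>1}
```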

Abbreviated Assignment

Ruby has a number of abbreviated assignment operators that combine assignment with another operator. The most immediately obvious of these is +=, as in i += 1, which increments the variable i by 1. Most of Ruby’s operators can be combined with assignment like this, and can be understood by expanding them out, e.g.

i += 1 # equivalent to:
i = i + 1

i /= 2 # equivalent to:
i = i / 2

A particularly useful instance of this is ||= (“or equals”), which will only perform the assignment if the left hand side of the expression is ‘falsy’ (nil or false). If the left hand side is ‘truthy’ then that will be the result of the expression. This makes it really easy to do memoisation.

Say you have a method that fetches some data via an API, and you don’t want to re-fetch the data every time you call the method. With ||= this becomes a really easy problem to solve

def posts
  @posts ||= fetch_resource(:posts, user: id)
end

You can even make this work when the method takes arguments

def user(id)
  (@users ||= {})[id] ||= fetch_resource(user: id)
end

One thing to bear in mind is ||= slightly breaks the expansion rule (along with &&=), and expands out like:

x ||= y # equivalent to:
x || x = y
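One consequence of the falsy check is worth a sketch: a memoised value that is legitimately false or nil will be looked up again on every call. The Settings class and its methods below are hypothetical, and the defined? check is just one common way around it.

```ruby
class Settings
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # hypothetical slow lookup that legitimately returns false
  def fetch_flag
    @lookups += 1
    false
  end

  # ||= treats the stored false as "not set" and looks it up again
  def flag
    @flag ||= fetch_flag
  end

  # checking whether the variable is defined caches falsy values too
  def cached_flag
    return @cached_flag if defined?(@cached_flag)
    @cached_flag = fetch_flag
  end
end

settings = Settings.new
2.times { settings.flag }
lookups_with_or_equals = settings.lookups   #=> 2

2.times { settings.cached_flag }
lookups_after_defined = settings.lookups    #=> 3
```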

Enumerable

Ruby’s Array has a lot of helpful methods, and even more through Enumerable.

Whether you’re asserting the contents of an array with #all? or #any?, filtering it with #select or #reject, finding an element with #detect, or picking a random element with #sample, chances are there’s already a method for whatever you want to do. So it’s always a good idea to check the Array and Enumerable documentation if you find yourself doing a lot of work to get some information out of an array.
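A quick tour of the methods just mentioned, using a made-up array of numbers:

```ruby
numbers = [1, 2, 3, 4, 5, 6]

all_positive = numbers.all? {|n| n > 0}      #=> true
any_large    = numbers.any? {|n| n > 5}      #=> true
evens        = numbers.select {|n| n.even?}  #=> [2, 4, 6]
odds         = numbers.reject {|n| n.even?}  #=> [1, 3, 5]
first_big    = numbers.detect {|n| n > 3}    #=> 4
random       = numbers.sample                # a random element, varies per run
```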

One particular method that’s worth highlighting is #each_with_object. This is a specialisation of a particular use case for #inject.

Normally you’d use #inject like this, supplying it with an initial value for the accumulator, then returning a new value each iteration that becomes the accumulator for the next iteration, until you get the final result.

sum = [1,2,3].inject(0) {|accumulator, element| accumulator + element}

People started using this to build up objects. Here’s a common example returning a hash.

ATTRIBUTES = [:id, :first_name, :last_name]
def to_hash
  ATTRIBUTES.inject({}) {|acc, attr| acc[attr] = send(attr); acc}
end

However this isn’t great: you need to remember to return the object you’re building from each iteration, and it ends up needlessly assigning the same object to the accumulator each iteration, which comes with a performance penalty. For this reason #each_with_object was added in Ruby 1.9 for this particular use case.

ATTRIBUTES = [:id, :first_name, :last_name]
def to_hash
  ATTRIBUTES.each_with_object({}) {|attr, hash| hash[attr] = send(attr)}
end

Data Structures

Everyone knows about Ruby’s Array and Hash, but there are a few other data structures that you should be familiar with.

Set, included in the stdlib, is an unordered collection of elements with no duplicates. However you’ll find that Set’s most useful feature is that its #include? method is much faster than Array’s. In fact Set#include? takes roughly the same amount of time regardless of how many elements the set contains, whereas Array#include? takes longer the more elements there are in the array.

require "set"

TEMP_EMAIL_PROVIDERS = Set["temp-email.tld", "throwaway-mail.tld", ...]

def temporary_email?(address)
  TEMP_EMAIL_PROVIDERS.include?(address.split("@").last.downcase)
end

Sets do take longer to build than Arrays, so work best when they can be created just once and assigned to a constant, or instance variable of a long lived object.
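If you want to see the difference for yourself, a rough sketch like the one below will do. The collection size and iteration count are arbitrary choices for this example, and the timings will vary between machines, so no figures are shown.

```ruby
require "set"
require "benchmark"

numbers = (1..100_000).to_a
as_set = numbers.to_set

# duplicates are silently discarded
unique_count = Set[1, 1, 2].size              #=> 2

# both collections give the same answer...
in_array = numbers.include?(100_000)          #=> true
in_set   = as_set.include?(100_000)           #=> true

# ...but the Array has to walk every element to reach the last one
array_time = Benchmark.realtime { 100.times { numbers.include?(100_000) } }
set_time   = Benchmark.realtime { 100.times { as_set.include?(100_000) } }
puts "Array: #{array_time}s, Set: #{set_time}s"
```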

Another useful data structure is Queue. This is a synchronised, i.e. thread safe, first-in first-out queue. This makes it easy to coordinate work between threads. Here’s a simple example of a program that lets you enter multiple URLs to download, then works through them one by one in the background.

require "thread" # Queue is part of the thread library
require "net/http"

to_do = Queue.new

Thread.abort_on_exception = true # if one thread dies, everyone dies

interface = Thread.new do
  loop do
    url = STDIN.gets # waits here until a URL has been entered
    url = url.chomp if url # strip the trailing newline before parsing
    if url && url != "quit"
      to_do.push(URI.parse(url))
    else
      to_do.push(false)
      break
    end
  end
end

fetcher = Thread.new do
  loop do
    uri = to_do.pop # waits until there's a URI in the queue
    break unless uri
    result = Net::HTTP.get(uri)
    File.open("#{uri.host}#{uri.path.tr("/", ":")}", "w+") do |f|
      f << result
    end
    puts "downloaded #{uri}"
  end
end

[interface, fetcher].each(&:join) # don't exit until the threads are done

Thanks to Queue this program could even be extended to download multiple files at once simply by creating multiple fetcher threads.
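As a sketch of that idea without the networking, the example below shares one queue between several worker threads. The worker count and the squaring "work" are stand-ins invented for this example; note that each worker needs its own stop signal.

```ruby
require "thread"

to_do = Queue.new
results = Queue.new
worker_count = 3 # arbitrary number of workers for this sketch

workers = worker_count.times.map do
  Thread.new do
    loop do
      number = to_do.pop # waits until there's work in the queue
      break unless number # false is the signal to stop
      results.push(number * number)
    end
  end
end

(1..10).each {|number| to_do.push(number)}
worker_count.times { to_do.push(false) } # one stop signal per worker

workers.each(&:join) # don't continue until the workers are done

squares = []
squares << results.pop until results.empty?
squares.sort   #=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```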

Struct is like a lightweight class, allowing you to collect together a group of attributes much as you would with a hash. Where Struct can be more useful is that the attributes become methods and only those defined can be accessed, so typos or additions will result in an error. Struct also gives a name to the data structure holding the values, which can make code easier to understand than yet another nameless hash.

Point = Struct.new(:x, :y)

def euclidean_distance(a, b)
  Math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)
end

a = Point.new(1, 5)        #=> #<struct Point x=1, y=5>
b = Point.new(4, 2)        #=> #<struct Point x=4, y=2>

euclidean_distance(a, b)   #=> 4.242640687119285

Structs also have #[] and #[]= methods, so can be accessed like hashes, making it easy to gradually switch your code over to using them.
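A short sketch of the hash-style access, and of the errors you get for attributes that don’t exist, reusing the same two-attribute Point shape:

```ruby
Point = Struct.new(:x, :y)
point = Point.new(1, 5)

by_symbol = point[:x]    #=> 1
by_string = point["y"]   #=> 5
point[:x] = 10
point.x                  #=> 10

# unknown attributes raise an error rather than silently returning nil
missing_method = begin
  point.z
rescue NoMethodError => e
  e.class                #=> NoMethodError
end

missing_key = begin
  point[:z]
rescue NameError => e
  e.class                #=> NameError
end
```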

While Set, Queue, and Struct are part of Ruby there’s one more really useful data structure: RBTree. This is available as the rbtree gem. RBTree is – unsurprisingly – a tree data structure, with much the same interface as Hash. However it stores its keys in sorted order, and allows you to take advantage of this with methods to access by range, or customise the sort order used.

require "rbtree"

word_counts = RBTree.new {|tree, key| tree[key] = 0}

path = File.expand_path("the_hound_of_the_baskervilles.txt")
File.open(path) do |file|
  file.each_line do |line|
    line.split(/\W/).each do |word|
      word_counts[word] += 1
    end
  end
end

word_counts.bound("x", "zz").each do |word, count|
  puts "#{word}: #{count}"
end

require "rbtree"

headers = RBTree[
  "Content-Type", "text/html",
  "Content-Length", "1270"]
# sort with case insensitive string comparison
headers.readjust {|a, b| a.casecmp(b)}

headers["content-type"]   #=> "text/html"
headers["CONTENT-TYPE"]   #=> "text/html"
# original case preserved
headers.keys              #=> ["Content-Length", "Content-Type"]

Another advantage of RBTree is that it can be much faster to construct than a Hash, particularly for very large collections. However whereas Hash has constant access time, O(1), RBTree is a little slower with logarithmic, O(log n), access time.

String#to_i and Fixnum#to_s Radix Arguments

String#to_i and Fixnum#to_s (Integer#to_s from Ruby 2.4, where Fixnum was merged into Integer) both take a radix (aka base) argument for simple conversion between bases.

# binary
2.to_s(2)                                        #=> "10"
"11".to_i(2)                                     #=> 3

# hex
16.to_s(16)                                      #=> "10"
"FF".to_i(16)                                    #=> 255

# use base 36 to generate a simple random alphanumeric string
rand(36**8).to_s(36)                             #=> "dtkbi9py"

# convert hex colour to rgb
"0645AD".scan(/../).map {|hex| hex.to_i(16)}   #=> [6, 69, 173]

Name Numbers with Constants

Use constants to give a name to numbers you are using in your code. It might seem perfectly obvious that 86400 is a day’s worth of seconds when you write the code, but setting and using DAY = 86400 is going to make your code much easier to read when you come back to it.

The example below implements the algorithm from java.lang.String.hashCode() for interoperability with a Java service. Using constants makes the code much easier to understand, and gives the opportunity to add a comment without cluttering the code. Also note how the #overflow_int method is split out from #string_hash_code to further explain exactly what’s happening.

module Java
  SIGNED_INT_MIN = -2147483648
  UNSIGNED_INT_MAX = 4294967295
  PRIME = 31 # n * 31 optimises well in java and gives a good hash distribution

  module_function

  def string_hash_code(string)
    string.bytes.inject(0) {|hash, byte| overflow_int(PRIME * hash + byte)}
  end

  def overflow_int(i)
    ((i - SIGNED_INT_MIN) & UNSIGNED_INT_MAX) + SIGNED_INT_MIN
  end
end

Java.string_hash_code("foobarbaz")   #=> -1165917330

Building Strings

Ruby has a number of ways to build up strings and you should make sure you choose the right one.

User = Struct.new(:id, :name)
user = User.new(42, "Arthur")

user.name << ": " << user.id.to_s   #=> "Arthur: 42"

This first example uses the String#<< append method, but it turns out to have been a bad choice, which you can see if you ask for the user’s name again

user.name   #=> "Arthur: 42"

This is because String#<< alters the string on the left, appending the string on the right, rather than creating a new string from the two.

We can fix this by switching to String#+

user.name + ": " + user.id.to_s   #=> "Arthur: 42"
user.name                         #=> "Arthur"

However this can be a bit wasteful: there are 5 strings involved in this ("Arthur", ": ", "Arthur: ", "42", and "Arthur: 42"). We also have to remember to call #to_s on any non-String objects, and it’s surprisingly easy to forget to add the space in ": " when formatting strings like this.

Fortunately Ruby has a solution to this, the string interpolation syntax:

"#{user.name}: #{user.id}"   #=> "Arthur: 42"

No unnecessary strings are created, #to_s is called automatically, and the formatting is much more obvious.

When you’re formatting strings in Ruby the string interpolation syntax is almost always the one you want to go with.

That’s not to say other methods of building strings are useless.

class HTMLBuilder
  def initialize
    @string = "<html>"
  end

  def method_missing(name, *args)
    @string << "<#{name}>"
    @string << args.join("")
    yield if block_given?
    @string << "</#{name}>"
    self
  end

  def close
    @string << "</html>"
    self
  end

  def to_s
    @string
  end
end

b = HTMLBuilder.new

b.head do
  b.title("test")
end

b.body do
  b.p("Hello world!")
end

b.close

b.to_s   #=> "<html><head><title>test</title></head><body><p>Hello world!</p></body></html>"

The example above is a good use of String#<<. Just one main string is created, which gradually grows larger, while the temporary strings appended to it are kept small and are easily disposed of by the garbage collector. Had something like @string += "<#{name}>" been used then a new copy of @string would have been created and the old, potentially very large, copy would have to be disposed of. The same goes for @string = "#{@string}<#{name}>".

The example below shows when you might want to use String#+. It’s useful when you want to join to the end of a string, and are using strings that come from outside of your method, so you want to make sure you create a new string rather than modify the one you’re given.

TEMPLATE_DIR = File.dirname(__FILE__) + "/templates/"

def template(name)
  name = name =~ /\.erb$/ ? name : name + ".erb"
  File.read(TEMPLATE_DIR + name)
end

Class Instance Variables

Ruby’s class variables aren’t its best feature. The way they are shared between all subclasses can lead to confusion, and the fact that they must be initialised before use is an annoying disconnect from instance variables. Class variables are also wide open to all instances, making it hard to keep encapsulation. The solution to this is to instead use class instance variables.

In the example below the scenario is as if you’re writing a handful of supporting objects for a small short-running script or background job. Data is being fetched from a couple of endpoints, you want to cache data so you don’t have to fetch it multiple times, and you want to reuse the same connection for each endpoint.

require "net/http"
require "json"

class APIObject
  def initialize(attributes)
    attributes.each do |name, value|
      # set the variable directly, as the subclasses only define readers
      instance_variable_set("@#{name}", value) if respond_to?(name)
    end
  end

  def self.host(host)
    @client = nil
    @host = host
  end

  def self.port(port)
    @client = nil
    @port = port
  end

  host "api.example.com"

  def self.client
    return @client if @client
    # store the new client so the same connection is reused on later calls
    return @client = Net::HTTP.new(@host, @port || 80) if @host
    superclass.client
  end

  def self.cache
    @cache ||= {}
  end

  def self.find(id)
    # assumes each subclass provides a `path` class method, not shown here
    cache[id] ||= new(JSON.parse(client.get("#{path}/#{id}").body))
  end
end

class User < APIObject
  attr_reader :id, :first_name, :last_name
end

class Message < APIObject
  attr_reader :id, :sender_id, :recipient_id, :text, :attachment_id

  def sender
    User.find(sender_id)
  end

  def recipient
    User.find(recipient_id)
  end

  def attachment
    Attachment.find(attachment_id) if attachment_id
  end
end

class Attachment < APIObject
  host "data.example.com"
  attr_reader :id, :mime_type, :data
end

@host, @port, @client, and @cache are all class instance variables. While they are set by code defined in APIObject, each subclass is a separate instance of Class and so has its own copy.

In the case of @cache this is exactly what you want: you don’t want Message id 3 to overwrite User id 3 in a shared cache.

As for the HTTP client, you want User and Message to share the client from APIObject, but Attachment wants its own. We could do the first part with a regular class variable, but then attempting to set a different client on Attachment would overwrite the one used by User and Message. The solution again is to use a class instance variable, this time with some clever logic in the client method to look up the inheritance chain if @client isn’t available or can’t be set.
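The difference between the two kinds of variable is easier to see in a stripped-down sketch. Parent and Child are hypothetical classes used only for this illustration.

```ruby
class Parent
  @@shared = "parent" # class variable, shared with every subclass
  @own = "parent"     # class instance variable, belongs to Parent only

  def self.shared
    @@shared
  end

  def self.own
    @own
  end
end

class Child < Parent
  @@shared = "child"  # overwrites the value Parent sees too
  @own = "child"      # sets a separate variable on Child
end

Parent.shared   #=> "child"
Parent.own      #=> "parent"
Child.own       #=> "child"
```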

Namespace Global Variables with Module Attributes

Global variables are generally a bad idea. Having some data that can be read or written anywhere in your code can lead to code that is hard to test and prone to unexpected behaviour. Another part of the problem is that a global variable has no clear owner: why shouldn’t I set $verbose = true when it’s just as much mine as it is yours?

If you really do need read and write access to a bit of data across your code a better approach is a module (or class) attribute:

module MyApp
  class << self
    attr_accessor :verbose
  end
end

MyApp.verbose = true

Now you’re not going to clash with some other library using the same global variable, you know who it belongs to so you’re less likely to use it where you shouldn’t, and it’s easier to stub or mock in tests.

Naming Predicate Methods

By convention, predicate methods (those that return true or false) in Ruby are named with a trailing ? e.g. empty? or any?. Other languages where the ? is not available in method names will use conventions like is_empty or has_any. You can save yourself some typing, and write code that better fits with other Ruby code by skipping the is_ or has_ prefix on predicate methods. The ? is enough to signal the method’s intention.

Hash#has_key? is an exception to this, but Hash#key? is the preferred name.

Default Arguments

Default arguments are evaluated in the same scope as the method body, so you can use instance variables or even call other methods. You can also use earlier arguments in the defaults for later arguments!

class User
  attr_accessor :name, :email, :phone, :default_notify_method

  def initialize(name, email, phone)
    @name, @email, @phone = name, email, phone
    @default_notify_method = :email
  end

  def notify(message, via=@default_notify_method, address=address_for_notify(via))
    Notifier.for(via).deliver("Dear #{@name},\n#{message}", address)
  end

  private

  def address_for_notify(method)
    {email: @email, sms: @phone}[method]
  end
end
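A smaller, runnable sketch of the same idea, where each default refers to the arguments before it. The rectangle method is made up for this example.

```ruby
# height defaults to width, and area is calculated from both
def rectangle(width, height=width, area=width * height)
  [width, height, area]
end

square = rectangle(2)         #=> [2, 2, 4]
oblong = rectangle(2, 3)      #=> [2, 3, 6]
custom = rectangle(2, 3, 0)   #=> [2, 3, 0]
```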

Inner Classes

Just as you can nest classes inside modules for namespacing, you can nest them inside other classes. A class nested within another is known as an ‘inner class’. Inner classes are useful for small utility classes that aren’t going to be used outside of the parent class, or structs/simple classes for structuring data returned by the parent class. Examples of both are shown below.

require "strscan"

class Logfile
  class LogScanner < StringScanner
    def scan_delimited(begin_char=nil, end_char)
      skip(begin_char) if begin_char
      scan_until(end_char).sub(/#{end_char}$/, "")
    end
  end

  Line = Struct.new(:ip, :identity, :user, :date, :request, :status, :bytes)

  attr_reader :lines

  def initialize(file_path)
    @file_path = file_path
  end

  def each
    return to_enum(__method__) unless block_given?
    scanner = LogScanner.new("")
    File.open(@file_path) do |f|
      f.each_line do |line|
        scanner.string = line
        yield Line.new(
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/\[/, /\]/),
          scanner.scan_delimited(/ "/, /"/),
          scanner.scan_delimited(/ /, / /),
          scanner.scan_delimited(/$/))
      end
    end
  end

  # methods to report on log file ...
end

warn

The warn method is very much like puts, except it outputs to stderr rather than stdout. This can be useful to make sure that warning or debugging messages don’t get mixed in with program output or structured logs.

def is_logged_in?
  warn "deprecated method `is_logged_in?' at #{caller.first}, use `logged_in?'"
  logged_in?
end

This is especially useful in scripts and command line programs where there’s a good chance stdout will be piped to another program, and you don’t want to interfere with that.

Exit Status

Another important thing in writing a script or command line program is setting the exit status correctly. The exit status 0 is used to indicate success, and a positive number up to 255 indicates failure.

A Ruby program that ends normally will exit with 0. A program that ends due to an exception will exit with 1 and also print the exception message to stderr.

There are a number of methods that can be used to set the exit status.

exit            # exits immediately with status 0
exit(1)         # exits immediately with status 1
abort "message" # exits immediately with status 1, prints message to stderr
raise           # raises exception and exits with status 1 if not rescued
fail            # alias to raise
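One way to check these for yourself is to run each snippet in a child Ruby process and read its status back. This is only a sketch: exit_status_of is a helper invented for this example, and it assumes the child process can be started with the same Ruby that is running the script.

```ruby
require "rbconfig"

# runs the given code in a separate Ruby process and returns its exit status
def exit_status_of(code)
  system(RbConfig.ruby, "-e", code, err: File::NULL) # discard stderr output
  $?.exitstatus
end

normal  = exit_status_of("exit")            #=> 0
custom  = exit_status_of("exit(3)")         #=> 3
aborted = exit_status_of("abort 'failed'")  #=> 1
raised  = exit_status_of("raise 'failed'")  #=> 1
```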

Hash[]

The Hash[] method turns an array of pairs into a hash.

pairs = [[:id, 42], [:name, "Arthur"]]
Hash[pairs]   #=> {:id=>42, :name=>"Arthur"}

This is useful for turning the results of Hash#map back into a hash.

def string_keys(hash)
  Hash[hash.map {|k,v| [k.to_s,v]}]
end

And as of Ruby 2.1 there’s the even simpler Enumerable#to_h.

def string_keys(hash)
  hash.map {|k,v| [k.to_s,v]}.to_h
end

1_000_000

Ruby allows _ in numeric literals. You can use this to make any numbers in your code easier to read at a glance.

1_000_000
1.000_1
0xff_ff_ff
0b11000011_10101001

Forwardable

The Forwardable module from the standard library can make your life easier when you have an object that is a wrapper around another, or needs to delegate a few methods to an object it owns.

require "forwardable"

class Team
  extend Forwardable

  def_delegator :@members, :push, :add_member
  def_delegator :@members, :delete, :remove
  def_delegators :@members, :size, :each, :member?

  def initialize(name)
    @name = name
    @members = []
  end
end

You can also use it to turn Law of Demeter violations like user.address.country into user.country.

require "forwardable"

class User
  extend Forwardable
  def_delegator :address, :country
  attr_accessor :address

  # ...

end
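Here is a sketch of the delegation in use. The Address struct and the Customer class are hypothetical stand-ins, and the third argument to def_delegator shows how to expose a method under a different name.

```ruby
require "forwardable"

Address = Struct.new(:city, :country) # hypothetical stand-in for a real model

class Customer
  extend Forwardable
  def_delegator :address, :country
  def_delegator :address, :city, :home_city # delegate under a different name
  attr_accessor :address
end

customer = Customer.new
customer.address = Address.new("London", "UK")

customer.country     #=> "UK"
customer.home_city   #=> "London"
```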