This article was originally posted on the globaldev blog; they have kindly allowed me to repost it here. If you’re looking for a Ruby job in London, you should check out their jobs page.
This is part 5 of a 5-part series on Ruby tips and tricks gleaned from the Global Personals team’s pull requests over the last two years. Part 1 covers blocks and ranges, part 2 deals with destructuring and type conversions, part 3 talks about exceptions and modules, and part 4 gives an overview of debugging, project layout, and documentation.
This post is a collection of unrelated tips that didn’t make it into the earlier parts.
Along with single and double quotes, Ruby has another method of delimiting strings. %q{example} works just like single quotes, and %Q{example} allows interpolation just like double quotes. While I like to use braces, almost anything can be used for the delimiters, e.g. %q|example|. {}, [], (), and <> are treated specially, in that they can appear in the string without being escaped, as long as they are a matched pair. This can come in handy when you want to use interpolation and double quotes in a string:
logger.info(%Q{User "#{user.name}" logged in})
This style of quoting isn’t limited to strings: %r{example} produces a regexp just like /example/. This is especially useful when matching file paths:
if path =~ %r{^/assets/mobile/img}
  ...
end
There’s also %s{example} for symbols, %w{example1 example2} for an array of whitespace delimited strings, and %x{which ruby}, which works like backticks to execute an external command, e.g. `which ruby`. As of Ruby 2.0 there’s %i{example1 example2}, which generates an array of whitespace delimited symbols.
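As a quick sanity check, each percent literal can be compared against its conventional equivalent. This is only a minimal sketch: the `name` variable is made up for illustration, and `%i` needs Ruby 2.0 or later.

```ruby
name = "Arthur" # hypothetical value, only here to show interpolation

# %q behaves like single quotes: no interpolation
single = %q{Hello #{name}}
# %Q behaves like double quotes: interpolation happens
double = %Q{Hello "#{name}"}

# matched pairs of the delimiter may appear unescaped
nested = %q{outer {inner} outer}

words   = %w{example1 example2}
symbols = %i{example1 example2}
pattern = %r{^/assets/mobile/img}
```

Note that `%q{Hello #{name}}` keeps the `#{name}` text literally, exactly as a single quoted string would.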
When you’re opening another file that’s part of your project, a template or some config, it’s often best to use the path relative to the current file. You may see this done like
config_path = File.expand_path("config.yml", File.dirname(__FILE__))
You can shorten this a little by taking advantage of the way File.expand_path works, traversing up the file path with ..
config_path = File.expand_path("../config.yml", __FILE__)
If you’re using Ruby 2.0 or later it’s even simpler
config_path = File.expand_path("config.yml", __dir__)
When it comes to parsing a config file it’s fairly common to see
config = YAML.load(File.open(config_path))
The YAML library actually has a built-in method that does both these operations in one, and closes the file for you afterwards
config = YAML.load_file(config_path)
eval with const_get
The number one culprit for unnecessary uses of eval seems to be resolving a string as a constant. Fortunately this is fairly easy to do right.
string = "Enumerator::Lazy"
klass = string.split("::").inject(Object) {|base, str| base.const_get(str)}
Ruby 2.0 makes this even easier, as Object.const_get understands namespaces.
klass = Object.const_get(string)
Ruby’s Hash has the concept of a ‘default proc’. This is a Proc object that’s called when a key can’t be found, given the hash itself and the key as arguments, so you can set/return a default value for that key. It can be set with the #default_proc= method, or by supplying a block to Hash.new.
One case where this is particularly useful is grouping objects together:
users_by_city = Hash.new {|hash, key| hash[key] = []}
users.each {|user| users_by_city[user.city] << user}
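The same behaviour is available on an existing hash through #default_proc=. The following is a minimal sketch with made-up city names; the `calls` counter is only there to show that the proc runs for missing keys and not for keys that are already present.

```ruby
calls = 0
counts = {}
counts.default_proc = proc do |hash, key|
  calls += 1      # track how often the default proc runs
  hash[key] = 0   # store the default so later lookups find it
end

%w{london paris london}.each {|city| counts[city] += 1}
```

Because this default proc assigns to the hash, even reading a missing key adds it. Use #key? or #fetch when you only want to check for presence.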
Ruby has a number of abbreviated assignment operators, which combine assignment with another operator. The easiest to understand and most immediately obvious of these is +=, as in i += 1, which increments the variable i by 1. Most of Ruby’s operators can be combined with assignment like this, and can be understood by expanding them out, e.g.
i += 1 # equivalent to:
i = i + 1
i /= 2 # equivalent to:
i = i / 2
A particularly useful instance of this is ||= (“or equals”), which will only perform the assignment if the left hand side of the expression is ‘falsy’ (nil or false). If the left hand side is ‘truthy’ then that will be the result of the expression. This makes it really easy to do memoisation.
Say you have a method that fetches some data via an API, and you don’t want to re-fetch the data every time you call the method. It’s now a really easy problem to solve:
def posts
  @posts ||= fetch_resource(:posts, user: id)
end
You can even make this work when the method takes arguments
def user(id)
  (@users ||= {})[id] ||= fetch_resource(user: id)
end
One thing to bear in mind is that ||= slightly breaks the expansion rule (along with &&=), and expands out like:
x ||= y # equivalent to:
x || x = y
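One consequence of this expansion is that a memoised nil or false is never treated as cached, so the right hand side runs again on every call. Here is a minimal sketch of the problem and one common workaround; the `Settings` class and its method names are hypothetical, and the counter stands in for an expensive lookup.

```ruby
class Settings
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # ||= repeats the lookup every time, because the stored value is falsy
  def beta_enabled
    @beta_enabled ||= expensive_lookup
  end

  # checking defined? caches falsy results too
  def beta_enabled_cached
    return @beta_cached if defined?(@beta_cached)
    @beta_cached = expensive_lookup
  end

  private

  def expensive_lookup
    @lookups += 1
    false # stands in for a slow call that legitimately returns false
  end
end

settings = Settings.new
2.times { settings.beta_enabled }
uncached_lookups = settings.lookups
2.times { settings.beta_enabled_cached }
cached_lookups = settings.lookups - uncached_lookups
```

Here the ||= version performs the lookup on both calls, while the defined? version performs it once.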
Ruby’s Array has a lot of helpful methods, and even more through Enumerable. Whether it’s asserting the contents of an array with #all? or #any?, filtering it with #select or #reject, finding an element with #detect, or picking a random element with #sample, the chances are that whatever you want to do with an array, there’s already a method for it. So it’s always a good idea to check the Array and Enumerable documentation if you ever find yourself doing a lot of work getting some information out of an array.
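For reference, the methods mentioned above look like this in use. The numbers are arbitrary sample data.

```ruby
numbers = [3, 8, 15, 22, 42]

all_positive = numbers.all? {|n| n > 0}    # every element passes the test
any_large    = numbers.any? {|n| n > 40}   # at least one element passes
evens        = numbers.select(&:even?)     # keep matching elements
odds         = numbers.reject(&:even?)     # drop matching elements
first_big    = numbers.detect {|n| n > 10} # first match, or nil if none
random_pick  = numbers.sample              # a random element
```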
One particular method that’s worth highlighting is #each_with_object. This is a specialisation of a particular use case for #inject.
Normally you’d use #inject like this, supplying it with an initial value for the accumulator, then returning a new value each iteration that becomes the accumulator for the next iteration, until you get the final result.
sum = [1,2,3].inject(0) {|accumulator, element| accumulator + element}
People started using this to build up objects, here’s a common example returning a hash.
ATTRIBUTES = [:id, :first_name, :last_name]
def to_hash
  ATTRIBUTES.inject({}) {|acc, attr| acc[attr] = send(attr); acc}
end
However this isn’t great: you need to remember to return the object you’re building from each iteration, plus it ends up needlessly assigning the same object to the accumulator each iteration, which comes with a performance penalty. For this reason #each_with_object was added in Ruby 1.9 for this particular use case.
ATTRIBUTES = [:id, :first_name, :last_name]
def to_hash
  ATTRIBUTES.each_with_object({}) {|attr, hash| hash[attr] = send(attr)}
end
Everyone knows about Ruby’s Array and Hash, but there are a few other data structures that you should be familiar with.
Set, included in the stdlib, is an unordered collection of elements with no duplicates. However you’ll find that Set’s most useful feature is that its #include? method is much faster than Array’s. In fact Set#include? takes roughly the same amount of time regardless of how many elements the set contains, whereas Array#include? takes longer the more elements there are in the array.
require "set"
TEMP_EMAIL_PROVIDERS = Set["temp-email.tld", "throwaway-mail.tld", ...]
def temporary_email?(address)
  TEMP_EMAIL_PROVIDERS.include?(address.split("@").last.downcase)
end
Sets do take longer to build than Arrays, so they work best when they can be created just once and assigned to a constant, or to an instance variable of a long-lived object.
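A short sketch of the basic Set behaviour described above. The "disposable.tld" entry and the variable names are made up for illustration.

```ruby
require "set"

providers = Set["temp-email.tld", "throwaway-mail.tld"]
providers << "temp-email.tld" # already present, so the set is unchanged
providers << "disposable.tld" # hypothetical extra entry

size    = providers.size
known   = providers.include?("throwaway-mail.tld")
unknown = providers.include?("example.com")

# an existing Array can be converted once, up front
from_array = ["a.tld", "b.tld", "a.tld"].to_set
```

Duplicates are dropped both when adding to a set and when converting an array with #to_set.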
Another useful data structure is Queue. This is a synchronised, i.e. thread safe, first-in first-out queue. This makes it easy to coordinate work between threads. Here’s a simple example of a program that lets you enter multiple URLs to download, then works through them one by one in the background.
require "thread" # Queue is part of the thread library
require "net/http"
to_do = Queue.new
Thread.abort_on_exception = true # if one thread dies, everyone dies
interface = Thread.new do
  loop do
    line = STDIN.gets # waits here until a URL has been entered
    url = line && line.strip # drop the trailing newline so the URI parses
    if url && url != "quit"
      to_do.push(URI.parse(url))
    else
      to_do.push(false)
      break
    end
  end
end
fetcher = Thread.new do
  loop do
    uri = to_do.pop # waits until there's a URI in the queue
    break unless uri
    result = Net::HTTP.get(uri)
    File.open("#{uri.host}#{uri.path.tr("/", ":")}", "w+") do |f|
      f << result
    end
    puts "downloaded #{uri}"
  end
end
[interface, fetcher].each(&:join) # don't exit until the threads are done
Thanks to Queue this program could even be extended to download multiple files at once simply by creating multiple fetcher threads.
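The multiple-worker idea can be sketched without any network access. This is a simplified illustration rather than an extension of the downloader itself: doubling a number stands in for fetching a file, and `worker_count` is an arbitrary value. The main detail is that each worker needs its own stop signal.

```ruby
require "thread" # not needed on modern Rubies, harmless on older ones

worker_count = 3 # arbitrary number chosen for this sketch
to_do   = Queue.new
results = Queue.new

workers = worker_count.times.map do
  Thread.new do
    loop do
      job = to_do.pop       # blocks until a job is available
      break unless job      # false is the signal to stop
      results.push(job * 2) # stand-in for downloading a file
    end
  end
end

(1..10).each {|n| to_do.push(n)}
worker_count.times { to_do.push(false) } # one stop signal per worker
workers.each(&:join)

doubled = []
doubled << results.pop until results.empty?
```

The order of `doubled` depends on thread scheduling, so sort it if the order matters.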
Struct is like a lightweight class, allowing you to collect together a group of attributes similarly to a hash. Where Struct can be more useful is that the attributes become methods, and only those defined can be accessed; typos or additions will result in an error. Struct also gives a name to the data structure holding the values, which can make things easier to understand than just another nameless hash.
Point = Struct.new(:x, :y)
def euclidean_distance(a, b)
  Math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)
end
a = Point.new(1, 5) #=> #<struct Point x=1, y=5>
b = Point.new(4, 2) #=> #<struct Point x=4, y=2>
euclidean_distance(a, b) #=> 4.242640687119285
Structs also have #[] and #[]= methods, so can be accessed like hashes, making it easy to gradually switch your code over to using them.
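A brief sketch of both points: hash-style access works with symbols or strings, and an unknown member raises rather than quietly returning nil as a hash would.

```ruby
Point = Struct.new(:x, :y)

point = Point.new(1, 5)

# hash-style access, with either symbols or strings
by_symbol = point[:x]
by_string = point["y"]
point[:x] = 10

# a typo in a method call raises instead of silently returning nil
typo_error = begin
  point.z
rescue NoMethodError => e
  e.class
end

# the same goes for hash-style access with an unknown member
bad_key_error = begin
  point[:z]
rescue NameError => e
  e.class
end
```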
While Set, Queue, and Struct are part of Ruby there’s one more really useful data structure: RBTree. This is available as the rbtree gem. RBTree is – unsurprisingly – a tree data structure, with much the same interface as Hash. However it stores its keys in sorted order, and allows you to take advantage of this with methods to access by range, or customise the sort order used.
require "rbtree"
word_counts = RBTree.new {|tree, key| tree[key] = 0}
path = File.expand_path("the_hound_of_the_baskervilles.txt")
File.open(path) do |file|
  file.each_line do |line|
    line.split(/\W/).each do |word|
      word_counts[word] += 1
    end
  end
end
word_counts.bound("x", "zz").each do |word, count|
  puts "#{word}: #{count}"
end
require "rbtree"
headers = RBTree[
"Content-Type", "text/html",
"Content-Length", "1270"]
# sort with case insensitive string comparison
headers.readjust {|a, b| a.casecmp(b)}
headers["content-type"] #=> "text/html"
headers["CONTENT-TYPE"] #=> "text/html"
# original case preserved
headers.keys #=> ["Content-Length", "Content-Type"]
Another advantage of RBTree is that it can be much faster to construct than a Hash, particularly for very large collections. However whereas Hash has constant access time, O(1), RBTree is a little slower with logarithmic, O(log n), access time.
String#to_i and Fixnum#to_s Radix Arguments
String#to_i and Fixnum#to_s both take a radix (aka base) argument for simple conversion between bases. (Fixnum was merged into Integer in Ruby 2.4, so on newer versions the method is Integer#to_s.)
# binary
2.to_s(2) #=> "10"
"11".to_i(2) #=> 3
# hex
16.to_s(16) #=> "10"
"FF".to_i(16) #=> 255
# use base 36 to generate a simple random alphanumeric string
rand(36**8).to_s(36) #=> "dtkbi9py"
# convert hex colour to rgb
"0645AD".scan(/../).map {|hex| hex.to_i(16)} #=> [6, 69, 173]
Use constants to give a name to numbers you are using in your code. It might seem perfectly obvious that 86400 is a day’s worth of seconds when you write the code, but setting and using DAY = 86400 is going to make your code much easier to read when you come back to it.
The example below implements the algorithm from java.lang.String.hashCode() for interoperability with a Java service. Using constants makes the code much easier to understand, and gives the opportunity to add a comment without cluttering the code. Also note how the #overflow_int method is split out from #string_hash_code to further explain exactly what’s happening.
module Java
  SIGNED_INT_MIN = -2147483648
  UNSIGNED_INT_MAX = 4294967295
  PRIME = 31 # n * 31 optimises well in java and gives a good hash distribution
  module_function
  def string_hash_code(string)
    # Java hashes UTF-16 code units, so this only matches for ASCII strings
    string.bytes.inject(0) {|hash, byte| overflow_int(PRIME * hash + byte)}
  end
  def overflow_int(i)
    ((i - SIGNED_INT_MIN) & UNSIGNED_INT_MAX) + SIGNED_INT_MIN
  end
end
Java.string_hash_code("foobarbaz") #=> -1165917330
Ruby has a number of ways to build up strings and you should make sure you choose the right one.
User = Struct.new(:id, :name)
user = User.new(42, "Arthur")
user.name << ": " << user.id.to_s #=> "Arthur: 42"
This first example uses the String#<< append method, but it turns out to have been a bad choice, which you can see if you ask for the user’s name again
user.name #=> "Arthur: 42"
This is because String#<< alters the string on the left, appending the string on the right, rather than creating a new string from the two.
We can fix this by switching to String#+
user.name + ": " + user.id.to_s #=> "Arthur: 42"
user.name #=> "Arthur"
However this can be a bit wasteful: there are 5 strings involved in this ("Arthur", ": ", "Arthur: ", "42", and "Arthur: 42"). Also we have to remember to call #to_s on any non-String objects, and it’s surprisingly easy to forget to add the space in ": " when formatting strings like this.
Fortunately Ruby has a solution to this, the string interpolation syntax:
"#{user.name}: #{user.id}" #=> "Arthur: 42"
No unnecessary strings are created, #to_s is called automatically, and the formatting is much more obvious.
When you’re formatting strings in Ruby the string interpolation syntax is almost always the one you want to go with.
That’s not to say other methods of building strings are useless.
class HTMLBuilder
  def initialize
    @string = "<html>"
  end
  def method_missing(name, *args)
    @string << "<#{name}>"
    @string << args.join("")
    yield if block_given?
    @string << "</#{name}>"
    self
  end
  def close
    @string << "</html>"
    self
  end
  def to_s
    @string
  end
end
b = HTMLBuilder.new
b.head do
  b.title("test")
end
b.body do
  b.p("Hello world!")
end
b.close
b.to_s #=> "<html><head><title>test</title></head><body><p>Hello world!</p></body></html>"
The example above is a good use of String#<<: just one main string is created, which gradually grows larger, and the temporary strings appended to it are kept small and are easily disposed of by the garbage collector. Had something like @string += "<#{name}>" been used then a new copy of @string would have been created and the old, potentially very large, copy would have to be disposed of. The same goes for @string = "#{@string}<#{name}>".
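The difference is easy to observe by comparing object ids. This is a minimal sketch; the `buffer` variable is illustrative and not part of the builder above.

```ruby
buffer = "<html>".dup # dup so the example also works with frozen string literals
original_id = buffer.object_id

buffer << "<body>" # appends in place
same_after_append = buffer.object_id == original_id

buffer += "</body>" # builds a new string and reassigns the variable
same_after_plus = buffer.object_id == original_id
```

After the `<<` the variable still refers to the original object, whereas after the `+=` it refers to a freshly allocated string.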
The example below shows when you might want to use String#+. It’s useful when you want to join to the end of a string, and are using strings that come from outside of your method, so you want to make sure you create a new string rather than modify the one you’re given.
TEMPLATE_DIR = File.dirname(__FILE__) + "/templates/"
def template(name)
  name = name =~ /\.erb$/ ? name : name + ".erb"
  File.read(TEMPLATE_DIR + name)
end
Ruby’s class variables aren’t its best feature. The way they are shared between all subclasses can lead to confusion, and the fact that they must be initialised before use is an annoying disconnect from instance variables. Class variables are also wide open to all instances, making it hard to keep encapsulation. The solution to this is to instead use class instance variables.
In the example below the scenario is as if you’re writing a handful of supporting objects for a small short-running script or background job. Data is being fetched from a couple of endpoints, you want to cache data so you don’t have to fetch it multiple times, and you want to reuse the same connection for each endpoint.
require "net/http"
require "json"
class APIObject
  def initialize(attributes)
    attributes.each do |name, value|
      # the subclasses only define readers, so set the instance variable directly
      instance_variable_set("@#{name}", value) if respond_to?(name)
    end
  end
  def self.host(host)
    @client = nil
    @host = host
  end
  def self.port(port)
    @client = nil
    @port = port
  end
  host "api.example.com"
  def self.client
    return @client if @client
    # store the client so the same connection is reused on later calls
    return @client = Net::HTTP.new(@host, @port || 80) if @host
    superclass.client
  end
  def self.cache
    @cache ||= {}
  end
  # `path` is not defined in this example; each class is expected to supply it
  def self.find(id)
    cache[id] ||= new(JSON.parse(client.get("#{path}/#{id}").body))
  end
end
class User < APIObject
  attr_reader :id, :first_name, :last_name
end
class Message < APIObject
  attr_reader :id, :sender_id, :recipient_id, :text, :attachment_id
  def sender
    User.find(sender_id)
  end
  def recipient
    User.find(recipient_id)
  end
  def attachment
    Attachment.find(attachment_id) if attachment_id
  end
end
class Attachment < APIObject
  host "data.example.com"
  attr_reader :id, :mime_type, :data
end
@host, @port, @client, and @cache are all class instance variables, so while they are defined in APIObject each subclass is a separate instance of Class, so will have its own copy.
In the case of @cache this is exactly what you want: you don’t want Message id 3 to overwrite User id 3 in a shared cache.
As for the HTTP client you want User and Message to share the client from APIObject, but Attachment wants its own. We could do the first part with a regular class variable, but then attempting to set a different client on Attachment would overwrite the one used by User and Message. The solution again is to use a class instance variable, this time with some clever logic in the client method to look up the inheritance chain if @client isn’t available or can’t be set.
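The difference between the two kinds of variable can be seen in isolation with a much smaller example. The `Base` and `Child` classes and their accessors are hypothetical, chosen only to contrast the two behaviours.

```ruby
class Base
  @@shared = "base" # class variable: one copy for the whole hierarchy
  @own = "base"     # class instance variable: belongs to Base only

  def self.shared; @@shared; end
  def self.shared=(value); @@shared = value; end

  class << self
    attr_accessor :own
  end
end

class Child < Base
end

own_before = Child.own  # nil, class instance variables are not inherited
Child.shared = "child"  # also changes what Base sees
Child.own = "child"     # Base is unaffected
```

Since `Child.own` starts out as nil, sharing a value with the parent takes an explicit lookup through `superclass`, much as the client method does.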
Global variables are generally a bad idea. Having some data that can be read or written anywhere in your code can lead to code that is hard to test and prone to unexpected behaviour. Also contributing to this problem is that a global variable has no clear owner; why shouldn’t I set $verbose = true? It’s just as much mine as it is yours.
If you really do need read and write access to a bit of data across your code a better approach is a module (or class) attribute:
module MyApp
  class << self
    attr_accessor :verbose
  end
end
MyApp.verbose = true
Now you’re not going to clash with some other library using the same global variable, you know who it belongs to so you’re less likely to use it where you shouldn’t, and it’s easier to stub or mock in tests.
By convention, predicate methods (those that return true or false) in Ruby are named with a trailing ?, e.g. empty? or any?. Other languages where the ? is not available in method names will use conventions like is_empty or has_any. You can save yourself some typing, and write code that better fits with other Ruby code, by skipping the is_ or has_ prefix on predicate methods. The ? is enough to signal the method’s intention.
Hash#has_key? is an exception to this, but Hash#key? is the preferred form.
Default arguments are evaluated in the same scope as the method body, so you can use instance variables or even call methods. You can also use earlier arguments in the defaults for later arguments!
class User
  attr_accessor :name, :email, :phone, :default_notify_method
  def initialize(name, email, phone)
    @name, @email, @phone = name, email, phone
    @default_notify_method = :email
  end
  def notify(message, via=@default_notify_method, address=address_for_notify(via))
    Notifier.for(via).deliver("Dear #{@name},\n#{message}", address)
  end
  private
  def address_for_notify(method)
    {email: @email, sms: @phone}[method]
  end
end
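A smaller sketch of the same idea, which also shows that defaults are evaluated afresh on every call rather than once when the method is defined. Both methods are hypothetical helpers written for this illustration.

```ruby
# a later default can be built from earlier arguments
def greeting(name, salutation = "Dear", line = "#{salutation} #{name},")
  line
end

# the default expression runs on each call, so every call gets a new array
def stamp(list = [])
  list << :called
end

default_greeting = greeting("Arthur")
custom_greeting  = greeting("Arthur", "Hi")
first_stamp      = stamp
second_stamp     = stamp
```

Because the default array is created per call, `second_stamp` contains a single element rather than accumulating across calls.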
Just like you can nest classes inside modules for namespacing, you can also nest classes inside other classes. A class nested within another is known as an ‘inner class’. Inner classes are useful for small utility classes that aren’t going to be used outside of the parent class, or structs/simple classes for structuring data returned by the parent class. Examples of both are shown below.
require "strscan"
class Logfile
  class LogScanner < StringScanner
    def scan_delimited(begin_char=nil, end_char)
      skip(begin_char) if begin_char
      scan_until(end_char).sub(/#{end_char}$/, "")
    end
  end
  Line = Struct.new(:ip, :identity, :user, :date, :request, :status, :bytes)
  attr_reader :lines
  def initialize(file_path)
    @file_path = file_path
  end
  def each
    return to_enum(__method__) unless block_given?
    scanner = LogScanner.new("")
    File.open(@file_path) do |f|
      f.each_line do |line|
        scanner.string = line
        yield Line.new(
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/ /),
          scanner.scan_delimited(/\[/, /\]/),
          scanner.scan_delimited(/ "/, /"/),
          scanner.scan_delimited(/ /, / /),
          scanner.scan_delimited(/$/))
      end
    end
  end
  # methods to report on log file ...
end
warn
The warn method is very much like puts, except it outputs to stderr rather than stdout. This can be useful to make sure that warning or debugging messages don’t get mixed in with program output or structured logs.
def is_logged_in?
  warn "deprecated method `is_logged_in?' at #{caller.first}, use `logged_in?'"
  logged_in?
end
This is especially useful in scripts and command line programs where there’s a good chance stdout will be piped to another program, and you don’t want to interfere with that.
Another important thing in writing a script or command line program is setting the exit status correctly. The exit status 0 is used to indicate success, and a positive number up to 255 indicates failure.
A Ruby program that ends normally will exit with 0. A program that ends due to an exception will exit with 1 and also print the exception message to stderr.
There are a number of methods that can be used to set the exit status.
exit # exits immediately with status 0
exit(1) # exits immediately with status 1
abort "message" # exits immediately with status 1, prints message to stderr
raise # raises exception and exits with status 1 if not rescued
fail # alias to raise
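These statuses can be checked from Ruby itself by running small one-line programs in a child process and inspecting `$?`. This is a sketch that assumes the current interpreter can be launched via `RbConfig.ruby`; the messages are arbitrary.

```ruby
require "rbconfig"

ruby = RbConfig.ruby # path to the Ruby interpreter currently running

system(ruby, "-e", "exit")
normal_status = $?.exitstatus

system(ruby, "-e", "exit(3)")
custom_status = $?.exitstatus

# stderr is discarded here so the messages don't clutter the output
system(ruby, "-e", "abort 'message'", err: File::NULL)
abort_status = $?.exitstatus

system(ruby, "-e", "raise 'boom'", err: File::NULL)
raise_status = $?.exitstatus
```

In a shell the same value is available as `$?` immediately after the command has run.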
Hash[]
The Hash[] method turns an array of pairs into a hash.
pairs = [[:id, 42], [:name, "Arthur"]]
Hash[pairs] #=> {:id=>42, :name=>"Arthur"}
This comes in useful for turning the results of Hash#map back into a hash.
def string_keys(hash)
  Hash[hash.map {|k,v| [k.to_s,v]}]
end
And as of Ruby 2.1 there’s the even simpler Enumerable#to_h.
def string_keys(hash)
  hash.map {|k,v| [k.to_s,v]}.to_h
end
1_000_000
Ruby allows _ in numeric literals; you can use this to make any numbers in your code easier to read at a glance.
1_000_000
1.000_1
0xff_ff_ff
0b11000011_10101001
The Forwardable module from the standard library can make your life easier when you have an object that is a wrapper around another, or needs to delegate a few methods to an object it owns.
require "forwardable"
class Team
  extend Forwardable
  def_delegator :@members, :push, :add_member
  def_delegator :@members, :delete, :remove
  def_delegators :@members, :size, :each, :member?
  def initialize(name)
    @name = name
    @members = []
  end
end
You can also use it to turn Law of Demeter violations like user.address.country into user.country.
require "forwardable"
class User
  extend Forwardable
  def_delegator :address, :country
  attr_accessor :address
  # ...
end
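To see delegation in action, here is a cut-down, self-contained variant of the Team class. Including Enumerable is an addition made for this sketch, to show that it builds on the delegated #each; the member names are made up.

```ruby
require "forwardable"

class Team
  include Enumerable # relies on the delegated #each below
  extend Forwardable

  def_delegator :@members, :push, :add_member
  def_delegators :@members, :size, :each

  def initialize
    @members = []
  end
end

team = Team.new
team.add_member("Arthur")
team.add_member("Ford")

member_count = team.size
shouted      = team.map(&:upcase)
exposes_push = team.respond_to?(:push)
```

Only the methods named in the delegator calls are exposed, so the rest of the Array interface, such as #push, stays hidden behind the wrapper.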