Matchish: an Exercise in Implementing Pattern Matching in Ruby
Short Preface
Many years ago, when I considered myself a C++ expert (which was presumptous yet excusable for 19 yrs), I’ve liked to think something like “My language is the most powerful and expressive in entire world! I can implement literally any other language’s feature with my templates and operator redefinitions and …”
Now, being Ruby developer in my mid-30s, I became much more pragmatic, but still curious (and sometimes envious) about other languages and with passion to experiments like “what can we do to have this Erlang/Scala/Rust/you-name-it cool thing in our everyday code?”
So, here is one of those exercises: taking the whole idea of “pattern matching” from functional languages and seeing, how we can make use of it.
What is Pattern Matching?
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. (Wikipedia)
Not sounding really cool?
In practice of languages like Haskell, there is a very powerful tool, allowing, in particular:
- checking values against complex patterns (like “list consisting of element, then several, then exactly the same element”);
- decomposition of values (after checking list agains
(x:xs)
pattern you can immediately work with “x” variable, having matched element; - code execution dispatching by patters (you can have several defintions of function for different “argument patterns”);
- guards, which are not only check data structure but also can perform
any checks on value (
x < 15
).
Read about Haskell pattern matching. It’s seriously mind-blowing.
Ruby Has (Sort Of) Pattern Matching
Three Equality Signs
Not everyone is aware of power of #===
operator. In official
documentation it is named
“case equality operator” (suggesting typical, but not only, use) and you
implicitly use it in your everyday code:
x = 6.0
case x
when (1..5) # calls Range#===(x), which returns true if (1..5).cover?(x)
when Fixnum # calls Class#===(x), which returns true if x.kind_of?(Fixnum)
when /foo/ # calls Regexp#===(x), which returns true if x =~ /foo/
when "foo" # calls default Object#===(x), which is just an alias for ==
when ->(x){x > 5} # calls Proc#===(x), which returns true if call(x) returns true
end
You can also define #===
for your class and have very complicated
checks inside case
.
This is hell of close to pattern matching. I’m constantly finding myself using this feature exactly for dispatching. It is not as cool as method overloading, yet concise enough:
def parse(value)
case value
when /^\d+$/
parse_int(value)
when /^"([^"]+)"$/
parse_string_literal(value)
#...
end
end
# or
def output_to_logs(value)
case value
when /^ERROR(.+)$/
log.error $1
when Numeric
log.warn "Number value %.2f" % value
when Exception
log.error "#{value.class}: #{value.message}"
when String
log.info value
# ...
end
end
It is less known that #===
method also used in grep
:
[1,nil,2,'test'].grep(Numeric) # => [1,2]
And in fact for many tasks like parsing and evaluating complex contexts you can store your own array of heterogenous patterns and check some values against them.
But it is only a part of “pattern matching” puzzle.
Argument Decomposition in Expressions
Ruby definitely has some of decomposing-values-by-pattern features:
a, b = [1,2,3] # a == 1, b == 2, the rest is lost
a, *b = [1,2,3] # a == 1, b == [2,3]
a, (b,c), *d = [[1,2], [3,4], [5,6], [7,8]] # a == [1,2], b == 3, c == 4, d == [[5,6],[7,8]]
It is extremely useful for method/block arguments. When we are talking
about Ruby 2.2, we also have **args
for arguments, catching hashes,
and keyword arguments, deconstructing them. Unfortunately, in other
places you can’t has it:
a, **b = [1, {test: 'me'}] # SyntaxError
Even more unfortunately, decomposition in expressions have nothing in
common with #===
, so, you can’t do anything like
list = [1,2,3]
case list
when [1, *rest] # undefined local variable or method `rest'
p rest # you'd like to have [2,3] here... but ↑
end
Regexps
Regexps are different. They are patterns. The can #===
, as we’ve seen
before. They also can decompose:
'test' =~ /(.+)st/
$1 # => 'te'
# or, without nasty globals:
Regexp.last_match[1] # 'te'
# or, more Ruby-ish way
m = /(.+)st/.match('test')
m[1] # 'te'
# or, like it's 2015 already
m = /(?<somename>.+)st/.match('test')
m['somename'] # 'te'
You can think of them as a separate sub-language inside Ruby, which can be in no use outside the domain of string checking. Yet regexps can learn us a lesson of how could matching+decomposition look in Ruby.
RSpec Matchers
They are used, well, for specs and inside framework. But, in fact, many RSpec matchers are complicated patterns for compound types. Like this:
# Fortunately, matchers are smart enough to implement ===
array_including(1) === [1,2,3] # => true
hash_including(test: instance_of(String)) === {test: 'me'} # => true
That’s cool. Really, seriously cool and very Ruby-way. In fact, you can try it right now:
x = [1,2,3]
case x
when array_including(instance_of(Fixnum), 2)
# ....
end
And it just works! I am, in fact, puzzled that we are not seeing more and more of this in real life code (or, I rather not; skip to “Being Pragmatic” section if you want to know why).
Yet those cool matchers still lack “decomposition” of values into variables. (To be fair, it’s not needed in context they are used today.)
So, we need to go deeper.
Implementing Missing Features
There is several attempts to bring pattern matching in Ruby.
For me, all of them (looking pretty similar) are missing the point. They are trying to invent some DSL, mimicing existing pattern-matching languages. Yet, IMO, pattern matching is not some “domain”, it’s rather basic language feature. So, the “domain-specific language” attempt to it is doomed by design.
Let’s try to invent solution, defined by next requirements:
- it should play well with existing Ruby quasi-pattern-matching features:
case
,grep
,#===
; - it should not pollute global namespace (and monkey-patch core classes) with lot of “cool inventions”;
- it should be some “opt-in” solution, which you can use in 2-3 places in your code (not changing anything else) and everything should just work, and just stay readable, not implying some “new paradigm”
- providing all the cool features of deep matching, values decomposition, value guards and more.
So, meet matchish. For now, it is NOT a gem, but rather an attempt to show the way. If you are wondering why I’m not over-enthusiastic over matchish acceptance, please read “Being Pragmatic” section.
Types Being Algebraic
Task: treat any Ruby object as “algebraic” data type, which allows to check its deep inside structure.
Solution: At first, I assumed something like RSpec’s
argument matchers,
like any
, array_including(...)
, instance_of(...)
and so on. But
for “core language feature” it seemed to be too verbose and not very
“guessable”. So, current solution looks like this:
require 'matchish/matchers'
# normal Ruby behavior
Object === 1 # => true
[1, 2, 3] === [1, 2, 3] # => true
# Method #matchish converts any object to pattern
# exactly the same as above:
Object.matchish === 1 # => true
[1, 2, 3].matchish === [1, 2, 3] # => true
# ...but the patterns are powerful and extensible:
# check object attribute:
String.matchish.with(length: 4) === 'test' # => true
# or call object method:
Array.matchish.which(include?: 1) === [3,2,1] # => true
# deep check with `===` for nested objects
[1, 2, Numeric].matchish === [1, 2, 3] # => true
# or even:
[1, *Numeric.matchish].matchish == [1, 2, 3] # => true
# shorter version, yet without including entire module, is also available:
require 'matchish/ma'
Array.ma.which(include?: 1) === [3,2,1] # => true
Array.ma.which(include?: 1) === [4,5,6] # => false
# Aaaaand even shorter, for fun:
require 'matchish/m'
String.m.with(length: 3) === 'wtf' # => true
String.m.with(length: 3) === 'sorry' # => false
Solution seems pretty straightforward for implementation, you can look at it in matchish/matchers.
As it is rather proof-of-concept than library, not all of useful matchers are implemented.
Compromise:
- one (and only one) method
matchish
orma
monkey-patched in each object; - (and it even can be refinement, not patch, in modern Ruby).
Showcase:
# classic functional list length, just for fun
def length(list)
case list
when [Object.m].m
1
when [Object.m, *Object.m].m
1 + length(list[1..-1])
end
end
length([1,2,3]) # => 3
# Something more interesting:
%w[1980-12-28 1983-02-14 1992-03-30 1998-10-17].
map(&Time.method(:parse)).
grep(Time.m.with(month: (2..3)))
# => [1983-02-14 00:00:00 +0300, 1992-03-30 00:00:00 +0300]
Values Being Decomposed
Task: alongside with matching pattern, store matched values in variables.
Solution 1: strangely beautiful
require 'matchish/decompose/bind'
[x = Object.m, x, *Object.m].m === [1,1,2,3] # => true
x.value # => 1
[x = Object.m, x, *Object.m].m === [1,2,3,4] # => false
String.m.with(length: (x = Object.m)) === 'test' # => true
x.value # => 4
The .value
part is a bit ugly, but the overall solution looks cute,
for me.
How it’s done? x = Object.m
part is simple variable assignment.
After it, x
has value of Matchish::Matcher
. After very first comparison
with real value, pattern is “bound” to this value, and all forecoming
comparisons only return true
if the value is exactly the same.
NB: at the moment, I haven’t invented elegant solution to name the
splatted part of match (like y = *any
).
Downside of this approach is you can’t save this kind of pattern in the constant (as it will not know the context of “x”, and is usable only once, after which pattern is “bound”).
Solution 2: Regexp
-like
require 'matchish/decompose/as'
[Object.m.as(:x), Object.m.as(:x), *Object.m.as(:y)].m === [1,1,2,3] # => true
Matchish.last_match[:x] # => 1
Matchish.last_match[:y] # => [2, 3]
[Object.m.as(:x), Object.m.as(:x), *Object.m].m === [1,2,3,4] # => false
Looks like “more complex DSL”, yet, at the same time, less magic. (By “less magic”, BTW, I mean that there is no need for “how is it done?” explanation, as any expirienced Ruby programmer reads this as pretty obvious code.)
Compromise: It seems both solutions are mere compromises, each having its pros and cons. I honestly don’t know which is “more Ruby-way”. The first one seems “smarter” while the last is definitely provides less magic and therefore more usability.
Showcase:
# For "match" solution: the same "list length" functional style
def length(list)
case list
when [Object.m].m
1
when [Object.m, *Object.m.as(:tail)].m
1 + length(Matchish.last_match[:tail])
end
end
length([1,2,3]) # => 3
# For vars solution: some deep matching
def check(hash)
if {(key = Symbol.m) => Time.m.with(month: (m = Object.m))}.m === hash
[key.value, m.value]
end
end
check(birthday: Time.new(2015, 2, 14))
# => [:birthday, 2]
Imperative Inside Functional: Guards
Task: Last but not least. Sometimes you are finding yourself writing code like…
case x
when 'something'
if some_condition
do_something
else
do_something_else
end
end
…which is not too bad, yet branches inside branches inside branches… The whole idea around pattern-matching is - let’s do the only match, no branching.
Solution:
require 'matchish/guard'
x = 10
flag = true
case x
when Fixnum.m.guard{|x| x > 5 && flag}
p "here"
when Fixnum # other case, without guard
p "there"
end
Implementation, again, is pretty straightforward.
Being Pragmatic
All the examples above are implemented in my matchish repository and working. But currently I’m not releasing this “matchish” as a gem (and therefore, not tried to make it feature-full: for ex., of all possible RSpec-like matchers only several necessary for showcase were implemented).
Why?
Look, I’ve invented matchish and tried to use it when working on a parser project (unreleased yet) – that’s the domain where pattern matching should definitely be in use. And there was hard lesson learned:
All in all, powerful pattern matching need to be core language feature.
That’s because of:
- natural look: the more I’ve used “cool” matchish feature, the more was fear of other developers being puzzled or disgusted with code… it became “extraterrastrial” a bit;
- more important: speed, optimization, overhead; it became really easy to write “clever” algorithm implementation in some small method, then call it 1000 times and then profiler shows you tons of Matchish objects created, dispatching checks furiously while binding complex “matching context” and feasting with your memory and CPU time;
- even more important: after all the considerations, it now seems for me that pattern matching IS a paradigm; either you do all the execution dispatching through the pattern matched function alternatives, or it will always be another (typically, more clear) way to express your intentions; the way your language (Ruby) is more naturally support.
Though, it still seems that some incremental enchancements of Ruby
existing “match-y” features will be appreciated by most of us. Personally
I’d be happy to write case
code with inline decomposition, as shown above:
list = [1,2,3]
case list
when [1, *tail]
p other # => [2,3]
# ...
end
Also, what about #===
for types like Array
and Hash
using ===
recursively?
# Current Ruby behavior:
[1, 2, 3] === [1, 2, 3] # => true
[1, Numeric, Numeric] === [1, 2, 3] # => false, but I want true!
P.S. How I Did This Article
As I wanted just to “show the point”, I didn’t want to write extensive specifications for weeks, then updating them, and only THEN writing an article.
But as I wanted to show the point, I’ve need some tool to check at least all code I’ve written for article (before writing an implementation, BTW :) will work.
So, I’ve used so-called “readme-driven development” approach (using your readme as a specification) and tool dokaz helping to enforce this approach (ok, it’s shameless self-promotion, as I’m the author of the tool).
So, at the time I writing this paragraph, I only need to run
bundle exec dokaz Article.md
and look what it outputs.