You should not implement #to_a for your classes
Well, occasional reader of this blog is no stranger to strong statements, so here is one more of them.
For a short preface, lets reiterate about one useful Ruby idiom: Kernel#Array method, a.k.a. “I don’t care; it should be an Array!”
If at some point of working with data you have a method that could’ve received, or returned, both singular value and array of values, and at next point you need to remove this distinction, and have definitely an array, you can do that:
Array(1) # => [1]
Array('test') # => ["test"]
Array([1, 2, 3]) # => [1, 2, 3] -- nothing changed, it is already an array
Array(nil) # => [] -- well... makes sense
That’s one of the small brilliant pieces of functionality that is easy to overlook, but nice to have, and it follows your intuitions most of the time. Array(nil)
is a nice touch here, being “less logical” than returning [nil]
, but in fact more of “what you really wanted” in real life.
But sometimes, this intiutions got broken:
val = Time.now
Array(val)
# => [3, 21, 10, 2, 2, 2018, 5, 33, false, "EET"]
Ugh… WHAT? The thing is, Array
method works by first trying val#to_ary
, and if it doesn’t respond, val#to_a
, which Time
eventually implements.
Note: here is an explanation of
#to_ary
vs#to_a
difference and its importance.
Which leads us to a valuable lesson:
Never implement #to_a
for your class (especially if it is representing value objects), unless it IS a collection.
But why?..
I’ll ask you this: what is the semantics of #to_a
, if the object is not a collection?
Most of the time (like in Time
’s case) the existence of this method is caused by the fact that Ruby does not have “array vs. tuple” distinction, so it means “convert to some meaningful tuple of internal values.”
But for this task, there are no benefits in using “standard” name, as compared to specialized one. There is no situation (at least, I can’t imagine one) when you have, say, an array of heterogeneous objects, and you want to call array.map(&:to_a)
and if one of the objects is instance of Time
, it is expected and desired to have “list of time components” in its place. It instead means you have some bug (like something returned Time
when you’ve expected an array).
Moreover, selection of specialized method name almost always leads to a cleaner design. As an example, I recently run into the same problem when playing with my own Geo::Coord datatype. Being designed closely after Time
’s design, it used to have #to_a
method, and return [lat, lng]
. Again, it didn’t play well with Array()
idiom… So, I decided to rename #to_a
into …. just #latlng
.
Which, by the way, advice the existence of symmetric #lnglat
method, and, indeed, some APIs expect a pair of coordinates in that order. It can be modeled with to_a(order: :lnglat)
or something like this, but two methods (and NO #to_a
) seems the cleanest and most discoverable way.
Q: What about other #to_<x>
methods?
You may ask: if it is so bad about #to_a
, should we also abandon #to_h
or #to_s
for our value objects? You should not. Because:
- Their semantics is different, and less questionable one.
#to_s
is “how the object would be represented onputs
and string interpolation.”#to_h
, in those JSON-fueled times, became a useful idiom for “dump entire object into basic data types, while preserving its meaning visible,” in fact, a proper and modern replacement for “dump entire object into some tuple.”
- Despite the fact
Hash()
method also exists, it is much more restrictive, and it is much harder to imagine code using the idiom “convert any value into hash” while NOT expecting some value objects to be convertible.
Q: Why does Time
implement it?
Historical reasons, I believe. At the time of Time
design (pun is not intended), there was no notion of keyword arguments, or even key: val
Hash syntax shorthands. So, core classes designed at that dark ages, if they should’ve been able to be constructed from many values (like Time
does), just had to accept this long list of values to the constructor. Like Time
still does, in several different ways – because with just positional arguments it is hard to design proper constructor from 1 to 10 values.
Therefore, it seem logical and symmetrical to have “deconstruct” method, so that it can be easily constructed back:
Time.mktime(*Time.now.to_a)
# => 2018-02-02 06:57:57 +0200
The funny thing is to present date time have no keyword arguments constructor, and no #to_h
method either.
Q: What other core classes do implement to_a
?
Object.constants.map(&Object.method(:const_get))
.grep(Class).select { |c| c.instance_methods.include?(:to_a) }
# => [Array, Hash, NilClass, File, Struct, Enumerator, MatchData, Dir, Time, Range, IO, StringIO]
Alongside already mentioned Time
, what stands out here is Struct
, which is core language base for all value objects… And fall into the same problems with “is it Array” question. But with Struct
it is even stranger (if you think in “value objects” mindset): it not only implements #to_a
but also #each
and includes entire Enumerable
! So it probably thinks of itself more as a “fancy hash with the dot-key notation” than value object.
Which leads us, by the way, to one more questionable rule:
Bonus: You should not implement #each
either (unless it is a collection)
The reasons are the same: what is semantics for “each value in X” if X is not a collection?.. It is pretty vague, and some core classes still demonstrate that:
Quick, from the top of your head, what IO#each enumerates?..
OK, most of us already know the answer (lines, which are split by… $\
, or $OUTPUT_FIELD_SEPARATOR
, or else what you’ve passed to the method as a string separator), but intuitively it is neither only possible implementation, nor the most expectable one. Well, it was in the age of “scripts” and plaintext, but let’s face the truth: this age is far gone, and now “iostream, each (something)” can mean bytes, or Unicode codepoints, or buffer chunks or whatnot. IO#each_byte and IO#each_line(separator) have the same flexibility, and much more readability.
Historical note:
String
used to have#each
and beEnumerable
(with the same meaning as forIO
), but it was rethought, and now we have#each_<byte,char,codepoint,line>
and no bare#each
. Which is good.
So, please read Arcency’s excellent “Stop including Enumerable, return Enumerator instead”, and name the method #each_<something_meaningful>
, and don’t include Enumerable
.
Unless it is a collection.
Q: But what if it IS a collection?
Then implement #each
, include Enumerable
(which will give you free #to_a
, which, though, you may reimplement more effectively if you need), and also probably implement #to_ary
, which will give you a bonus of playing well in *your_collection
context.
Side note: Being bold, I’d even say that
Hash#each
and#to_a
never felt that comfortable for me. Again, there is hard to imagine the context whereArray({some: 'hash'}) # => [[:some, "hash"]]
would be most expected outcome, and sometimes I feel like#each_pair # => Enumerator
could suite for a better code than includedEnumerable
. But maybe I am just high on architectural astronautics.