ActiveSupport::Multibyte Updated
Yesterday Michael Koziarski merged the updated version of ActiveSupport::Multibyte into Rails. The initial reason for the update was Ruby 1.9 compatibility, but it turned into a complete overhaul of both the code and the documentation.
For most people the only noticeable change is the move from String#chars to String#mb_chars. People relying heavily on ActiveSupport::Multibyte probably want to read on.
String#chars renamed to String#mb_chars
One of the initial reasons to use a proxy to access characters back in 2006 was to make Rails future-proof in case Ruby got some kind of Unicode support on String. Unfortunately Matz decided to use String#chars for one of these features, so we had to change the method name. People running on Ruby <= 1.8.6 will get a nice deprecation warning.
String#mb_chars now returns a proxy on Ruby 1.8 and returns self on Ruby 1.9.
Note that the Ruby 1.9 String class does not implement methods like String#normalize. We’re still trying to figure out how to approach this limitation. For now, you might want to do:
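One possible guard, sketched here rather than taken from the Rails source, is to call normalize only when the object returned by mb_chars responds to it. The helper name normalize_if_supported and the FakeProxy stand-in are hypothetical and exist only so the example runs without Rails:

```ruby
# Hypothetical helper (not part of Rails): call normalize only when the
# receiver supports it, otherwise hand the value back untouched.
def normalize_if_supported(chars, form = :kc)
  chars.respond_to?(:normalize) ? chars.normalize(form) : chars
end

# Stand-in for the Ruby 1.8 proxy so the sketch runs without Rails.
FakeProxy = Struct.new(:wrapped) do
  def normalize(form)
    "#{wrapped} (normalized with form #{form})"
  end
end

normalize_if_supported(FakeProxy.new("abc"))  # takes the proxy path
normalize_if_supported("abc")                 # plain String, returned as-is
```

In an application the argument would be str.mb_chars, so the same call works whether mb_chars hands back a proxy or the string itself.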
No more automatic tidying of bytes
Multibyte no longer attempts to convert strings with broken encoding to valid UTF-8. The String#tidy_bytes method still exists if you need this functionality.
Duck-typing aid
Strings are notoriously hard to duck-type because they include Enumerable, which makes them hard to differentiate from Arrays. Rails already had some duck-typing help in place for Date, Time and DateTime. We decided to implement the same thing on String and Chars.
So if you catch yourself using str.is_a?(String), please consider using acts_like?.
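The mechanism can be sketched in plain Ruby. This is a simplified model of the idea, not the Rails source: acts_like?(:string) asks whether the object responds to acts_like_string?. The FakeChars class and the describe helper are hypothetical names used only for illustration:

```ruby
class Object
  # Simplified model of the duck-typing aid: an object "acts like" a duck
  # when it responds to the matching acts_like_<duck>? marker method.
  def acts_like?(duck)
    respond_to?(:"acts_like_#{duck}?")
  end
end

class String
  # Marker method; its presence is what matters.
  def acts_like_string?
    true
  end
end

# Hypothetical string proxy standing in for Chars.
class FakeChars
  def initialize(string)
    @wrapped = string
  end

  def acts_like_string?
    true
  end
end

# Hypothetical caller that no longer needs is_a?(String).
def describe(value)
  value.acts_like?(:string) ? "string-like" : "other"
end

describe("abc")                 # String qualifies
describe(FakeChars.new("abc"))  # so does the proxy
describe(%w[a b c])             # an Array does not
```

The benefit over is_a?(String) is that a proxy object passes the check without having to inherit from String.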
Different way of registering backends
Instead of registering a handler on the Chars class, you now set the proxy_class on ActiveSupport::Multibyte.
Note that this removes a level of indirection, which speeds up the entire Multibyte implementation quite a bit.
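As a rough model of why this is faster, the sketch below uses a stand-in module, MiniMultibyte, in place of ActiveSupport::Multibyte so that it runs without Rails. Everything in it, including the ShoutingChars backend, is hypothetical. In Rails the equivalent step is assigning your class to proxy_class on ActiveSupport::Multibyte.

```ruby
# Stand-in for ActiveSupport::Multibyte so the sketch runs without Rails.
module MiniMultibyte
  # Default proxy: wraps a string and forwards unknown calls to it.
  class Chars
    attr_reader :wrapped_string

    def initialize(string)
      @wrapped_string = string
    end

    def method_missing(name, *args, &block)
      if @wrapped_string.respond_to?(name)
        @wrapped_string.send(name, *args, &block)
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      @wrapped_string.respond_to?(name, include_private) || super
    end
  end

  class << self
    attr_writer :proxy_class

    # Falls back to the default proxy when nothing has been configured.
    def proxy_class
      @proxy_class ||= Chars
    end

    # In this model, getting a proxy is a single direct instantiation of
    # the configured class; no separate handler lookup is involved.
    def mb_chars(string)
      proxy_class.new(string)
    end
  end
end

# Hypothetical custom backend that overrides one method.
class ShoutingChars < MiniMultibyte::Chars
  def upcase
    "#{wrapped_string.upcase}!"
  end
end

MiniMultibyte.proxy_class = ShoutingChars
MiniMultibyte.mb_chars("hello").upcase  # handled by ShoutingChars
MiniMultibyte.mb_chars("hello").length  # forwarded to the wrapped String
```

Subclassing the default proxy keeps a custom backend small: it only overrides the methods that differ and inherits the forwarding behaviour for everything else.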
If you’ve implemented your own handler, please look at the implementation of ActiveSupport::Multibyte::Chars to see how to convert it to work with the new implementation. In most cases this should be a trivial exercise. Don’t hesitate to contact me if you need help.
Overrideable default normalization form
The default normalization form can now be set on ActiveSupport::Multibyte instead of updating a constant.
See ActiveSupport::Multibyte::NORMALIZATIONS_FORMS for valid normalization forms.
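The idea of a module-level default can be sketched without Rails. The NormalizationSketch module below is a hypothetical stand-in, and it delegates to Ruby's own String#unicode_normalize (available in Ruby 2.2 and later) rather than to the Multibyte implementation. The form names :c, :kc, :d and :kd, the :kc default, and the setter name default_normalization_form are assumptions not stated in this post; to my knowledge they mirror the Rails API, but check them against your Rails version.

```ruby
# Hypothetical stand-in for the module-level setting.
module NormalizationSketch
  # Maps the short form names to the names Ruby's unicode_normalize expects.
  FORMS = { :c => :nfc, :kc => :nfkc, :d => :nfd, :kd => :nfkd }.freeze

  class << self
    attr_reader :default_normalization_form

    # Reject unknown forms early instead of failing later in normalize.
    def default_normalization_form=(form)
      raise ArgumentError, "unknown form: #{form.inspect}" unless FORMS.key?(form)
      @default_normalization_form = form
    end

    # Uses the configured default unless a form is passed explicitly.
    def normalize(string, form = default_normalization_form)
      string.unicode_normalize(FORMS.fetch(form))
    end
  end

  self.default_normalization_form = :kc  # assumed default
end

decomposed = "e\u0301"  # "e" followed by a combining acute accent
NormalizationSketch.normalize(decomposed).length  # composed under :kc

NormalizationSketch.default_normalization_form = :d
NormalizationSketch.normalize("\u00e9").length    # decomposed under :d
```

Because the default is read at call time, changing it once (for example in an initializer) affects every later call that does not pass an explicit form.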