Quick ActiveSupport::Multibyte glossary trick

Manfred Stienstra

I was trying to make a glossary of words grouped by their first letter, but I wanted words starting with the letter é grouped with words starting with the letter e. No small feat, you might imagine. Wrong.

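# Group words by their first letter, folding accented letters like é into e
glossary = words.inject({}) do |dict, word|
  # Decompose the word so "é" becomes "e" followed by a combining accent,
  # then keep only the first character, lowercased
  letter = word.chars.decompose[0..0].downcase.to_s
  dict[letter] ||= []
  dict[letter] << word
  dict
end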
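On a newer Ruby you can get the same grouping without ActiveSupport. The sketch below assumes Ruby 2.2 or later, which ships `String#unicode_normalize`; the method name `group_by_first_letter` and the sample words are made up for illustration.

```ruby
# Hypothetical helper: group words by first letter, folding é, É, etc. into e.
# Relies only on the standard library (String#unicode_normalize, Ruby 2.2+).
def group_by_first_letter(words)
  words.group_by do |word|
    # NFD splits "é" into "e" plus a combining accent, so the first
    # character of the decomposed word is the bare Latin letter.
    word.unicode_normalize(:nfd)[0].downcase
  end
end

glossary = group_by_first_letter(%w[école egg apple Éclair])
p glossary
```

`group_by` keeps the original, accented words as values; only the key is normalized.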

This works because letters like é have a decomposed form in Unicode: a Latin base letter followed by a combining accent mark. Taking the first character of the decomposed word therefore gives you the bare letter. I’m not sure what happens if you run Arabic through this code, but we’ll cross that bridge when we get there.
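To see the decomposition itself, here is a small sketch that prints the code points of é before and after canonical decomposition (NFD). It uses Ruby's built-in `String#unicode_normalize` (Ruby 2.2+) rather than ActiveSupport, and writes é as `"\u00E9"` so the source file is guaranteed to contain the precomposed character.

```ruby
# Compare the precomposed "é" with its canonical decomposition (NFD).
composed = "\u00E9"                              # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = composed.unicode_normalize(:nfd)    # "e" followed by COMBINING ACUTE ACCENT

# Format each code point as U+XXXX for display
to_hex = ->(str) { str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ") }

puts to_hex.call(composed)
puts to_hex.call(decomposed)
```

The first line shows a single code point, the second shows two: the plain `e` and the combining accent, which is why slicing off the first character yields `e`.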


You’re reading an archived weblog post that was originally published on our website.