URoR 1: Set the Content-Type
URoR stands for ‘Unicode Ruby on Rails’, a series on using Unicode with Rails. In this first article I’ll show you how to set the Content-Type header so that the browser knows which encoding you’re sending. (second article)
Set it in an after filter
On the web, the One And Only Sensible Encoding for Unicode is UTF-8, so that’s what we’re going to use. First, make sure your editor is set to save all files encoded as UTF-8. Then create a new Rails application and generate a controller called ‘static’ with an ‘index’ action so that we have something to test with.
$ rails uror
$ cd uror/
$ ./script/generate controller static index
Now add the following to app/views/static/index.rhtml (just copy it from this page and paste it into your editor):
<p>Iñtërnâtiônàlizætiøn</p>
Run the Rails application with ./script/server and go to /static/index where you should get something garbled that looks like this:
Iñtërnâtiônà lizætiøn
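The garbling can be reproduced outside the browser. The following is only a sketch: it assumes Ruby 1.9 or later (which tracks string encodings) and assumes the browser falls back to ISO-8859-1 when no charset is given. It takes the UTF-8 bytes of the test string and decodes them as if each byte were one ISO-8859-1 character:

```ruby
# encoding: utf-8
# Assumption: this file is saved as UTF-8 and run with Ruby 1.9 or later.
original = "Iñtërnâtiônàlizætiøn"

# Relabel the same bytes as ISO-8859-1, then transcode to UTF-8 for display.
# Every non-ASCII character occupies two bytes in UTF-8, so each one turns
# into two separate characters.
garbled = original.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts "#{original.length} characters, #{original.bytesize} bytes"
puts garbled
```

The bytes themselves are never damaged; only the label used to interpret them is wrong, which is why sending the right charset is enough to fix the page.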
The problem is that you haven’t told the browser that you’re using UTF-8. Fix this by changing app/controllers/application.rb to:
class ApplicationController < ActionController::Base
  after_filter :set_encoding

  protected

  def set_encoding
    headers['Content-Type'] ||= 'text/html'
    if headers['Content-Type'].starts_with?('text/') && !headers['Content-Type'].include?('charset=')
      headers['Content-Type'] += '; charset=utf-8'
    end
  end
end
The set_encoding after filter does two things:
- It sets the Content-Type header to text/html, but only if no Content-Type header has yet been set. This is exactly what Rails would have done anyway, but we’re doing it here so that…
- It adds charset=utf-8 to every Content-Type header for a text type when no charset has yet been set.
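To see how these two steps behave for different responses, here is a minimal plain-Ruby sketch of the same logic applied to an ordinary hash. The method name add_charset is hypothetical and not part of Rails, and it uses Ruby's built-in start_with? instead of the starts_with? that Rails adds:

```ruby
# Hypothetical helper mirroring the after filter's logic on a plain hash.
def add_charset(headers, charset = 'utf-8')
  # Step 1: fall back to text/html when no Content-Type has been set.
  headers['Content-Type'] ||= 'text/html'

  # Step 2: append the charset to text types that do not declare one yet.
  type = headers['Content-Type']
  if type.start_with?('text/') && !type.include?('charset=')
    headers['Content-Type'] = "#{type}; charset=#{charset}"
  end
  headers
end

p add_charset({})
p add_charset({ 'Content-Type' => 'text/plain' })
p add_charset({ 'Content-Type' => 'image/png' })
p add_charset({ 'Content-Type' => 'text/html; charset=iso-8859-1' })
```

The last two calls leave the header untouched: binary types have no charset, and an action that sets its own charset keeps it.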
If you now reload the page the problem is fixed because the browser is no longer receiving a:
Content-Type: text/html
header, but:
Content-Type: text/html; charset=utf-8
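The difference between the two headers is what a client looks for when it picks a decoder. As a rough illustration only (charset_of is a hypothetical helper, and real browsers apply more elaborate rules), the charset parameter could be pulled out of a header value like this:

```ruby
# Hypothetical helper: returns the declared charset in lowercase,
# or nil when the header value carries no charset parameter.
def charset_of(content_type)
  return nil if content_type.nil?

  # Match "; charset=value", ignoring case, spacing and optional quotes.
  match = content_type.match(/;\s*charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1].downcase
end

p charset_of('text/html')
p charset_of('text/html; charset=utf-8')
```

A nil result corresponds to the first header, where the browser has to guess the encoding.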
Also set it in your Lighttpd or Apache configuration
It’s a good idea to set the UTF-8 encoding in your web server configuration too. For Apache add the following in public/.htaccess or your main configuration:
AddDefaultCharset utf-8
For Lighttpd, change mimetype.assign in config/lighttpd.conf to:
mimetype.assign = (
  ".css"  => "text/css; charset=utf-8",
  ".gif"  => "image/gif",
  ".htm"  => "text/html; charset=utf-8",
  ".html" => "text/html; charset=utf-8",
  ".jpeg" => "image/jpeg",
  ".jpg"  => "image/jpeg",
  ".js"   => "text/javascript; charset=utf-8",
  ".png"  => "image/png",
  ".swf"  => "application/x-shockwave-flash",
  ".txt"  => "text/plain; charset=utf-8"
)
Now all static files, such as 404.html and cached pages, are also sent with the correct encoding in the Content-Type header.
Even add it to the head
If you want to make it easy for people to save your pages to disk and open them with the correct encoding later on, you might want to add the following inside the head element of your HTML pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Always do this last, as it may mask any trouble you are having with the HTTP headers.
The upcoming 1.2 release of Rails will add utf-8 as the default charset for all renders, so you’ll no longer need the after filter.