URoR 1: Set the Content-Type
URoR stands for ‘Unicode Ruby on Rails’, a series on using Unicode with Rails. In this first article I’ll show you how to set the Content-Type header so that the browser knows what you’re sending. (second article)
Set it in an after filter
On the web, the One And Only Sensible Encoding for Unicode is UTF-8, so that’s what we’re going to use. First, make sure your editor is set to save all files encoded as UTF-8. Then create a new Rails application and generate a controller called ‘static’ with an ‘index’ action so that we have something to test with.
$ rails uror
$ cd uror/
$ ./script/generate controller static index
Now add the following to app/views/static/index.rhtml (just copy it from this page and paste it into your editor):
<p>Iñtërnâtiônàlizætiøn</p>
Run the Rails application with ./script/server and go to /static/index, where you should get something garbled that looks like this:
IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n
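This kind of garbling is what you get when UTF-8 bytes are decoded with a single-byte encoding. A minimal sketch to reproduce it outside the browser, assuming Ruby 2.0 or later (newer than the Ruby this Rails version runs on) and assuming the browser falls back to something like ISO-8859-1; the method name mojibake is made up for illustration:

```ruby
# Hypothetical helper: decode the UTF-8 bytes of +text+ as ISO-8859-1,
# then transcode the result back to UTF-8 so it can be printed.
def mojibake(text)
  text.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Every non-ASCII character takes two bytes in UTF-8,
# so each one shows up as two characters in the output.
puts mojibake('Iñtërnâtiônàlizætiøn')
```

Each accented character turns into a pair starting with Ã, which is the telltale sign of UTF-8 being read as a Latin encoding.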
The problem is that you haven’t told the browser that you’re using UTF-8. Fix this by changing app/controllers/application.rb to:
class ApplicationController < ActionController::Base
  after_filter :set_encoding

  protected

  # Add charset=utf-8 to text responses that don't declare a charset.
  def set_encoding
    headers['Content-Type'] ||= 'text/html'
    if headers['Content-Type'].starts_with?('text/') && !headers['Content-Type'].include?('charset=')
      headers['Content-Type'] += '; charset=utf-8'
    end
  end
end
The set_encoding after filter does two things:
- It sets the Content-Type header to text/html, but only if no Content-Type header has been set yet. This is exactly what Rails would have done anyway, but we’re doing it here so that…
- It adds charset=utf-8 to every Content-Type header for a text type when no charset has been set yet.
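The two steps can be tried outside Rails on a plain Hash. This is only a sketch: the method name with_charset is made up for illustration, and it uses Ruby's built-in start_with? where the filter uses the starts_with? alias that Rails adds.

```ruby
# Hypothetical stand-in for the filter: returns a copy of +headers+
# with the same Content-Type rules applied.
def with_charset(headers)
  result = headers.dup
  # Step 1: default to text/html when nothing has been set.
  result['Content-Type'] ||= 'text/html'
  type = result['Content-Type']
  # Step 2: add the charset to text types that don't declare one.
  if type.start_with?('text/') && !type.include?('charset=')
    result['Content-Type'] = type + '; charset=utf-8'
  end
  result
end

puts with_charset({})['Content-Type']
puts with_charset({ 'Content-Type' => 'image/png' })['Content-Type']
puts with_charset({ 'Content-Type' => 'text/plain; charset=iso-8859-1' })['Content-Type']
```

Only the first call should come back changed: the image type is not a text type, and the third header already names a charset.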
If you now reload the page the problem is fixed because the browser is no longer receiving a:
Content-Type: text/html
header, but:
Content-Type: text/html; charset=utf-8
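If you'd rather check the header from a script than from the browser, something like the following might do. It's only a sketch: charset_of is a made-up helper, and the commented-out request assumes the server started by ./script/server is listening on localhost port 3000.

```ruby
# Hypothetical helper: pull the charset parameter out of a
# Content-Type header value, or return nil when there is none.
def charset_of(content_type)
  match = content_type.to_s.match(/;\s*charset=["']?([^;"'\s]+)/i)
  match && match[1].downcase
end

puts charset_of('text/html; charset=utf-8')
puts charset_of('text/html').inspect

# Against a running application (assumed URL, uncomment to try):
# require 'net/http'
# response = Net::HTTP.get_response(URI('http://localhost:3000/static/index'))
# puts charset_of(response['Content-Type'])
```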
Also set it in your Lighttpd or Apache configuration
It’s a good idea to set the UTF-8 encoding in your web server configuration too. For Apache add the following in public/.htaccess or your main configuration:
AddDefaultCharset utf-8
For Lighttpd, change mimetype.assign in config/lighttpd.conf to:
mimetype.assign = (
  ".css"  => "text/css; charset=utf-8",
  ".gif"  => "image/gif",
  ".htm"  => "text/html; charset=utf-8",
  ".html" => "text/html; charset=utf-8",
  ".jpeg" => "image/jpeg",
  ".jpg"  => "image/jpeg",
  ".js"   => "text/javascript; charset=utf-8",
  ".png"  => "image/png",
  ".swf"  => "application/x-shockwave-flash",
  ".txt"  => "text/plain; charset=utf-8"
)
Now all static stuff like 404.html and cached pages is also sent with the correct encoding in the Content-Type header.
Even add it to the head
If you want to make it easy for people to save your pages to disk and open them with the correct encoding later on, you might want to add the following inside the head element of your HTML pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Always do this last, as it may mask any trouble you might be having with the HTTP headers.
The upcoming 1.2 release of Rails will add utf-8 as the default charset for all renders, so you’ll no longer need the after filter.