Lazily sweeping the whole Rails page cache
One of the more convenient features in Ruby on Rails is page caching. Simply add caches_page :show to the top of a controller class, and all pages rendered by the show action are written to disk automatically. On subsequent requests, these pages will be served straight from disk without invoking Rails at all.
This works because of rewrite rules that basically tell the webserver to append .html to the request path. If the webserver can find a file using the resulting path, the webserver will send it. If not, then Rails will handle the request.
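To make the lookup concrete, here is a minimal plain-Ruby sketch of the decision the rewrite rules delegate to the webserver. The helper name cached_file_for is hypothetical and not part of Rails, and the sketch ignores details such as query strings and paths that already contain a dot.

```ruby
# Hypothetical helper: mimics what the rewrite rules ask the webserver to do.
# Returns the path of the cached file for a request, or nil when no cached
# file exists and Rails would have to handle the request.
def cached_file_for(public_dir, request_path)
  # The site root is cached as index.html; other paths just get ".html" appended.
  relative = request_path == "/" ? "index" : request_path.sub(%r{\A/}, "")
  candidate = File.join(public_dir, "#{relative}.html")
  File.file?(candidate) ? candidate : nil
end
```

With a cached file at public/posts/1.html, a request for /posts/1 resolves to that file, while /posts/2 returns nil and falls through to Rails.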
Pages are removed from the cache simply by deleting them from the public directory. Rails provides the expire_page method and sweepers to help with this.
Sweeping is hard
Suppose you are writing a blogging application and you decide to add page caching. When a post is updated, the cached page that shows the post has to be removed. You write a post sweeper for this:
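In a Rails application this logic would normally live in a sweeper's after_update callback and call expire_page. The plain-Ruby sketch below is not that sweeper. It only shows what expiring a single post page amounts to on disk, assuming posts are served at /posts/:id; the helper name expire_post_page is hypothetical.

```ruby
require "fileutils"

# Hypothetical stand-in for expiring one cached page: delete the file that
# page caching wrote for the post's show action.
# Assumption: posts are served at /posts/:id and cached under public_dir.
def expire_post_page(public_dir, post_id)
  cached = File.join(public_dir, "posts", "#{post_id}.html")
  return false unless File.file?(cached) # nothing cached, nothing to do
  FileUtils.rm(cached)
  true
end
```

Only the one page is touched here, which is exactly the limitation the next paragraphs run into.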
But wait… Your blog also has a front page listing the most recent posts. The updated post might be included there, so you need to expire that page too.
Then you realize you also have archive pages and category overviews…
Ok, but what if a post is destroyed? And what exactly should happen when a category is renamed? And…
When you have an application where a single change can invalidate a large number of pages, the sweepers can get quite complex. It’s easy to forget to expire one or more pages, leading to subtle bugs where old pages are served from a stale cache.
An obvious solution to this would be to just sweep all pages after each change. Sadly, this is not possible with page caching because Rails does not keep a list of cached pages. The files are written directly to the public directory, so there’s no way to cleanly delete them all.
Lazy sweeping
The underlying problem is that we cannot tell which files in the public directory are cached copies and which are static HTML. We solve this by writing all cached pages to a public/cache subdirectory, which can then be emptied in one go. This has worked well for us.
In config/environment.rb, change the page cache directory from the default by adding the following line inside the Rails::Initializer.run block.
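As a sketch, the setting would look roughly like this. It assumes a Rails version that still uses Rails::Initializer and the RAILS_ROOT constant, and that your application has no other code relying on the default location.

```ruby
# config/environment.rb
Rails::Initializer.run do |config|
  # Write cached pages to public/cache instead of directly to public.
  config.action_controller.page_cache_directory = RAILS_ROOT + "/public/cache"
end
```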
Then change the rewrite rules in the webserver configuration. For lighttpd (config/lighttpd.conf) these should be changed to:
For Apache (public/.htaccess) the first two rules probably need to be changed to:
We use the following in app/models/site_sweeper.rb as a single sweeper for all the models in our application.
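A minimal sketch of such a sweeper might look like the following. The PageCache module, the observed model names Post and Category, and the safety check against the public directory are assumptions made for illustration. The Rails part is wrapped in a defined? guard only so the file also loads outside a Rails application.

```ruby
require "fileutils"

# Plain-Ruby core (hypothetical module name): remove everything inside
# cache_dir but keep the directory itself. Returns the removed entries.
module PageCache
  def self.sweep(cache_dir, protected_dir = nil)
    # Refuse to sweep the directory that holds the static files.
    if protected_dir && File.expand_path(cache_dir) == File.expand_path(protected_dir)
      raise ArgumentError, "refusing to sweep #{cache_dir}"
    end
    return [] unless File.directory?(cache_dir)
    entries = Dir.glob(File.join(cache_dir, "*"))
    FileUtils.rm_rf(entries)
    entries
  end
end

# Rails wiring, only defined when ActionController is loaded.
if defined?(ActionController::Caching::Sweeper)
  class SiteSweeper < ActionController::Caching::Sweeper
    observe Post, Category # assumption: the models of the example blog

    def after_save(record)
      sweep_everything
    end

    def after_destroy(record)
      sweep_everything
    end

    private

    def sweep_everything
      PageCache.sweep(ActionController::Base.page_cache_directory,
                      File.join(RAILS_ROOT, "public"))
    end
  end
end
```

Any save or destroy on an observed model empties the whole cache, so the sweeper never has to know which pages a change affects. The pages are simply regenerated on the next request.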
Finally, assign the site sweeper to all controllers and actions that may invalidate the cache.
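For a posts controller this could look roughly as follows. The controller name and the list of actions are assumptions; the sweeper should be attached to whichever actions modify data in your application.

```ruby
class PostsController < ApplicationController
  caches_page :show

  # Run the site sweeper around every action that changes data.
  cache_sweeper :site_sweeper, :only => [:create, :update, :destroy]
end
```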
We’ve also added the following script as script/sweep_cache to easily sweep the cache during development.
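A minimal sketch of such a script, applying the same idea from the command line. It assumes the script lives in script/ directly below the application root and that the cache directory is public/cache. The function names are hypothetical.

```ruby
#!/usr/bin/env ruby
require "fileutils"

# Assumption: this file lives in script/, one level below the app root.
def default_cache_dir(script_path)
  File.expand_path(File.join(File.dirname(script_path), "..", "public", "cache"))
end

# Remove everything inside cache_dir and return the number of
# top-level entries that were removed.
def sweep_cache(cache_dir)
  entries = Dir.glob(File.join(cache_dir, "*"))
  FileUtils.rm_rf(entries)
  entries.size
end

if __FILE__ == $0
  dir = ARGV[0] || default_cache_dir(__FILE__)
  puts "Removed #{sweep_cache(dir)} entries from #{dir}"
end
```

Running script/sweep_cache without arguments empties public/cache. Passing a path as the first argument sweeps that directory instead.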
This approach can be extended very nicely to the subdomains-as-account-keys pattern. More on that later.