Question

I ve been catching up with the Scaling Rails screencasts. In episode 11 which covers advanced HTTP caching (using reverse proxy caches such as Varnish and Squid etc.), they recommend only considering using a reverse proxy cache once you ve already exhausted the possibilities of page, action and fragment caching within your Rails application (as well as memcached etc. but that s not relevant to this question).

What I can t quite understand is how using an HTTP reverse proxy cache can provide a performance boost for an application that already uses page caching. To simplify matters, let s assume that I m talking about a single host here.

This is my understanding of how both techniques work (maybe I m wrong):

With page caching the Rails process is hit initially and then generates a static HTML file that is served directly by the Web server for subsequent requests, for as long as the cache for that request is valid. If the cache has expired then Rails is hit again and the static file is regenerated with the updated content ready for the next request
With an HTTP reverse proxy cache the Rails process is hit when the proxy needs to determine whether the content is stale or not. This is done using various HTTP headers such as ETag, Last-Modified etc. If the content is fresh then Rails responds to the proxy with an HTTP 304 Not Modified and the proxy serves its cached content to the browser, or even better, responds with its own HTTP 304. If the content is stale then Rails serves the updated content to the proxy which caches it and then serves it to the browser

If my understanding is correct, then doesn t page caching result in less hits to the Rails process? There isn t all that back and forth to determine if the content is stale, meaning better performance than reverse proxy caching. Why might you use both techniques in conjunction?

Answer 1

You are right.

The only reason to consider it is if your apache sets expires headers. In this configuration, the proxy can take some of the load off apache.

Having said this, apache static vs proxy cache is pretty much an irrelevancy in the rails world. They are both astronomically fast.

The benefits you would get would be for your none page cacheable stuff.

I prefer using proxy caching over page caching (ala heroku), but thats just me, and a digression.

Answer 2

A good proxy cache implementation (e.g., Squid, Traffic Server) is massively more scalable than Apache when using the prefork MPM. If you re using the worker MPM, Apache is OK, but a proxy will still be much more scalable at high loads (tens of thousands of requests / second).

Answer 3

Varnish for example has a feature when the simultaneous requests to the same URL (which is not in cache) are queued and only single/first request actually hits the back-end. That could prevent some nasty dog-pile cases which are nearly impossible to workaround in traditional page caching scenario.

Answer 4

Using a reverse proxy in a setup with only one app server seems a bit overkill IMO. In a configuration with more than one app server, a reverse proxy (e.g. varnish, etc.) is the most effective way for page caching.

Think of a setup with 2 app servers:

User Bob (redirected to node A ) posts a new message, the page gets expired and recreated on node A .
User Cindy (redirected to node B ) requests the page where the new message from Bob should appear, but she can t see the new message, because the page on node B wasn t expired and recreated.

This concurrency problem could be solved with a reverse proxy.

友情链接