Installing The Varnish Cache System for Drupal 7

I am not going to lie, compared to installing Memcached and APC, installing Varnish was a downright pain in the ass. I found the Drupal module documentation to be very light and documentation elsewhere difficult to piece together. I got it done, however, and the results are definitely worth the extra effort.

Varnish is an extremely fast reverse proxy that serves static files and anonymous page views much faster, and at much higher volumes, than vanilla Apache (in the neighborhood of 3000 requests per second). If you’re serious about maximizing site performance, this is a must-have caching system. An easier-to-install alternative is the Boost module, but hopefully this article will make installing Varnish a breeze.

Installing the Varnish Package

Let’s start by installing Varnish on your server with the following commands:

$ sudo apt-get update
$ sudo apt-get install varnish

Varnish needs to listen on port 80, which is the port Apache normally uses. Let’s edit the Varnish configuration first, then move Apache to a new port later.

$ sudo nano /etc/default/varnish
# Add this to the top of the file to start varnishd at boot
# (no spaces around the equals sign; this file is read by the shell):
START=yes

# Then look for these settings and edit to match:
DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-S /etc/varnish/secret \
-s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,128M"

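The 128M at the end of the varnish_storage.bin line is the amount of storage we’re giving Varnish to hold cached data. With the file backend shown here, that storage is a memory-mapped file rather than pure RAM. Adjust this number to fit your system; 1/4 of your total server memory is usually acceptable.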
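
The 1/4 rule of thumb can be worked out from the command line. This is only a rough sketch: the function name quarter_mem_mb is made up for illustration, and it assumes a Linux server where /proc/meminfo reports MemTotal in kB.

```shell
#!/bin/sh
# Hypothetical helper: print one quarter of a memory total, given in kB,
# as a whole number of megabytes.
quarter_mem_mb() {
  echo $(( $1 / 1024 / 4 ))
}

# On Linux, /proc/meminfo reports MemTotal in kB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "Suggested storage size: $(quarter_mem_mb "$total_kb")M"
fi
```

The number it prints is a starting point for the size at the end of the -s line, not a hard rule; leave room for Apache, MySQL, and anything else on the box.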

Updating Apache and vhost Files

Now we need to edit our ports.conf and vhost files to run Apache on a new port. By default Apache runs on port 80 so we’re going to switch it to port 8080.

$ sudo nano /etc/apache2/ports.conf
# Update settings to match:
NameVirtualHost *:8080
Listen 8080
$ sudo nano /etc/apache2/sites-available/domain.com
# Update settings to match:
<VirtualHost *:8080>

Save, then restart Apache and Varnish for these changes to take effect:

$ sudo service apache2 restart
$ sudo service varnish restart
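
If Varnish fails to start after this step, a leftover Listen 80 line in Apache is a likely suspect, since both would then be competing for the same port. The sketch below lists the active Listen directives in a config file so you can confirm only 8080 remains. The function name listen_ports is an illustration, not an Apache tool.

```shell
#!/bin/sh
# Hypothetical helper: print the argument of every active (uncommented)
# Listen directive in the given Apache config file.
listen_ports() {
  awk '$1 == "Listen" { print $2 }' "$1"
}

# Path taken from the step above; skipped if the file is not present.
conf=/etc/apache2/ports.conf
if [ -r "$conf" ]; then
  listen_ports "$conf"
fi
```

If 80 shows up in the output, Apache is still configured to bind the port Varnish needs.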

Configuring the VCL File

Now the real fun begins: editing Varnish’s VCL file to make it play well with Drupal’s caching architecture. This file can make or break your Varnish installation. I’m not going to claim to be a Varnish expert, so much of this configuration is based on the work of people much smarter than me.

$ sudo nano /etc/varnish/default.vcl

My current VCL file:

# This is a basic VCL configuration file for Varnish. See the vcl(7) man page for details on VCL syntax and semantics.

# Default backend definition. Set this to point to your content server.

 backend default {
     .host = "127.0.0.1";
     .port = "8080";
     .connect_timeout = 600s;
     .first_byte_timeout = 600s;
     .between_bytes_timeout = 600s;
 }

# Addresses allowed to reach cron.php and install.php (used in vcl_recv below).
# Assumption: only localhost is trusted; add your own IPs or ranges as needed.
acl internal {
  "127.0.0.1";
}

# Respond to incoming requests.
sub vcl_recv {
  if (req.request == "GET" && req.url ~ "^/varnishcheck$") {
    error 200 "Varnish is Ready";
  }

  # Allow the backend to serve up stale content if it is responding slowly.
  if (!req.backend.healthy) {
    # Use anonymous, cached pages if all backends are down.
    unset req.http.Cookie;
    if (req.http.X-Forwarded-Proto == "https") {
      set req.http.X-Forwarded-Proto = "http";
    }
    set req.grace = 30m;
  } else {
    set req.grace = 15s;
  }

  # Get rid of progress.js query params
  if (req.url ~ "^/misc/progress\.js\?[0-9]+$") {
    set req.url = "/misc/progress.js";
  }

  # If global redirect is on
  #if (req.url ~ "node\?page=[0-9]+$") {
  #  set req.url = regsub(req.url, "node(\?page=[0-9]+$)", "\1");
  #  return (lookup);
  #}

  # Do not cache these paths.
  if (req.url ~ "^/status\.php$" ||
      req.url ~ "^/update\.php$" ||
      req.url ~ "^/ooyala/ping$" ||
      req.url ~ "^/admin" ||
      req.url ~ "^/admin/.*$" ||
      req.url ~ "^/user" ||
      req.url ~ "^/user/.*$" ||
      req.url ~ "^/users/.*$" ||
      req.url ~ "^/info/.*$" ||
      req.url ~ "^/flag/.*$" ||
      req.url ~ "^.*/ajax/.*$" ||
      req.url ~ "^.*/ahah/.*$") {
    return (pass);
  }

  # Pipe these paths directly to Apache for streaming.
  if (req.url ~ "^/admin/content/backup_migrate/export") {
    return (pipe);
  }

  # Do not allow outside access to cron.php or install.php.
  if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ internal) {
    # Have Varnish throw the error directly.
    error 404 "Page not found.";
    # Use a custom error page that you've defined in Drupal at the path "404".
    # set req.url = "/404";
  }

  # Handle compression correctly. Different browsers send different
  # "Accept-Encoding" headers, even though they mostly all support the same
  # compression mechanisms. By consolidating these compression headers into
  # a consistent format, we can reduce the size of the cache and get more hits.
  # @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      # If the browser supports it, we'll use gzip.
      set req.http.Accept-Encoding = "gzip";
    }
    else if (req.http.Accept-Encoding ~ "deflate") {
      # Next, try deflate if it is supported.
      set req.http.Accept-Encoding = "deflate";
    }
    else {
      # Unknown algorithm. Remove it and send unencoded.
      unset req.http.Accept-Encoding;
    }
  }

  # Always cache the following file types for all users.
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js)(\?[a-z0-9]+)?$") {
    unset req.http.Cookie;
  }

  # Remove all cookies that Drupal doesn't need to know about. ANY remaining
  # cookie will cause the request to pass-through to a backend. For the most part
  # we always set the NO_CACHE cookie after any POST request, disabling the
  # Varnish cache temporarily. The session cookie allows all authenticated users
  # to pass through as long as they're logged in.
  #
  # 1. Append a semi-colon to the front of the cookie string.
  # 2. Remove all spaces that appear after semi-colons.
  # 3. Match the cookies we want to keep, adding the space we removed
  #    previously, back. (\1) is first matching group in the regsuball.
  # 4. Remove all other cookies, identifying them by the fact that they have
  #    no space after the preceding semi-colon.
  # 5. Remove all spaces and semi-colons from the beginning and end of the
  #    cookie string.
  if (req.http.Cookie) {
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    set req.http.Cookie = regsuball(req.http.Cookie, ";(S{1,2}ESS[a-z0-9]+|NO_CACHE)=", "; \1=");
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # If there are no remaining cookies, remove the cookie header. If there
      # aren't any cookie headers, Varnish's default behavior will be to cache
      # the page.
      unset req.http.Cookie;
    }
    else {
      # If there are any cookies left (a session or NO_CACHE cookie), do not
      # cache the page. Pass it on to Apache directly.
      return (pass);
    }
  }

  ## From default below ##
  if (req.restarts == 0) {
    if (req.http.x-forwarded-for) {
      set req.http.X-Forwarded-For =
      req.http.X-Forwarded-For + ", " + client.ip;
    } else {
      set req.http.X-Forwarded-For = client.ip;
    }
  }
  if (req.request != "GET" &&
    req.request != "HEAD" &&
    req.request != "PUT" &&
    req.request != "POST" &&
    req.request != "TRACE" &&
    req.request != "OPTIONS" &&
    req.request != "DELETE") {
      /* Non-RFC2616 or CONNECT which is weird. */
      return (pipe);
  }
  if (req.request != "GET" && req.request != "HEAD") {
      /* We only deal with GET and HEAD by default */
      return (pass);
  }
  ## Unset Authorization header if it has the correct details...
  #if (req.http.Authorization == "Basic ") {
  #  unset req.http.Authorization;
  #}
  if (req.http.Authorization || req.http.Cookie) {
      /* Not cacheable by default */
      return (pass);
  }
  return (lookup);
}

# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
  # Don't allow static files to set cookies.
  if (req.url ~ "(?i)\.(png|gif|jpeg|jpg|ico|swf|css|js)(\?[a-z0-9]+)?$") {
    # beresp == Back-end response from the web server.
    unset beresp.http.set-cookie;
  }
  else if (beresp.http.Cache-Control) {
    unset beresp.http.Expires;
  }

  if (beresp.status == 301) {
    set beresp.ttl = 1h;
    return(deliver);
  }

  ## Doesn't seem to work as expected
  #if (beresp.status == 500) {
  #  set beresp.saintmode = 10s;
  #  return(restart);
  #}

  # Allow items to be stale if needed.
  set beresp.grace = 1h;
}

# Set a header to track a cache HIT/MISS.
sub vcl_deliver {
  if (obj.hits > 0) {
    set resp.http.X-Varnish-Cache = "HIT";
  }
  else {
    set resp.http.X-Varnish-Cache = "MISS";
  }
}

# In the event of an error, show friendlier messages.
sub vcl_error {
     set obj.http.Content-Type = "text/html; charset=utf-8";
     set obj.http.Retry-After = "5";
     synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
   <head>
     <title>"} + obj.status + " " + obj.response + {"</title>
   </head>
   <body>
     <h1>Error "} + obj.status + " " + obj.response + {"</h1>
     <p>"} + obj.response + {"</p>
     <h3>Guru Meditation:</h3>
     <p>XID: "} + req.xid + {"</p>
     <hr>
     <p>Varnish cache server</p>
   </body>
</html>
"};
     return (deliver);
}
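
The five cookie-handling steps described in the vcl_recv comments are easier to follow with a concrete cookie string. The sketch below is not part of the VCL; it mimics each step with sed -E so you can watch a sample header get whittled down. The function name strip_cookies and the sample cookie values are made up for illustration, and sed’s regex engine is not identical to the one Varnish uses, so treat this as an approximation.

```shell
#!/bin/sh
# Hypothetical helper: apply the same five steps as the VCL cookie block
# to the cookie string passed as the first argument.
strip_cookies() {
  # 1. Prepend a semi-colon to the cookie string.
  printf '%s\n' ";$1" \
    | sed -E 's/; +/;/g' \
    | sed -E 's/;(S{1,2}ESS[a-z0-9]+|NO_CACHE)=/; \1=/g' \
    | sed -E 's/;[^ ][^;]*//g' \
    | sed -E 's/^[; ]+|[; ]+$//g'
  # 2. Remove spaces after semi-colons.
  # 3. Put the space back in front of cookies we want to keep.
  # 4. Drop every cookie that has no space after its semi-colon.
  # 5. Trim leftover semi-colons and spaces from both ends.
}

strip_cookies 'has_js=1; SESSabc123=xyz; _ga=GA1.2'
# prints: SESSabc123=xyz
```

A string with no session or NO_CACHE cookie comes out empty, which is the case where the VCL unsets the Cookie header and lets the page be cached.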

Next, navigate to /admin/config/development/performance, enable the Page Cache setting, and set a non-zero time for “Expiration of cached pages.”
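
One way to confirm that pages are actually coming out of Varnish is to look for the X-Varnish-Cache header that vcl_deliver sets in the VCL above. The sketch below pulls that header’s value out of a set of response headers. The function name cache_status is made up for illustration, and the localhost URL in the usage comment is an assumption about where you run the check.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the X-Varnish-Cache header (HIT or MISS), if present.
cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-varnish-cache" { print $2 }'
}

# Usage (assumes curl is installed and the site answers on localhost):
#   curl -sI http://localhost/ | cache_status
```

As an anonymous visitor, you would typically expect MISS on the first request for a page and HIT on a repeat request shortly afterwards. If every request reports MISS, a cookie that survives the filtering in vcl_recv is a common cause.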

Phew. We’re done! With APC, Memcached, and now Varnish installed, we have a ridiculously fast Drupal website ready to scale. Running ab -n 10 -c 5 http://andrewdunkle.com/ in a terminal shows my requests per second are usually in the 3000s (a standard Drupal install is usually in the teens, so this is a tremendous performance increase). From this point forward, most performance enhancements will come from using efficient front-end code.
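
If you want to compare before-and-after numbers, the figure to watch in ab’s report is the “Requests per second” line. Here is a small sketch that extracts just that number; the function name rps is made up for illustration, and the example.com URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read ab's report on stdin and print the
# "Requests per second" figure.
rps() {
  awk '/^Requests per second:/ { print $4 }'
}

# Usage (placeholder URL; substitute your own site):
#   ab -n 10 -c 5 http://example.com/ | rps
```

A run of only 10 requests can swing a lot from one attempt to the next, so a larger -n value will probably give a steadier figure.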

Resources