Pentagramme

Website Optimisation: Part 3

by Keefe Tang

The first part of this web optimisation series discussed producing clean HTML code, while the second part focused on optimisations outside the markup. The idea was simple: serve as little as possible, as fast as possible.

While we do our best to serve files as fast as we can, that will never be faster than serving them from cache. Bear in mind that only viewers who have previously visited your site will have the files cached on their computers, and even for major websites like Yahoo!, 40% to 60% of visitors arrive with an empty cache.

Web Cache

The idea of a web cache is to store frequently used objects closer to the viewers through various means. Doing so lets your viewers avoid round trips to the web server, which saves bandwidth, reduces server load and, most importantly, makes pages load faster, much faster.

There are many ways to set cache control rules for your web site. The simplest is the HTML <meta> element, but like I said it produces ugly code, and it is poorly supported by browser caches.

Caching does not concern your viewers, and they certainly don’t need to see your caching rules. The rules only concern the browser, so only the browser should be told how you want your files cached. Writing the cache rules in your .htaccess file is the best way to achieve that.

Caching Methods

Last-Modified

When a browser sends an HTTP request to a web server, the server’s HTTP response can optionally include a Last-Modified header to tell the browser when the file was last modified. The next time your visitor reloads the page, their browser asks the web server whether the file has changed since its last request by including an If-Modified-Since header. If the file has changed, the browser downloads the newest version; if not, the server answers with a short 304 Not Modified response and the browser uses the file in its cache.

ETag

An ETag works roughly the same way as the Last-Modified header, but instead of telling the browser when the file was last modified, it tags the file with a generated checksum value. The browser sends that value back in an If-None-Match header, and the server compares it with the file’s current tag.

This is useful because sometimes (not very often, at least I hope not) the server’s clock gets messed up, which means the cache is no longer accurate. That is why the ETag is often preferred: the key is unique and less likely to be thrown off by errors like a misadjusted clock.

The ETag, however, is also plagued by its unique identifier. Because the key is unique to the server that generated it, a site hosted on multiple servers or behind a CDN will find that an ETag from one server does not match the ETag another server generates for the same file, which renders it useless. The Yahoo! Exceptional Performance team recommends removing ETags.
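For Apache, removing ETags could look roughly like the sketch below. It assumes the rules live in an .htaccess file, that your host allows FileETag to be overridden there, and that mod_headers is loaded for the second rule.

```apache
# Stop Apache from generating ETags for static files.
FileETag None

# Also strip any ETag header added elsewhere (needs mod_headers).
<IfModule mod_headers.c>
  Header unset ETag
</IfModule>
```

With no ETag in the response, the browser falls back to Last-Modified and Expires to decide whether its cached copy is still good.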

Expires

While checking with the web server to see if a file has changed is better than downloading the file itself, the browser still has to check every time a viewer reloads the page. This is where the Expires HTTP header starts to look like a better alternative to ETag and Last-Modified.

When the server sends a file, Expires tells the browser that the file can be used over and over again until a certain time. The browser will use the cached version of the file every time your viewer reloads the page, until it expires.

Set It Up

These instructions are for Apache; if you are using IIS, look elsewhere for instructions on how to set it up. While you can write the exact cache control header for every file you want cached, I prefer using mod_expires to specify the cache rules.

Before we proceed, make sure your Apache server has mod_expires and mod_headers installed and loaded.
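For comparison, writing the header by hand could look something like this. It is only a sketch: it assumes mod_headers is loaded, and the one-hour max-age is an arbitrary value picked for illustration.

```apache
<IfModule mod_headers.c>
  # Let browsers reuse .html and .htm files for 3600 seconds (1 hour).
  <FilesMatch "\.(html|htm)$">
    Header set Cache-Control "max-age=3600"
  </FilesMatch>
</IfModule>
```

This works, but every rule has to spell out the header value itself, which is why mod_expires ends up being the more convenient option.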
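If you are not sure whether the module is loaded, one cautious approach is to wrap the rules in an IfModule block, so that Apache skips them quietly instead of throwing a server error. A minimal sketch for an .htaccess file:

```apache
# Only apply the rules when mod_expires is actually loaded.
<IfModule mod_expires.c>
  # Turn on generation of the Expires and Cache-Control max-age headers.
  ExpiresActive On
</IfModule>
```

The trade-off is that a missing module fails silently, so check the response headers in your browser to confirm the rules took effect.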

Cache Files by Extension

This method targets files with the specified extension. While that is very simple to perform and understand, it has its own downsides. For one, it requires a file extension.

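# mod_expires must be switched on before it sends any Expires header.
ExpiresActive On
<FilesMatch "\.(html|htm)$">
  # Expire matched files 3600 seconds after access (A).
  ExpiresDefault A3600
</FilesMatch>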

The code above matches every file with the extension .html or .htm and adds an expiry date 3600 seconds after access (A). To target other file extensions, just change the extensions and modify the expiry time as you see fit.
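As an illustration of targeting other extensions, the sketch below gives images, stylesheets and scripts a longer lifetime. The extension list and the one-week value are assumptions chosen for the example, not recommendations from the sources.

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # 604800 seconds = 7 days after access (A).
  <FilesMatch "\.(jpg|jpeg|png|gif|css|js)$">
    ExpiresDefault A604800
  </FilesMatch>
</IfModule>
```

Files that rarely change can usually take a longer expiry time than HTML pages, but remember that viewers will not see an updated file until their cached copy expires.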

Cache Files by MIME Type

Targeting the MIME type is much more accurate than targeting the file extension, since an extension can easily be changed, but this method requires you to remember to set the correct MIME type. Changes like gzipping a file can change the MIME type it is served with, and if the MIME type is not defined properly, the file will not be targeted.

ExpiresByType text/html A3600

Just like the code before, this specifies that files with the MIME type text/html expire 3600 seconds after access (A).

ExpiresActive must be set to On before mod_expires will send any of these headers. The expiry date can also be written in a more plain-English syntax: instead of A3600, it can be written as "access plus 1 hour", which means the same thing.
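Put together, a set of rules in the readable syntax might look like the sketch below. The MIME types and durations are assumptions picked for illustration; adjust them to whatever your server actually sends.

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # Same as A3600: one hour after the file is accessed.
  ExpiresByType text/html "access plus 1 hour"
  # Longer lifetimes for files that change less often.
  ExpiresByType text/css "access plus 1 week"
  ExpiresByType image/png "access plus 1 month"
</IfModule>
```

The same syntax also accepts "modification" in place of "access", which counts from the time the file was last modified instead.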

Source

This article series is sourced from various articles, including Website Optimisation Measure by Kroc Camen, the Website Optimisation Measure series of articles by Jens Meiert, Optimizing openSpaceBook by Ryan Doherty, Exceptional Performance by the Exceptional Performance team at Yahoo and 14 Rules for Faster-Loading Web Sites by Steve Souders.