Boost PHP Speed With “If-Modified-Since” [2/4]
October 5, 2006
Improve PHP Performance
This post continues the series where we embarked on the seemingly simple task of enabling If-Modified-Since headers for PHP scripts. What I discovered is that it’s not as easy as it seems!
- Part I: Understanding IMS
- Part II: Watching IMS In Action (you are here)
- Part III: Using IMS For Optimized RSS Feeds
- Part IV: Implementing IMS On WordPress
Part II: Watching IMS In Action
Up to now I’ve called this process “IMS”. What we’re really doing is a conditional GET. In the next section we’ll go over several useful tools for dissecting a conditional GET and watch real data to help explain the challenges and possible solutions involved.
To perform and watch a conditional GET, you need the following (the tools I use are in parentheses):
- An HTTP Monitoring Tool (Fiddler 18.104.22.168)
Fiddler is an HTTP debugging proxy written by Eric Lawrence of Microsoft that allows you to examine request and response header values as they happen. This is the tool I used to check which sites use gzip in a previous tutorial. You can also craft custom requests, attach scripts to particular behaviors and filter various response types. I mostly use it as a quick and dirty way to see what’s getting loaded and which requests cause errors.
- An HTTP 1.1-Compliant Browser (Firefox 22.214.171.124)
Firefox is, of course, the tool for compatible browsing, but any modern browser should work – the IMS caching standard has been around for years.
- Any scripting/CGI language that allows raw HTML generation (PHP 5.0)
PHP is perhaps overkill for demonstration purposes (you could use a simple batch file), but it is very common so I suspect this should work for most readers. PHP is the scripting language upon which WordPress is based.
Let’s take a peek at the data exchanged between a browser and server…
Example 1: Watch HTML Caching With Fiddler
First, let’s look at what a normal HTML file looks like with Fiddler. The HTML is stripped to the bare minimum for simplicity:
(screenshot: First pass of static.html)
Because static.html hasn’t been loaded before, the browser does NOT send an IMS date, but the server returns a Last-Modified and ETag for next time.
(screenshot: Second pass of static.html)
Press F5 to reload and this time the server responds with a 304 Not Modified, as expected.
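If you want to see the reload step without the screenshots, here is a minimal sketch of what the browser does on the second pass: it takes the validators it saved from the first response and sends them back as conditional request headers. The function name buildConditionalHeaders and the header values are placeholders I made up for illustration, not data captured from a real server.

```php
<?php
// Sketch of what a browser does on reload: turn the validators saved from
// the first response into conditional request headers.
// buildConditionalHeaders() is a hypothetical helper, not a PHP built-in.
function buildConditionalHeaders(array $cachedResponseHeaders)
{
    $request = array();

    // Last-Modified from the first response comes back as If-Modified-Since.
    if (!empty($cachedResponseHeaders['Last-Modified'])) {
        $request[] = 'If-Modified-Since: ' . $cachedResponseHeaders['Last-Modified'];
    }

    // ETag from the first response comes back as If-None-Match.
    if (!empty($cachedResponseHeaders['ETag'])) {
        $request[] = 'If-None-Match: ' . $cachedResponseHeaders['ETag'];
    }

    return $request;
}

// Placeholder values, not captured from a real server.
$firstResponse = array(
    'Last-Modified' => 'Thu, 05 Oct 2006 00:00:00 GMT',
    'ETag'          => '"abc123"',
);

print_r(buildConditionalHeaders($firstResponse));
```

If the server decides that both values still describe the current file, it can answer with a 304 and no body.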
All of this has been automatic so far. Now let’s look at a more interesting page…
Example 2: Watch a Simple PHP Page
We’ll execute a basic PHP command (normally this would be a data query or series of pre-processor commands for the server, of course).
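The original listing isn't reproduced here, so what follows is a minimal sketch of the kind of page I mean, assuming a default PHP configuration. The renderPage helper is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical stand-in for the simple PHP page in this example.

// Build the page body. A real page would run database queries here.
function renderPage($now)
{
    return '<html><body><p>Generated at ' . date('r', $now) . '</p></body></html>';
}

// This script sends no Last-Modified or ETag header, so the browser has
// nothing to validate against and every reload runs the script again.
echo renderPage(time());
```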
Press F5 and notice that even though the file has not changed, the server still generates the content each time. No big deal for a small file, but in a WordPress environment, you’re looking at upwards of 100k of related data requested from a database for each user, RSS aggregator and indexing robot that visits the site!
Speaking of unnecessary data, I’ll forgo the fat screenshot images – you get the idea.
Example 3: Watch a PHP File Using Conditional GET
This is where it gets fun – a conditional GET.
For simplicity, the next example does not actually store or retrieve a Last-Modified (LM) date for the file – we’re going to hard-code one in the header and embed the doConditionalGet function in the same file. The base code comes courtesy of Simon Willison.
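Here is a minimal sketch of that idea. It is my own reconstruction rather than Simon's exact code: the isNotModified helper, the hard-coded date and the page body are assumptions for illustration, and the exact string comparison is a simplification (a stricter version would parse the dates and handle multiple or weak ETags).

```php
<?php
// Sketch of a conditional GET handler with a hard-coded Last-Modified date.
// isNotModified() is a hypothetical helper, not a PHP built-in.

// Decide whether the client's cached copy is still current.
// $server is the request data, normally $_SERVER.
function isNotModified(array $server, $lastModified, $etag)
{
    $ims = isset($server['HTTP_IF_MODIFIED_SINCE']) ? $server['HTTP_IF_MODIFIED_SINCE'] : null;
    $inm = isset($server['HTTP_IF_NONE_MATCH']) ? $server['HTTP_IF_NONE_MATCH'] : null;

    // No validators sent: this is a first visit, so send the full page.
    if ($ims === null && $inm === null) {
        return false;
    }
    // Every validator that was sent must match exactly.
    if ($inm !== null && $inm !== $etag) {
        return false;
    }
    if ($ims !== null && $ims !== $lastModified) {
        return false;
    }
    return true;
}

// Send the validators, then stop with a 304 if the client is up to date.
function doConditionalGet($timestamp)
{
    $lastModified = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
    $etag = '"' . md5($lastModified) . '"';

    header('Last-Modified: ' . $lastModified);
    header('ETag: ' . $etag);

    if (isNotModified($_SERVER, $lastModified, $etag)) {
        header('HTTP/1.1 304 Not Modified');
        exit; // No body is sent with a 304.
    }
}

// Hard-coded LM date (an arbitrary choice): midnight GMT, October 5, 2006.
doConditionalGet(gmmktime(0, 0, 0, 10, 5, 2006));

// Only reached when the browser needs a fresh copy.
echo '<html><body><p>Generated at ' . date('r') . '</p></body></html>';
```

Keeping the comparison in its own function means the decision can be checked without sending real headers.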
As predicted, the full page is generated only on the first request and, as long as the cached copy still exists, should NEVER be generated again for the same browser.
So, are we done?
No, because now we have to figure out how we’re going to track the LM date for this content without creating more of the server load we set out to prevent! We’ll cover this in Part IV: Implementing IMS On WordPress.
But first, we’ll spend time in Part III understanding how all of this specifically impacts content aggregators and RSS feeds.
To conclude this section, I want to introduce one other very useful tool for webmasters: the Cacheability Test by Mark Nottingham.
This tool analyzes a URL for the presence of cache-ready values like Last-Modified and Entity Tag, as well as a few I haven’t mentioned yet:
- Cache Control
- Content Length
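For reference, here is a rough sketch of how a PHP script could send those two values itself. The cacheHeaders helper and the one-hour max-age are assumptions for illustration only, and the Content-Length shown is only right when the body goes out unmodified (no gzip or other output filtering).

```php
<?php
// Hypothetical helper: build Cache-Control and Content-Length header lines
// for a response body that is already fully assembled.
function cacheHeaders($body, $maxAgeSeconds)
{
    return array(
        'Cache-Control: max-age=' . (int) $maxAgeSeconds,
        // strlen() counts bytes, which is what Content-Length expects.
        'Content-Length: ' . strlen($body),
    );
}

$body = '<html><body><p>Hello</p></body></html>';

// Headers must be sent before any output.
foreach (cacheHeaders($body, 3600) as $line) {
    header($line);
}

echo $body;
```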
Your homework is to perform tests on the following URLs.
To speed up cacheability tests, be sure to uncheck “include referenced frames, images and objects”.
- Test 1: Run the cacheability test on http://www.vibetechnology.com/vt/index.php
- Test 2: Run the cacheability test on http://www.vibetechnology.com/vt/ims/with-ims.php
- Test 3: With Fiddler running, click this link then reload it by pressing F5: http://www.vibetechnology.com/vt/ims/with-ims.php
Question: Why will http://www.vibetechnology.com/vt/index.php NEVER be cacheable?