Understanding HTTP Caching: A Beginner's Guide to Cache-Control, ETag, and Expires Headers


When you browse the web, you’re often interacting with websites that load data quickly, even on subsequent visits. This speed is thanks to caching — a technique that stores copies of web resources locally so that they don't have to be re-downloaded every time. One way to control this caching behavior is through HTTP headers like Cache-Control, ETag, and Expires. These headers provide instructions on how resources should be cached by browsers and other intermediary systems, such as proxies or content delivery networks (CDNs).

In this article, we'll explore these headers in a beginner-friendly way and explain their roles in improving web performance and resource management. We'll also provide a simple example in PHP to mimic how these caching mechanisms work on both the server and client sides.

What is HTTP Caching?

HTTP caching is a mechanism that allows web resources (such as images, HTML pages, CSS files, and JavaScript) to be stored temporarily, either in the browser or in intermediary systems such as proxy servers and CDNs. The main goal is to reduce load times and save bandwidth by reusing previously downloaded content instead of fetching it from the server every time.

To control how resources are cached and for how long, web developers use special HTTP headers. Three of the most important headers in this process are Cache-Control, ETag, and Expires.

1. Cache-Control Header

The Cache-Control header is the most important and widely used tool for controlling how caching behaves in the HTTP response. It tells browsers and caches how they should store and revalidate resources.

Some important directives within Cache-Control include:

  • max-age=<seconds>: Defines how long (in seconds) a cached response is considered fresh. After that it is stale and normally has to be revalidated with the server before it is reused. For example, max-age=3600 means the resource stays fresh for 1 hour.
  • no-cache: Instructs that the resource can be cached, but it must be revalidated with the server before being used.
  • no-store: Prevents the resource from being stored in any cache at all. This is useful for sensitive data such as personal or financial information.
  • public: Allows caching by both browsers and intermediate caches (like CDNs).
  • private: Ensures that the resource is only cached by the browser and not by shared caches.

Example of Cache-Control header:

Cache-Control: public, max-age=86400

This means the resource can be cached by any cache (e.g., a CDN) and is considered fresh for 24 hours (86400 seconds).
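To see what a cache actually reads out of this header, here is a minimal PHP sketch that splits a Cache-Control value into its directives. The function name parseCacheControl is hypothetical, and the parser is deliberately simple: it assumes no directive value contains a comma.

```php
<?php
// Minimal sketch (hypothetical helper): split a Cache-Control value into directives.
// Assumption: no directive value contains a comma, so a plain split is enough.
function parseCacheControl(string $value): array
{
    $directives = [];
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // tolerate stray commas such as "public,,max-age=60"
        }
        // "max-age=86400" becomes 'max-age' => '86400'; "public" becomes 'public' => true
        [$name, $arg] = array_pad(explode('=', $part, 2), 2, null);
        // Directive names are case-insensitive; surrounding quotes on values are dropped
        $directives[strtolower(trim($name))] = $arg === null ? true : trim($arg, " \t\"");
    }
    return $directives;
}

$directives = parseCacheControl("public, max-age=86400");
$maxAge = (int) ($directives['max-age'] ?? 0);
echo "Shared caches allowed: ", isset($directives['public']) ? "yes" : "no", "\n"; // yes
echo "Fresh for ", $maxAge / 3600, " hours\n";                                      // 24
```

A real cache would then use the max-age value as the freshness lifetime and the public/private flags to decide whether it may store the response at all.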

2. ETag Header

The ETag (Entity Tag) header is used for cache validation. It is essentially a version identifier for a resource. When a server responds to a request, it generates a unique ETag value for that resource, typically based on the resource’s content.

When a client (browser or cache) requests the same resource again, it sends the If-None-Match header with the ETag value it already has. If the server sees that the ETag hasn’t changed, it can respond with a 304 Not Modified status, indicating the resource hasn’t been modified and the cached version is still valid.

Example of ETag header:

ETag: "abc123"

Client Request Example:

If-None-Match: "abc123"

If the server sees that the ETag matches, it can respond with:

HTTP/1.1 304 Not Modified

This process helps avoid unnecessary data transfer and speeds up the user experience by reusing previously cached content.
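A server-side check has to cope with a few details the example above glosses over: If-None-Match can carry a list of tags or the wildcard *, and a tag may be marked weak with a W/ prefix, which the If-None-Match comparison ignores. The sketch below (ifNoneMatchMatches is a hypothetical helper name) shows one way to handle that, assuming the tags themselves contain no commas.

```php
<?php
// Minimal sketch: evaluate an If-None-Match header against the current ETag.
// If-None-Match uses weak comparison, so a W/ prefix on either tag is ignored.
// Simplification: splitting on commas assumes the quoted tags contain no commas.

function stripWeakPrefix(string $tag): string
{
    $tag = trim($tag);
    return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
}

function ifNoneMatchMatches(string $ifNoneMatch, string $currentEtag): bool
{
    if (trim($ifNoneMatch) === '*') {
        return true; // "*" matches any current representation
    }
    $current = stripWeakPrefix($currentEtag);
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if (stripWeakPrefix($candidate) === $current) {
            return true;
        }
    }
    return false;
}

// The browser sends back the tag it stored earlier
$status = ifNoneMatchMatches('"abc123"', '"abc123"') ? 304 : 200;
echo $status, "\n"; // 304
```

When the function returns true, the server sends 304 Not Modified with no body; otherwise it sends the full resource with status 200.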

3. Expires Header

The Expires header is an older method for indicating how long a resource should be considered valid. It specifies an absolute date and time after which the resource is considered stale. When a cache receives a resource with the Expires header, it stores it and only revalidates it with the server once the expiration time has passed.

However, the Expires header is largely considered outdated in favor of Cache-Control, which provides more flexible and relative caching options (e.g., max-age). Still, Expires can be useful in scenarios where an exact expiration time needs to be set.

Example of Expires header:

Expires: Mon, 21 Oct 2024 07:28:00 GMT

This means the resource is valid until the specified date and time.
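HTTP dates must be written in GMT in a fixed format. Here is a minimal PHP sketch for building an Expires value and checking whether it has passed; the helper names are hypothetical, and treating an unparseable value as already expired follows the HTTP caching rule for invalid dates.

```php
<?php
// Hypothetical helpers for the Expires header.
// HTTP dates always use GMT, e.g. "Mon, 21 Oct 2024 07:28:00 GMT".
const HTTP_DATE_FORMAT = 'D, d M Y H:i:s \G\M\T';

// Expires value for a response that should stay fresh for $ttlSeconds after $now
function expiresHeaderValue(int $now, int $ttlSeconds): string
{
    return gmdate(HTTP_DATE_FORMAT, $now + $ttlSeconds);
}

// True once the Expires time has been reached. Invalid values (such as "0")
// count as already expired, which is how HTTP caches are required to treat them.
function isExpired(string $expires, int $now): bool
{
    $date = DateTimeImmutable::createFromFormat(HTTP_DATE_FORMAT, $expires, new DateTimeZone('UTC'));
    if ($date === false) {
        return true;
    }
    return $now >= $date->getTimestamp();
}

$now = time();
$expires = expiresHeaderValue($now, 3600);
echo "Expires: $expires\n";
var_dump(isExpired($expires, $now));        // bool(false): still fresh
var_dump(isExpired($expires, $now + 3600)); // bool(true): the hour is up
```

Because Expires is an absolute time, it depends on the server and client clocks agreeing, which is one reason the relative max-age is preferred.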

How These Headers Work Together

  • Cache-Control is the modern, go-to header for managing caching behavior. It allows fine-grained control over how long content can be cached and under what conditions.
  • ETag works alongside Cache-Control to validate cached content. It ensures that clients only re-fetch resources if they have changed, thus reducing unnecessary traffic.
  • Expires, while useful, is less flexible and can often be replaced by the more versatile Cache-Control.

In practice, Cache-Control and ETag work together to ensure that resources are cached for an optimal amount of time and only revalidated if necessary.
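Put together, a browser-style cache makes roughly the decision below before reusing a stored response. This is a simplified sketch: the shape of the $stored array and the function names are hypothetical, the Age header and heuristic freshness are ignored, and a response without an explicit lifetime is simply treated as stale.

```php
<?php
// Hypothetical sketch of a browser-style (private) cache deciding how to reuse a stored response.
// Simplifications: the Age header and heuristic freshness are ignored, and the
// response's age is counted from the moment it was stored.

// Freshness lifetime in seconds, or null when the response sets none
function freshnessLifetime(array $stored): ?int
{
    if ($stored['maxAge'] !== null) {
        return $stored['maxAge']; // Cache-Control: max-age takes precedence over Expires
    }
    if ($stored['expires'] !== null) {
        return $stored['expires'] - $stored['date']; // Expires is measured against the response's Date
    }
    return null;
}

function cacheDecision(array $stored, int $now): string
{
    $lifetime = freshnessLifetime($stored);
    $age = $now - $stored['storedAt'];

    // Fresh and not marked no-cache: reuse without contacting the server
    if (!$stored['noCache'] && $lifetime !== null && $age < $lifetime) {
        return 'use cached copy';
    }
    // Stale (or no-cache): ask the server, which can answer 304 if the ETag still matches
    if ($stored['etag'] !== null) {
        return 'revalidate with If-None-Match: ' . $stored['etag'];
    }
    // Nothing to validate with: download the resource again
    return 'fetch again';
}

// Values assumed to be extracted from a response with
// Cache-Control: public, max-age=86400 and ETag: "abc123"
$stored = [
    'maxAge'   => 86400,
    'expires'  => null,       // Unix timestamp from Expires, if any
    'date'     => 1700000000, // Unix timestamp from the Date header
    'storedAt' => 1700000000,
    'noCache'  => false,
    'etag'     => '"abc123"',
];

echo cacheDecision($stored, 1700000000 + 3600), "\n";  // use cached copy
echo cacheDecision($stored, 1700000000 + 90000), "\n"; // revalidate with If-None-Match: "abc123"
```

The split mirrors the roles described above: Cache-Control (or Expires) decides whether a request is needed at all, and the ETag makes the request cheap when it is.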

Simple PHP Example: Mimicking Client and Server Caching

Now that we understand the headers, let’s see how we can implement caching on both the client and server side using PHP. We'll simulate a simple caching system using Cache-Control, ETag, and Expires headers.

Server Side: Setting Headers

In a PHP script, we can set these headers based on the resource’s content and the incoming request. The ETag below is a hash of the content, so it stays the same until the content changes; that is what allows the server to answer a repeat request with 304 Not Modified.

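<?php
// Server-side code to handle caching

// Use content that stays the same between requests. If it changed on every
// request (e.g. via rand()), the ETag would never match and 304 would never be sent.
$content = "Hello from the cached resource!";

// Generate an ETag (a quoted string) based on the content
$etag = '"' . md5($content) . '"';

// Set the Cache-Control header to allow caching for 1 hour
header("Cache-Control: public, max-age=3600");

// Set an Expires header (1 hour in the future) for older HTTP/1.0 caches
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600) . " GMT");

// Set the ETag header (it is also sent along with a 304 response)
header("ETag: $etag");

// Compare the client's ETag(s) with ours. If-None-Match may hold a list of tags,
// and a W/ (weak) prefix is ignored for this comparison.
$ifNoneMatch = $_SERVER['HTTP_IF_NONE_MATCH'] ?? '';
$clientTags = array_map(
    fn ($tag) => preg_replace('/^W\//', '', trim($tag)),
    explode(',', $ifNoneMatch)
);

if (trim($ifNoneMatch) === '*' || in_array($etag, $clientTags, true)) {
    // The client's copy is still valid: return 304 Not Modified with no body
    http_response_code(304);
    exit;
}

// If the ETag doesn't match, send the full content
header("Content-Type: text/plain; charset=utf-8");
echo $content;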

Client Side: Mimicking Request with Caching Logic

On the client side (simulated in PHP), we would send the If-None-Match header to check whether the resource has been modified based on the ETag value we received earlier. In a real-world scenario, this would be handled by the browser automatically, but here’s how you might simulate it:

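<?php
// Simulating a client that revalidates its cached copy with If-None-Match

$url = "http://localhost/cache.php";

// First request: download the resource and keep the ETag the server sent
$cachedBody = file_get_contents($url);
$storedEtag = null;
foreach ($http_response_header ?? [] as $line) {
    if (stripos($line, "ETag:") === 0) {
        $storedEtag = trim(substr($line, strlen("ETag:")));
    }
}

// Second request: send the stored ETag back in the If-None-Match header
$options = [
    "http" => [
        "header" => "If-None-Match: $storedEtag\r\n",
        "ignore_errors" => true, // return the response even for error statuses
    ]
];
$context = stream_context_create($options);
$response = file_get_contents($url, false, $context);

// The first response header is the status line, e.g. "HTTP/1.1 304 Not Modified"
if (strpos($http_response_header[0] ?? "", " 304 ") !== false) {
    echo "Not modified, reusing the cached copy:\n" . $cachedBody;
} else {
    echo $response;
}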

In this example, the client first fetches the resource and stores its ETag, then sends that value back in the If-None-Match header. If the content hasn’t changed, the server responds with 304 Not Modified and an empty body, so the client reuses the copy it already has.

Considerations When Using These Headers

  1. Caching Dynamic Content: Dynamic content (e.g., personalized user data) should generally avoid aggressive caching. Use Cache-Control: private to keep it out of shared caches such as CDNs, or no-store if it should not be stored anywhere.
  2. Setting Cache Expiry for Static Assets: Static resources like images, stylesheets, and JavaScript files can be cached for longer periods using Cache-Control: max-age=<large number>. For instance, images that rarely change might have a max-age of several months.
  3. Overriding Expires with Cache-Control: If a response carries both an Expires header and a Cache-Control max-age directive, max-age takes precedence and Expires is ignored. In most modern web applications, it is best to rely on Cache-Control and use Expires only if necessary.
  4. Revalidating with ETag: Combining ETag with Cache-Control: max-age provides a balance between caching for speed and ensuring content freshness. For example, cache content for a day (max-age=86400); once that day has passed, the browser revalidates with the ETag and gets a small 304 response instead of the whole resource if nothing has changed.
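The considerations above can be condensed into a small helper that picks headers per kind of resource. The category names ('personal', 'static', anything else) and the function name are hypothetical, and the lifetimes simply mirror the examples in the list: 90 days standing in for "several months" and one day for content that is revalidated with an ETag. It uses a match expression, so it assumes PHP 8.0 or later.

```php
<?php
// Hypothetical sketch: choose caching headers for a response by resource kind.
// Returns header lines that a script could pass to header() one by one.
function cachingHeaders(string $kind, string $body): array
{
    return match ($kind) {
        // Personalized or sensitive data: never stored by browsers or CDNs
        'personal' => ['Cache-Control: no-store'],
        // Rarely changing static assets (images, CSS, JS): fresh for 90 days
        'static' => ['Cache-Control: public, max-age=' . (90 * 86400)],
        // Everything else: fresh for a day, then revalidated via the ETag
        default => [
            'Cache-Control: public, max-age=86400',
            'ETag: "' . md5($body) . '"',
        ],
    };
}

foreach (cachingHeaders('static', '') as $line) {
    echo $line, "\n"; // Cache-Control: public, max-age=7776000
}
```

In a real script each returned line would be passed to header() before any output is sent.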

Finally: Which Header to Use?

  • Use Cache-Control as the primary header to define how resources should be cached.
  • Use ETag to ensure that the browser can revalidate cached content and reduce unnecessary data transfer.
  • Use Expires if you need a fixed expiration time, but Cache-Control is often more flexible and recommended.

By understanding and implementing these headers correctly, you can drastically improve the performance of your web application and provide users with a faster, more reliable experience.
