I’ve looked at caching, and e-tags in particular, on multiple occasions over the years.
Caching is usually implemented when a bottleneck has been hit. In most small applications this will be one page that requires complex queries across your database. If you’re lucky the freshness of the result isn’t too important and you can just cache the result for a fixed time period, reducing load times quite a bit.
So you pull up the code that’s causing the problem, use your framework’s caching layer to cache it for 1 hour and boom, you’re done.
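To make the fixed-time approach concrete, here is a minimal sketch of what a framework’s caching layer does under the hood. The names `cached` and `expensive_report` are made up for illustration, and a plain dict stands in for whatever cache store your framework actually uses.

```python
import time
from typing import Any, Callable, Dict, Tuple

# In-memory stand-in for a real cache store: key -> (expiry time, value).
_cache: Dict[str, Tuple[float, Any]] = {}


def cached(
    key: str,
    ttl_seconds: float,
    compute: Callable[[], Any],
    now: Callable[[], float] = time.monotonic,
) -> Any:
    """Return the cached value for key, recomputing it once the TTL has passed."""
    entry = _cache.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]  # still fresh, skip the expensive work
    value = compute()
    _cache[key] = (now() + ttl_seconds, value)
    return value


call_log = []


def expensive_report() -> dict:
    """Hypothetical slow query; the log records how often it really runs."""
    call_log.append("ran query")
    return {"total_orders": 42}


# Two requests inside the same hour only run the query once.
cached("report", 3600, expensive_report)
cached("report", 3600, expensive_report)
```

The `now` parameter is only there so the clock can be swapped out when testing expiry.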
You could go the extra mile and save on bandwidth by adding HTTP cache headers to the response (Cache-Control, Expires, Last-Modified, pick your poison).
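As a rough sketch of what those headers look like, the helper below builds all three for a response that may be reused for a fixed number of seconds. The function name `cache_headers` and its parameters are my own; how you attach the resulting dict to a response depends on your framework.

```python
from email.utils import formatdate
from typing import Dict


def cache_headers(
    max_age_seconds: int, last_modified_epoch: float, now_epoch: float
) -> Dict[str, str]:
    """Build cache headers for a response that stays fresh for max_age_seconds."""
    return {
        # Preferred by modern clients: freshness lifetime in seconds.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Older equivalent: an absolute expiry date in HTTP date format.
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
        # When the underlying resource last changed.
        "Last-Modified": formatdate(last_modified_epoch, usegmt=True),
    }


headers = cache_headers(3600, last_modified_epoch=0, now_epoch=0)
```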
This is a great solution, but what happens when you need to invalidate the cache as soon as the underlying resource updates? If you are presented with this problem you will most likely stumble upon e-tags.
E-tags our savior, or not really
The promise of e-tags is great. Every page has a version number or a tag (hash), so when a page is requested we can quickly check if there’s a newer version available. This sounds great, but how do we create these version numbers? That is the dirty secret. When you first read about e-tags you will see implementations that render the entire page and create an MD5 hash from it. This MD5 hash is then used to check if there’s a new version available.
The problem with this approach is that you’re still hitting that expensive query on every request. The real saving here is on bandwidth. If the user already has the latest version, your server will just send back HTTP 304 “Not Modified” with an empty body. Sure, this can save you a lot of money at scale, but it’s not fixing the problem of long load times.
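Here is a minimal sketch of that render-then-hash approach, so the wasted work is easy to see. `respond_with_naive_etag` and `render_page` are hypothetical names, and the handler returns a plain (status, headers, body) tuple instead of a real framework response.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]


def respond_with_naive_etag(
    render_page: Callable[[], str], if_none_match: Optional[str]
) -> Response:
    """Render the whole page, hash it, then decide between 200 and 304."""
    body = render_page()  # the expensive query and render run on EVERY request
    etag = '"' + hashlib.md5(body.encode("utf-8")).hexdigest() + '"'
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, ""  # bandwidth saved, but the work was already done
    return 200, headers, body


render_log = []


def render_page() -> str:
    """Hypothetical slow page; the log counts how often it is rendered."""
    render_log.append("rendered")
    return "<h1>Report</h1>"


first = respond_with_naive_etag(render_page, None)
second = respond_with_naive_etag(render_page, first[1]["ETag"])
```

The second request gets a 304, yet `render_log` still has two entries: the page was rendered both times.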
E-tags done right
The problem above was with the implementation, not with e-tags as a concept. Instead of loading the entire page to create a version tag, we should work out the minimal amount of work necessary to create the version number. If we didn’t change the view file, why would we need to render the view to make an MD5 hash? Couldn’t we make an MD5 hash of the data and achieve the same result? We absolutely could.
Going further, do we need to do data processing before we create the hash? No. So with a little more effort it is possible to skip the bandwidth, the view rendering and the data processing. Suddenly e-tags start to make sense, but often the most expensive calls in your app are calls to the database. How can we move the e-tag comparison up before your database calls?
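A sketch of hashing the raw data instead of the rendered page could look like this. The names `etag_from_data`, `respond_with_data_etag`, `fetch_rows`, `process` and `render` are all placeholders, and I’m assuming the data can be serialized to JSON.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]


def etag_from_data(data: Any) -> str:
    """Hash a stable JSON encoding of the raw data (sorted keys, fixed separators)."""
    payload = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.md5(payload.encode("utf-8")).hexdigest() + '"'


def respond_with_data_etag(
    fetch_rows: Callable[[], Any],
    process: Callable[[Any], Any],
    render: Callable[[Any], str],
    if_none_match: Optional[str],
) -> Response:
    rows = fetch_rows()  # the database call still happens here
    etag = etag_from_data(rows)
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, ""  # processing and rendering are skipped
    return 200, headers, render(process(rows))


work_log = []


def fetch_rows() -> list:
    work_log.append("query")
    return [{"id": 1, "total": 10}, {"id": 2, "total": 32}]


def process(rows: list) -> int:
    work_log.append("process")
    return sum(row["total"] for row in rows)


def render(total: int) -> str:
    work_log.append("render")
    return f"<p>Total: {total}</p>"


first = respond_with_data_etag(fetch_rows, process, render, None)
second = respond_with_data_etag(fetch_rows, process, render, first[1]["ETag"])
```

On the second request only the query runs, which is exactly the part that is still left to solve.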
We need to flip the current approach on its head. Currently we’re creating version tags as an afterthought. Instead we need to create new version tags as soon as the underlying data changes. Rather than recomputing the same version tag on every request, we need to create a dependency graph for every page and, whenever a resource is saved, check whether it appears in that graph. If a resource is in the graph then the version number needs to change. Now validating the e-tag is as fast as your cache can fetch the latest version tag. Implementing this is far more complex than making an MD5 hash of a response, but in some cases it’s worth it.
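The flipped approach can be sketched roughly as follows. Everything here is an assumption on my part: the page names, the resource types, the random tags and the dict that stands in for a fast cache. The point is only that the tag changes on save, so a request can be answered before any query runs.

```python
import uuid
from typing import Callable, Dict, Optional, Set, Tuple

Response = Tuple[int, Dict[str, str], str]

# Hypothetical dependency graph: page name -> resource types the page reads.
PAGE_DEPENDENCIES: Dict[str, Set[str]] = {
    "dashboard": {"order", "customer"},
    "product_list": {"product"},
}

# Stand-in for a fast cache holding the current version tag of each page.
version_tags: Dict[str, str] = {}


def current_tag(page: str) -> str:
    """Fetch the page's tag, creating one the first time the page is seen."""
    if page not in version_tags:
        version_tags[page] = uuid.uuid4().hex
    return version_tags[page]


def on_resource_saved(resource_type: str) -> None:
    """Call this on save: bump the tag of every page that depends on the resource."""
    for page, dependencies in PAGE_DEPENDENCIES.items():
        if resource_type in dependencies:
            version_tags[page] = uuid.uuid4().hex


def respond_with_version_tag(
    page: str, if_none_match: Optional[str], build_page: Callable[[], str]
) -> Response:
    etag = '"' + current_tag(page) + '"'
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, ""  # no query, no processing, no rendering
    return 200, headers, build_page()
```

A request whose tag matches never touches `build_page`, so the database, the processing and the rendering are all skipped. The cost moves to the save path, which now has to walk the graph.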
I’ve put it on hold for my current app, so I don’t have the full implementation notes yet. But when I do implement it I will create an event listener and push every resource change to a queue. The graph will be stored in JSON and the version numbers will be stored in Redis, keyed on controller function names.
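Since I haven’t built it yet, treat the following as a rough sketch of that plan and nothing more. The controller names and graph contents are invented, an in-process `queue.Queue` stands in for a real job queue, and a dict of counters stands in for Redis.

```python
import json
import queue
from collections import defaultdict
from typing import DefaultDict, Dict, List

# Hypothetical graph as it might be stored in JSON:
# controller function name -> resource types it depends on.
GRAPH_JSON = """
{
  "ReportController.index": ["order", "customer"],
  "ProductController.list": ["product"]
}
"""
graph: Dict[str, List[str]] = json.loads(GRAPH_JSON)

# In-process stand-ins for the real queue and for the Redis counters.
changes: queue.Queue = queue.Queue()
versions: DefaultDict[str, int] = defaultdict(int)


def on_resource_changed(resource_type: str) -> None:
    """Event listener: only enqueue, so the save itself stays fast."""
    changes.put(resource_type)


def drain_changes() -> None:
    """Worker step: bump the version of every controller hit by a queued change."""
    while True:
        try:
            resource_type = changes.get_nowait()
        except queue.Empty:
            return
        for controller, dependencies in graph.items():
            if resource_type in dependencies:
                versions[controller] += 1


on_resource_changed("order")
on_resource_changed("order")
on_resource_changed("product")
drain_changes()
```

One trade-off I expect from the queue: a version is only bumped once the worker gets to the change, so there would be a short window where a stale tag still validates.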