Treat Cache as a Feature, not a Requirement

There is a lot of buzz around performance optimization on web pages these days. Visitors want sub-second responses when they click a link. They don't want to wait for a page to load or for redirects to resolve, nor do they want to wait for JavaScript libraries before they can interact with the website and its content. Web developers have a lot of optimization methods in their toolbox to speed things up. Making sure a web page loads only the minimal necessary content, doesn't perform superfluous server-side requests, and uses optimal front-end methods are all typical performance-enhancing practices. Today I want to talk about one of the more heavy-handed methods, one that can cause severe issues if poorly implemented: server-side caching.

The idea of caching isn't inherently evil. There are elements of a website that do not need to be regularly updated. A stylesheet that only changes a few times a year can be cached on a user's computer to avoid a server request. The stylesheet still has to be parsed by the browser when the site is loaded, but caching it avoids lookup and transfer times. The big downside of caching is that eventually you'll need to update that piece of information, and it can be hard to reach every client-side location where it is stored.
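As a rough illustration of that trade-off, here is a minimal sketch of client-side caching driven from the server, assuming a plain Node.js HTTP server; the file name `styles.v42.css`, the port, and the one-year max-age are all illustrative. Putting a version number in the file name is one common way to "reach" clients that are still holding the old copy: when the stylesheet changes, the name changes, and the stale cache entry is simply bypassed.

```typescript
// Minimal sketch: serving a stylesheet with long-lived cache headers.
// Assumes a plain Node.js server; file name, port, and max-age are illustrative.
import { createServer } from "http";
import { readFileSync } from "fs";

const server = createServer((req, res) => {
  if (req.url === "/styles.v42.css") {
    res.writeHead(200, {
      "Content-Type": "text/css",
      // One year is safe here only because the version in the file name
      // changes whenever the stylesheet itself changes.
      "Cache-Control": "public, max-age=31536000, immutable",
    });
    res.end(readFileSync("./styles.css"));
    return;
  }
  res.writeHead(404);
  res.end();
});

server.listen(3000);
```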

Server-side caching brings that issue to the forefront. Web pages are rich, dynamic sources of information that change rapidly. Flattening and caching an entire HTML page is (almost) never an option, since the content needs to be updated quickly and easily. However, fetching fresh content from data storage, parsing that data, and filling out a page with it can take a lot of time depending on how slow or busy the platform is. In an attempt to speed up a website, a programmer can save pieces of content on the server for easier retrieval and parsing, which keeps the cached data in small, manageable chunks.
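A minimal sketch of that fragment-caching idea, assuming an in-memory Map as the key/value store; `renderProductCard`, the key format, and the TTL are hypothetical stand-ins for a real store (Redis, Memcached, etc.) and a real template function.

```typescript
// Minimal sketch of fragment caching: rendered HTML chunks are stored
// under a key with a TTL instead of caching whole pages.
type Entry = { html: string; expiresAt: number };
const fragmentCache = new Map<string, Entry>();

async function getFragment(
  key: string,
  ttlMs: number,
  render: () => Promise<string>
): Promise<string> {
  const hit = fragmentCache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.html; // fresh enough: skip the data fetch and templating
  }
  const html = await render(); // fetch + parse + template only on a miss
  fragmentCache.set(key, { html, expiresAt: Date.now() + ttlMs });
  return html;
}

// Usage (illustrative): cache a single product card for 60 seconds.
// const card = await getFragment(`product:${id}`, 60_000, () => renderProductCard(id));
```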

There are a few technologies that can be used to cache the data, most of them using key/value systems for reference. This works if you are storing simple pieces of data but can quickly become unmanageable once the data structures grow more complex. The reason is simple: you are now storing the same piece of data in two or more locations. If the data needs to be updated, it needs to be updated in multiple places, effectively refreshing the cache. The easier it is to track down the updated piece of data the better; otherwise you will run into stale-data issues that don't do anyone any good.
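A minimal sketch of keeping those duplicate locations in sync, assuming hypothetical `Db` and `Cache` interfaces rather than any specific library: every write path also drops the cached copy so the next read repopulates it from the source of truth.

```typescript
// Minimal sketch: the cache is only ever a second copy of the data,
// so every update to the source of truth must also invalidate it.
interface Product { id: string; name: string; price: number }

interface Db {
  updateProduct(p: Product): Promise<void>;
}
interface Cache {
  del(key: string): Promise<void>;
}

async function updateProduct(db: Db, cache: Cache, p: Product): Promise<void> {
  await db.updateProduct(p);          // write to the source of truth first
  await cache.del(`product:${p.id}`); // then drop the stale copy; the next
                                      // read will rebuild it from the database
}
```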

So why is cache so dangerous? You can use it to speed up a platform, and there are practices you can follow to make it easier and more flexible to use. The problem is the mindset that decision makers have towards cache: 'If an application isn't returning data fast enough, just implement a caching system.' Cache should be treated as a feature, not a solution. If the application is slow and not meeting expectations, go back and fix the underlying system or reset expectations. Stale data is one of the most dangerous problems a website can have, eroding user trust and causing immediate negative business repercussions. The application should not depend on cache, and turning it off or refreshing the entire system should not take down the data servers.
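One way to keep cache in the "feature" category is to put it behind a switch so the read path works identically with it off. This is a minimal sketch assuming an environment flag and an in-memory Map; `CACHE_ENABLED` and `loadPage` are illustrative names, not part of any particular framework.

```typescript
// Minimal sketch: cache as an optional feature. Disabling it only costs
// speed, never correctness, because the app never depends on a hit.
const CACHE_ENABLED = process.env.CACHE_ENABLED !== "false";

async function getPageData(
  key: string,
  loadPage: () => Promise<string>,
  cache: Map<string, string>
): Promise<string> {
  if (CACHE_ENABLED) {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
  }
  const data = await loadPage(); // always able to serve without the cache
  if (CACHE_ENABLED) cache.set(key, data);
  return data;
}
```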

Whether it's in the workplace or in open discussions, I've noticed a dangerous dependency on caching systems. If the data is being stored in such a way that it cannot be retrieved easily, then change the storage practices (NoSQL procedures and ideas work great for this). If parsing and transforming the data is taking too long, then optimize those processes or save the data in a different manner. If it takes too long to fill a view with the data, then review your templating system to see if there are more efficient ways to display the data or to limit how much is displayed. Caching itself is not a bad idea, but using it for server-side data storage can lead to stale data and difficult duplicate-storage problems. Implement it carefully, and always remember that it is not 'the solution' but just another small feature that should be easy to turn on or off to give users a small boost in performance.