Configuring Caching

Apache HTTPD can perform caching in two different ways: by caching the header and body of a response to disk (mod_cache_disk), or by opening a file (or files) at httpd start and saving the file handles in memory (mod_file_cache). mod_cache_disk allows you to cache different request types, including ones with query strings, whereas mod_file_cache does not, since it only caches file handles for quicker access. Most importantly, if you are not very careful with mod_file_cache, you can create a very large number of open file handles and end up with an unresponsive system. For these reasons, most organizations choose to leverage mod_cache_disk.

In the example below, we do the following configuration with mod_cache_disk:

1) Wrap our configuration in an IfModule clause. We do this because we don't want the directives to be run if the module isn't available; this way our configuration stays syntactically correct even when the module is missing.
2) Define where on disk the cache should reside. NOTE: this directory must be writable by the user that httpd runs as, otherwise caching will not work.
3) Enable caching of type disk and configure what should be cached. This can be a complete file path (relative to the domain root), a wildcard on extension types, or a regular expression. This directive can also be repeated.
4) Put bounds on the shape and size of the cache: the number of directory levels on disk, the length of the directory names, and the total size of the cache. For a deep cache, use a high value for CacheDirLevels and a low value for CacheDirLength. NOTE: the product of CacheDirLevels and CacheDirLength cannot exceed 20.
5) Put upper and lower bounds on the size of the items to cache. Small items (like images of just 1K) will not give you much performance gain from caching, but will still fill up your cache (think depth of tree) with small entries.
On the flip side, caching large files won't boost performance much either, because the majority of the latency would be due to transmission, not file I/O. For these reasons, reasonable upper and lower bounds should be set.

6) Set a default time for cached items to expire. This is applied when an item does not carry an expiry header that explicitly sets its time to live.
7) Configure logging so we get visibility into our cache hits and misses.
8) Finally, explicitly disable caching for some volatile content.

## Best Practice – check that the module is defined/available before
## using its directives
<IfModule mod_cache_disk.c>
    ## Where on disk we are going to store the cache
    CacheRoot /usr/apache/cacheroot

    ## Enable caching to disk of all content starting at the root. Wildcards
    ## can also be used to narrow down what is cached, i.e. CacheEnable disk /*.css
    CacheEnable disk /

    ## How many directory levels to have in the cache
    CacheDirLevels 5
    ## Length of directory names in the cache
    CacheDirLength 3

    ## Put a cap on how much cache space we want to use, in KBytes
    CacheSize 2000000

    ## Only cache files between 64 bytes and 64K bytes
    CacheMinFileSize 64
    CacheMaxFileSize 64000

    ## If an item doesn't have an expiry header, expire it from the cache
    ## after 1 day
    CacheDefaultExpire 86400

    ## Set up logging so we can see cache hits, misses, and revalidated
    ## cache items
    CustomLog cached-requests.log common env=cache-hit
    CustomLog uncached-requests.log common env=cache-miss
    CustomLog revalidated-requests.log common env=cache-revalidate

    ## We don't want to cache highly volatile content
    CacheDisable http://www.somedomain.com/real-time-stock-market-quotes/
</IfModule>

To verify our cache, we will use Apache Benchmark (included with EWS at $EWS_HOME/sbin/ab). Apache Benchmark is a relatively robust tool that can handle authentication as well as posting data. For this example, however, we are simply looking at creating a large concurrent load.
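To see why a high CacheDirLevels with a low CacheDirLength produces a deep cache tree, here is a minimal sketch of the layout: mod_cache_disk hashes each cached URL into a short key and uses its leading characters as nested directory names. The key below is invented purely for illustration (it is not a real mod_cache_disk hash), and the splitting mirrors the directive values above (CacheDirLevels 5, CacheDirLength 3, product 15, under the limit of 20).

```shell
# Hypothetical 22-character cache key -- not a real mod_cache_disk hash.
key="aB3xY9kQ7mN2pL5vR8tZ1w"
levels=5    # CacheDirLevels
length=3    # CacheDirLength

# Take $levels slices of $length characters each and join them as a path.
path=""
i=0
while [ "$i" -lt "$levels" ]; do
  seg=$(printf '%s' "$key" | cut -c $(( i * length + 1 ))-$(( (i + 1) * length )))
  path="$path$seg/"
  i=$(( i + 1 ))
done
echo "$path"    # aB3/xY9/kQ7/mN2/pL5/
```

A deeper tree (more levels, shorter names) keeps any single directory from accumulating too many entries, which is why the text recommends high CacheDirLevels and low CacheDirLength for deep caches.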
To do this, we will use ab with 2 flags:

1) -c for the number of concurrent users
2) -n for the number of requests to run

First we run this without caching enabled. Below is the command with the output (mean times not shown). Note that when you run ab, you have to point it at either a complete URL to an asset or a complete domain with a trailing slash:

Illustration 19: Running Apache Benchmark against site without cache

We can verify that no cache was used by checking our cache folder:

Illustration 20: Checking cache folder

Next, we enable caching with the directives shown earlier and rerun the same test:

Illustration 21: Running Apache Benchmark with caching enabled

We check our cache directory on disk again to make sure items were being cached:

Illustration 22: Checking cache folder, verifying that cache is populated

This confirms that we are caching our content. Also note that in our second load test the same amount of HTML was transferred, but our response times were halved while our requests per second doubled. We are getting significant gains from simple caching.

Other items to consider with caching:

- Static content as well as output from dynamic applications can be cached.
- When leveraging virtual includes in your pages or site, be sure to use 'include virtual' instead of 'include file' to take advantage of caching (since caching is URL based, not file-reference based).
- Highly time-sensitive content should not be cached.
- Only GET requests with a response code of 200, 203, 300, 301, or 410 can be cached.
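The benchmark runs described above can be sketched as a shell command. The URL and the request counts here are placeholder assumptions, not the actual values from the illustrations, and the existence check keeps the snippet from failing on systems where EWS is not installed:

```shell
# Hypothetical ab invocation: 100 concurrent users, 10000 total requests.
# EWS_HOME and the URL are assumptions -- adjust for your installation.
AB="${EWS_HOME:-/opt/ews}/sbin/ab"   # ab ships with EWS under $EWS_HOME/sbin
URL="http://www.somedomain.com/"     # full asset URL, or domain with trailing slash

if [ -x "$AB" ]; then
  # Run once with caching disabled, then again with the CacheEnable block
  # active, and compare the "Requests per second" lines in the two reports.
  "$AB" -c 100 -n 10000 "$URL"
else
  echo "ab not found at $AB"
fi
```

Comparing the two reports should show the pattern noted above: roughly the same bytes transferred, with response times cut and throughput doubled once the cache is populated.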