WordPress Caching Comparisons Part 2

This post has been on my mind for quite some time now, ever since I wrote Part 1 over 1 year ago.

Part 1 only really addressed opcode 1 and Object caching 2 and didn’t really touch page caching 3. In this post I have revisited all tests and added in comparisons of using both the APC Object Cache + Batcache plugins as well as using the W3 Total Cache plugin.

Tests

  • No opcode, no caching
  • APC opcode, no caching
  • APC opcode, APC object caching plugin
  • APC opcode, W3 Total Cache APC object caching
  • APC opcode, APC object caching plugin, Batcache page caching
  • APC opcode, W3 Total Cache APC object and page caching

Comparison Stats

  • PHP generation time 4
  • Number of include/include_once/require/require_once calls 5
  • Number of stat() calls per dtruss/strace 6
  • cURL time to start transfer 7
  • Apache Bench (ab) tests for concurrency 8 and requests per second

For the above stats gathering, with PHP generation time and cURL time to start transfer, 102 sets were collected, the first 2 were dropped due to cache priming, the remaining 100 were used, and averaged. With the Apache Bench tests, 12 sets were used, dropping the highest and lowest value, and averaging across the remaining 10. Include and stat() counts were gathered over 5 sets not requiring averaging as they were the same between runs.

To find the optimal concurrency and req/s for Apache Bench, I performed manual testing, visually inspecting the results until I reached what I classified as a “sweet spot”. Using the “sweet spot” stats, I performed additional sets to gather the averages for requests per second.

The Setup

  • 256MB Rackspace Cloud Server
  • Ubuntu 11.04 amd64
  • Apache 2.2.17 – Default Ubuntu Install, no modifications, default document root located at /var/www
  • PHP 5.3.5 (mod_php) – Default Ubuntu Install, no modifications
  • PHP APC 3.1.3p1 – Default Ubuntu Install, no modifications
  • MySQL 5.1.54 – Default Ubuntu Install, no modifications
  • WordPress 3.3-beta4-r19470 – Default Install, requests made to the “home” page
  • APC Object Cache trunk version
  • Batcache trunk version
  • W3 Total Cache 0.9.2.4

I have not compared static file caching yet and hope to compare W3 Total Cache and WP Super Cache in the future.  In this comparison I am mainly focusing on opcode, object caching and page caching.

I am going to try to keep this comparison about the stats only, and not make this a critique or review of the plugin, although in some cases this will not be possible.

Test Data

No opcode and no caching:
PHP Generation Time: 0.13787 seconds
Number of includes: 80
Number of stat calls: 266
cURL time to start transfer: 0.15463 seconds
Apache Bench Concurrency: 15
Apache Bench Requests Per Second: 19.1483 req/s

APC opcode and no caching:
PHP Generation Time: 0.05088 seconds
Number of includes: 80
Number of stat calls: 148
cURL time to start transfer: 0.05673 seconds
Apache Bench Concurrency: 60
Apache Bench Requests Per Second: 68.2636 req/s

APC opcode and APC Object caching:
PHP Generation Time: 0.03407 seconds
Number of includes: 81
Number of stat calls: 148
cURL time to start transfer: 0.03975 seconds
Apache Bench Concurrency: 260
Apache Bench Requests Per Second: 77.7214 req/s

APC opcode and W3TC APC Object caching:
PHP Generation Time: 0.03993 seconds
Number of includes: 102
Number of stat calls: 285
cURL time to start transfer: 0.04591 seconds
Apache Bench Concurrency: 200
Apache Bench Requests Per Second: 67.581 req/s

APC opcode and APC Object and Page caching with Batcache:
PHP Generation Time: N/A
Number of includes: Unable to collect
Number of stat calls: 41
cURL time to start transfer: 0.00316 seconds
Apache Bench Concurrency: 600
Apache Bench Requests Per Second: 147.2156 req/s

APC opcode and W3TC APC Object and Page caching:
PHP Generation Time: N/A
Number of includes: Unable to collect
Number of stat calls: 87
cURL time to start transfer: 0.00625 seconds
Apache Bench Concurrency: 500
Apache Bench Requests Per Second: 147.8425 req/s

Conclusions

I can state the following about just enabling APC in PHP, if you do nothing else, you should at least do this:

  1. 170% PHP generation time improvement by enabling APC opcode caching
  2. 172% Time to start transfer improvement by enabling APC opcode caching
  3. 300% concurrency improvement by enabling APC opcode caching
  4. 256% requests per second improvement by enabling APC opcode caching

I see performance improvements using both APC+Batcache and W3 Total Cache. However, in all tests, APC+Batcache seems to outperform W3 Total Cache, in PHP generation time, number of includes, number of filesystem stat() calls, time to start transfer, number of concurrent requests and requests per second with relation to concurrency.

I was able to push APC+Batcache to 700 concurrent requests, but req/s dropped. W3TC capped out at 500 concurrent requests, and would go no further, however 500 requests per second provided the highest req/s for W3TC.

W3TC does provide a lot of additional functionality to help reduce load on the server, such as tweaking client side caching, and using a CDN, where APC+Batcache does not, although there are small unitasking plugins that can add the missing functionality for you such as:

APC+Batcache consists of adding 3 new files, and no new directories. The W3TC download consists of 60 new directories and 351 files. The directory listing level for W3TC being as deep as it is, 5 levels deep past the directory for the plugin itself, causes a significant increase in filesystem stat() commands.

Most shared hosting providers as well as many multiserver environments will often host their web roots on NFS, and the more filesystem stat() calls, the worse performance you will see, especially under higher load.

Something else to note, is a lot can be done on the server to also improve performance. You can also use caching applications that logically sit in front of the webserver to cache, instead of using caching plugins, which will also improve performance. There are probably eleventy billion ways to improve performance, so if in doubt, consult an expert to help.

Notes:

  1. opcode: A technique of optimizing the PHP code and caching the bytecode compiled version of the code, to reduce the compilation time incurred during PHP code execution
  2. Object Caching: An in memory key-value storage for arbitrary data, to reduce processing, and storage of external calls to speed up retrieval and display of information
  3. Page Caching: Full caching of HTML output for web pages
  4. PHP generation time: The amount of time taken to compile and execute the PHP code into the resulting HTML
  5. Include/Require Count: The number of calls to the PHP include, include_once, require and require_once functions, which are used to load a separate file
  6. stat() call count: The number of unix system calls that return information about files, directories and other filesystem related objects.
  7. Start Transfer Time: The amount of time between the request from the client to the server, and when the server begins returning data to the client
  8. Concurrency: The number of concurrent client requests to the server

WordPress Caching Comparisons Part 1

For some time now I have been wanting to write an up to date XCache object cache plugin for WordPress. Around 4 years ago I did an opcode caching comparison between APC, XCache and eAccelerator. My results had shown that at the time that XCache was the fastest of the 3. Unfortunately I didn’t think to keep that data around. As a result of these tests I had standardized the environment I was working on with XCache, and have never thought twice about it. Since I use XCache for opcode caching everywhere, it seemed like writing such an object cache plugin would be beneficial. After writing the plugin I figured it best to test performance, comparing it to the Memcached object cache and the APC Object cache. I tweeted a lot during my initial testing, and got an overwhelming response to write up a post, and here we are…

I’ll try to make this comparison comprehensive, but it can be a little difficult to always cover everything.

The test environment:

  • Toshiba T135-S1310
  • Intel SU4100 64bit Dual-Core 1.3GHz
  • 4GB DDR3 Memory
  • Ubuntu 10.10 64bit
  • Apache 2.2.16
  • PHP 5.3.3
  • PHP XCache 1.3.0
  • PHP APC 3.1.3p1
  • Memcached 1.4.5
  • Pecl Memcached 3.0.4
  • MySQL 5.1.49 No caching configured
  • cURL 7.21.0
  • WordPress 3.1-alpha (r16527) Default install with Twenty Ten and no plugins other than the one I mention below

The times are based off of the standard timer_stop() code often found in the footer.php of themes, in this case added using the wp_footer filter through a mu (must use) plugin:

<?php
add_action('wp_footer', 'print_queries', 1000);
function print_queries() {
?>
<!-- <?php echo get_num_queries(); ?> queries. <?php timer_stop(1); ?> seconds. -->
<?php
}

cURL was used to make the HTTP requests and grab the value from the comment created by the above code:

for (( c=1; c<=101; c++ )); do curl -s http://wordpress.trunk/ | grep '</body>' -B 1 | head -1 | awk -F"queries. " '{print $2}' | awk -F" seconds" '{print $1}'; done;

In each data set I gather 101 results and omit result 1 so that we only have results after the initial cache is generated. The tests are only performed on the home page.

The tests:

  1. No Object or Opcode Cache
  2. Memcached Object Cache with no Opcode Cache
  3. Memcached Object Cache with APC Opcode Cache
  4. Memcached Object Cache with XCache Opcode Cache
  5. APC Object and Opcode Cache
  6. APC Opcode Cache with no Object Cache
  7. XCache Object and Opcode Cache
  8. XCache Opcode Cache with no Object Cache

I didn’t evaluate eAccelerator due to the fact that it isn’t available in the Ubuntu repositories and I did not feel likely compiling…

The results (in seconds):

For a larger view of the spreadsheet above or if you cannot see it, take a look here.

These results are quite interesting and actually shocked me a little bit. The first thing that I found when developing an up to date XCache Object Cache plugin was that it can’t handle objects! So the plugin has to serialize all data when setting, and unserialize when retrieving. This of course is going to add overhead to every operation.

When I first tested the Memcached Object Cache I was surprised at how little it improved speed. It took me about an hour to realize that the comparison of just using Memcached was unfair as it didn’t include any Opcode caching, adding an Opcode cache brings it more in line with what I would expect.

Using an opcode cache improves performance by over 200% on a stock WordPress install without using any object caching. While APC and XCache provided similar results, my tests still show XCache to be ever so slightly faster as an opcode cache.

Where we see the biggest difference between the 3 of these caches when using APC for both opcode and object caching.

Assuming we are using both Opcode and Object caching here are the results from best to worst:

  1. APC
  2. Memcached (With either APC or XCache)
  3. XCache

At this point the single largest failure of XCache is it’s inability to store objects, so I am pretty much planning on dropping XCache on my servers in favor of APC, which will be included with PHP as of PHP 6. I would likely still see marginal speed improvements using XCache on sites that I am not using XCache for an object cache, but on those that I am I’ll get much improved performance off of APC or Memcached.

Now why would I want to use APC over Memcached or vice versa? Well, the one thing that Memcached provides that APC doesn’t is the ability to share the cache between servers. In a load balanced multi web server environment, using APC you would be duplicating the cache on all of the servers as APC provides no way to share this data or allow for remote connections. Memcached however, being a PHP independent daemon can be used for pooling resources and allowing remote connections. You also can get more bang for your buck with Memcached in a load balanced multi server environment because of it’s pooling capability. The pooling capability allows you to dedicate say 128MB of RAM to each memcached instance and when pooled together will give you 128MB x N where N is the number of servers in the pool. Anyway, I digress…

In the end, if you have WordPress hosted on a single web server, APC is the way to go. If you are in a multi web server environment, Memcached is the way to go, but remember to install an Opcode cache as well. If you are crazy and just want to use more CPU cycles, XCache is the way to go.

Some of you may be thinking “why would I need an object cache in addition opcode caching, if the results are similar?” Well, under higher load an object cache will respond better than MySQL, even with MySQL caching. In addition, other factors with MySQL can come into play, such as connectivity to the MySQL server. It may be on another server, with not enough memory, slow disks, with an overloaded network, which decreases performance. Any time that an update query is run, MySQL will flush the whole cache. Another benefit, is we are rarely, if ever, going to use the data exactly as it is given to us from the MySQL query. In the end we are going to process the data before displaying, an object cache allows you to store the processed data, rather than the raw data from the query saving CPU cycles required for the processing. Individually these items may not consume much time, but added together and in a more efficient delivery system, this can make a huge difference.

Now for any of you who go run out and install Memcached, if you install version 1.4.x make sure you get at least pecl memcached 2.2.6 or 3.0.4. Memcached made a change that breaks deletes with earlier pecl memcached versions, which adversely affects WordPress.

A few additional things that I have been asked to talk about are using caching with a WordPress Network, output caching with Batcache and query counts. I promise to get to those, but I just wanted to get this out sooner rather than later.

Yo Dawg! We heard you like caching so we put a cache in your cache, so you can optimize while you optimize…Sorry couldn’t resist.