Those who read the shmups forum regularly will know that recently it was time to migrate the service back to the physical server, and since the hardware was getting quite old and not really performing for the price anymore, I posted a donation drive to buy a new one.
Since the migration, several people have remarked on the response speed of the site, so I thought it’s about time I spilled the information on how everything was configured and tuned.
The donations were enough for me to jump from an HP DL360 G4 up to a much newer G6 model. Not only do these have faster memory and newer CPUs, but they use a lot less power – which reduces the hosting bill too. It came fitted with a pair of quad-core L-series CPUs – the low-power parts, clocked at about 2.2GHz. Since it’s primarily a webserver, I turned on Hyper-Threading, which presents the OS with 16 logical CPUs. It was loaded with 32GB of RAM, which for reliability I configured in mirroring mode, presenting the OS with 16GB. It also came with four 146GB SAS disks and the battery-backed RAID cache module, which significantly improves read/write speeds – I configured those as a RAID 10 set for speed and reliability at the expense of capacity.
The operating system picked this time around was CentOS 6, primarily because of the long lifetime of the distribution in terms of patching support. Over time it will become somewhat out of date, but reliability and security are key for internet-facing machines. On the performance side, I went for two main things.
Firstly, even now it’s true that disks have better access times at the start of the disk, so it’s ideal to place the core of the OS (and virtual memory, if you expect to use it frequently) close to the start. I wanted to use Linux LVM – you can create linear volumes rather than have the data scattered across the disk – but I also wanted volume groups with a bit of spare space to grow specific, related filesystems. Finally, I wanted space for backups right at the end, since that area is slowest. For these reasons, I created four physical partitions on the RAID 10 device: a tiny one for the boot partition; a small one for the core OS filesystems in an LVM volume group; a much larger one for the commonly read/written data (basically all the websites), with a bunch of empty space to extend later if needed, also in a volume group; and finally an archive partition for the backups.
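Laid out on disk, the plan looks something like this (device names and sizes here are illustrative, not the actual ones used):

```
/dev/sda1   small    /boot                         (start of disk – fastest)
/dev/sda2   small    LVM VG "system" -> /, swap
/dev/sda3   large    LVM VG "data"   -> websites   (+ free extents to grow)
/dev/sda4   rest     archive/backups               (end of disk – slowest)
```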
The second step was to tune the kernel TCP stack a little – I’m not sure these are the final settings yet as it’s still early days, they’re a little wasteful but on the other hand even the 16GB presented to the OS is much more than we will ever need, so I was free to really throw it around.
# TCP tuning
net.core.rmem_max = 512000
net.core.rmem_default = 512000
net.core.wmem_max = 512000
net.core.wmem_default = 512000
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
Obviously we’re using Apache as the webserver software. There are more optimised solutions, but honestly there’s a reason Apache stays at the top: it’s decently fast, it’s very powerful and flexible, and it has a lot of module support. Out of the box most servers are configured pretty poorly – here are some things you need to do to get the best out of it:
- Keepalives – depending on your system resources and visitor profile, keepalives can be a blessing or a curse. Because we have lots of CPUs available, lots of memory and a moderate to low workload, the best thing to do is set the keepalive timeout pretty high, and the number of requests per session high too.
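In httpd.conf that amounts to something like the following – the exact values are illustrative rather than the live server’s settings:

```apache
KeepAlive On
KeepAliveTimeout 30
MaxKeepAliveRequests 500
```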
- Workers – this is your min/max clients section. Depending on load, Apache will spawn new processes or kill idle ones. Obviously the act of creation/destruction carries a time penalty – and again we have loads of memory – so I set it to start 64 and max spares to 64, which means it won’t actually try to close any down beyond the initial pool. I set minimum spares to 10; this is the point at which it will create more. The maximum requests per child I increased to 4000.
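For the prefork MPM those settings look roughly like this (MaxClients is an assumption – it wasn’t stated above):

```apache
<IfModule prefork.c>
    StartServers           64
    MinSpareServers        10
    MaxSpareServers        64
    MaxClients            256
    MaxRequestsPerChild  4000
</IfModule>
```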
- Modules – there’s a section in the config file which specifies which modules to load – these incur a memory and speed hit due to the extra functionality from each module. This is simple – if you don’t need a module, don’t load it. You will need expires, deflate, headers and setenvif for some of the other tunings. Definitely turn off negotiation, it’s for multiple language support on HTML files and the handshake wastes time. I won’t include the config section here as it’s huge.
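The principle looks like this in the LoadModule section – keep the four needed above, comment out anything you don’t use (module paths as per stock CentOS 6 httpd):

```apache
LoadModule expires_module modules/mod_expires.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule headers_module modules/mod_headers.so
LoadModule setenvif_module modules/mod_setenvif.so
#LoadModule negotiation_module modules/mod_negotiation.so
```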
- Hostname lookups – turn it off, you don’t need them.
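The directive itself is a one-liner:

```apache
HostnameLookups Off
```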
- Enable the status check page, limit it to localhost (requires mod_status) – this will allow you to run ‘apachectl status’ from the command line to see what Apache is actually doing, without it you’re blind.
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
- Configure expires correctly! This tells browsers when to expire cached items; without it they will just keep asking for things like template GIFs that never change. I set image content to cache for 45 days, but HTML to only 30 seconds.
ExpiresActive On
ExpiresByType image/gif "access plus 1 month 15 days 2 hours"
ExpiresByType image/png "access plus 1 month 15 days 2 hours"
ExpiresByType image/jpg "access plus 1 month 15 days 2 hours"
ExpiresByType image/jpeg "access plus 1 month 15 days 2 hours"
ExpiresByType text/html "access plus 30 seconds"
- Deflate reduces bandwidth, and that increases page speed (usually). This tells the server to compress certain types of content before sending them to the client browser. What we actually do is say ‘compress everything except these’ – the exceptions being formats that are already compressed.
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:rar|zip|gz|tgz|pdf)$ no-gzip dont-vary
Header append Vary User-Agent
- Set cache-control headers – these are for the proxies your users pass through to reach your site. Scope each header to the content it applies to, otherwise the directives just override each other.
# 480 WEEKS – long-lived static assets (the match patterns here are illustrative)
<FilesMatch "\.(ico|gif|jpe?g|png)$">
Header set Cache-Control "max-age=290304000, public"
</FilesMatch>
# 2 DAYS
<FilesMatch "\.(css|js)$">
Header set Cache-Control "max-age=172800, public, must-revalidate"
</FilesMatch>
# 2 HOURS
<FilesMatch "\.(html|htm|php)$">
Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>
PHP itself comes pretty much ready to go out of the box in terms of performance, aside from one rather big thing. Using an opcode cache will dramatically increase the speed of any PHP-based application you’re running – which of course includes phpBB. CentOS comes with an installable package for APC (it’s bundled with newer PHP versions too), and it installs a config file as /etc/php.d/apc.ini, which looks like this now (relevant parts only):
extension = apc.so
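Beyond loading the extension, the file usually carries sizing options too; a sketch with illustrative values – not the forum’s actual settings:

```ini
apc.shm_size = 128M   ; cache size – allow for every script plus headroom
apc.stat = 1          ; re-check scripts on disk for changes
```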
You’ll want to make sure the management tool (apc.php) is available on your webserver somewhere, in there you can see how much of your cache RAM is in use. In an ideal world you will have enough memory to allocate slightly more than every PHP script on your server requires. Ours does, and the cache hit rate holds at 100%.
It’s also worth tuning MySQL – this is your database backend, after all. The settings you need here will very much depend on your database size and the combination of sites on your host. Essentially you want to reduce or eliminate the use of on-disk temporary tables, keep as much of the key index in RAM as possible, and use a sensible amount of query caching. There is an excellent tool called mysqltuner which can help: run it after your server has been in service for a week or so and it will recommend tuning settings; repeat the process periodically. Based on my own knowledge and recommendations from that tool, the relevant part of our config was updated accordingly (note we don’t use InnoDB tables so don’t need the support, and we use local UNIX sockets rather than TCP-based connections).
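A hedged sketch of the kind of my.cnf fragment involved – the values are illustrative assumptions, not the live settings:

```ini
[mysqld]
skip-innodb                      ; MyISAM only, no InnoDB support needed
skip-networking                  ; local UNIX socket only, no TCP
socket              = /var/lib/mysql/mysql.sock
key_buffer_size     = 256M       ; keep the MyISAM key index in RAM
query_cache_type    = 1
query_cache_size    = 64M        ; a sensible amount of query caching
tmp_table_size      = 64M        ; keep temporary tables off the disk
max_heap_table_size = 64M
```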
There are a few tweaks we can do with PHPBB too. Firstly, we can tell it to use APC to cache templates instead of disk. In your config.php, include the line:
$acm_type = 'apc';
Secondly, you should tune your actual page code – for example, inlining small snippets of CSS instead of including them as separate files, which means fewer items blocking the page display until loading has completed. Make sure your images are all compressed, and try not to include off-site assets – for example, our PayPal donate button is actually served locally instead of from PayPal’s servers, so it just gets hoovered up in the keepalive session along with the other components, resulting in fewer calls (HTTP and DNS) for the client.
Google do a very good page analyser which will highlight lots of faults in your code or the way the content is served, you can find it here:
You can also find sites that will check the speed of your site and provide a breakdown of bottlenecks and the requests required; the best one I’ve found is here:
The end results of all the work on the new server were dramatic. Our PageSpeed Insights scores are 98/100 for desktop users and 92/100 for mobile, and the site isn’t even optimised for mobile… The speed test results were equally dramatic: previously, page loads took 1.5 to 3 seconds on the VM host and about 1 to 1.5 seconds on the old hardware. We now serve initial pages to some users in just over half a second, and a second load completes in just under a quarter of a second, placing the forum in the top 1% of sites tested at Pingdom.