I'm now working on my first major webapp, so I've been learning how to be a sysadmin, pretty much from scratch (almost no previous experience with running webapps, lots of experience with other things though).
Anyway, I've been trying to find resources to help me learn, but it's been extremely difficult. These are the questions that are still bugging me; hopefully someone here can point me to some good reading about them. Note that my stack is Python/Django over Apache (mod_wsgi) right now. Also note that my site has a "ping" system, where each connected client pings the server every few seconds to see if any new info has arrived, which I believe means I have to handle a larger number of requests per second.
The questions:
1. What kind of load do I need to handle? Is 12,000 requests per second terrible/good/great performance? How do I go about figuring out how many people are online at once for most sites? How do I even estimate it?
2. How can I test the performance of my application? I've learned that Apache Bench is used a lot, but are there better tools?
3. What are the best tools to help me monitor and understand the load on my server?
4. How do I go about understanding the bottlenecks in my application? Right now, my Apache process is taking most of the cpu. What does that imply about where I should optimize?
Sorry to braindump, but I've been looking for answers to these questions online and haven't found any clear help.
As a "meta-answer", you might consider breaking these questions up and asking them on one of the stackexchange sites.
However:
1. "What kind of load" depends on your anticipated visits to the site. There's no substitute for real traffic, but simulation can do in a pinch. Likewise there's no substitute for simulation, but capacity planning tools can do in a pinch. Look at The Art of Capacity Planning, The Art of Application Performance Testing and Guerrilla Capacity Planning.
2. There are many tools, some quite sophisticated. Look at siege, httperf and others like them. Consider also using automatic UAT tools (like Cucumber) as the basis for performance and stress tests.
3. Look into monit, zenoss or nagios. There are others.
4. Profiling, profiling, profiling. And ditch Apache, it's a memory hog. Consider lighttpd or nginx.
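On the profiling point, and given the asker's Python/Django stack, here's a minimal sketch using the standard library's cProfile; the function being profiled is a made-up stand-in for whatever view or task you suspect is slow:

```python
import cProfile
import io
import pstats

def slow_report():
    # Hypothetical stand-in for a view or task you suspect is a bottleneck.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
slow_report()
profiler.disable()

# Dump the 10 most expensive calls, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

The output names each function with its file, line, call count and time, which tells you whether you're CPU-bound in your own code or waiting on something else.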
When you say ditch Apache, do you mean completely? I know a fairly common setup is to use nginx+Apache to serve Django (with nginx serving the static content and Apache the Django code). Should I be running everything from nginx?
You should at least try nginx/gunicorn (http://gunicorn.org/). And if your clients are constantly pinging and the data hasn't changed, make sure you are caching it in memcache (etc) to relieve the database.
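A sketch of that cache-aside pattern for the ping endpoint. The cache class here is an in-process stand-in with memcache's get/set shape, so the example runs without a memcached server; the key scheme, TTL, and function names are all made up:

```python
import time

# Stand-in for memcache: same get/set interface, kept in-process so this
# sketch runs without a memcached server. Swap in a real client (or
# Django's cache framework) in deployment.
class FakeCache:
    def __init__(self):
        self._store = {}
    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires, data = entry
        return data if time.time() < expires else None
    def set(self, key, data, ttl):
        self._store[key] = (time.time() + ttl, data)

cache = FakeCache()
db_hits = 0

def fetch_updates_from_db(user_id):
    global db_hits
    db_hits += 1          # pretend this is an expensive query
    return {"new_messages": 0}

def ping(user_id):
    # Cache-aside: clients polling every few seconds mostly hit the cache,
    # and the database is queried at most once per TTL window.
    key = "ping:%s" % user_id
    data = cache.get(key)
    if data is None:
        data = fetch_updates_from_db(user_id)
        cache.set(key, data, ttl=5)
    return data

for _ in range(10):       # ten polls inside the TTL window
    ping(42)
print(db_hits)            # 1 -- one DB query for ten pings
```

The point is that the ping traffic scales with connected clients, but the database load scales only with the TTL.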
Honestly, I'm a little scared of using something like gunicorn. I could be wrong, but it seems like an extremely new technology, especially when compared to Apache, which has a lot of history behind it.
With Django - or any ORM - it's a good idea to keep an eye on the SQL that your application is running. Use the django debug toolbar[1] and run your SQL queries against a profiler. Most of the time, that's where your serious bottlenecks will be. If a query happens to be a bottleneck, there are ways the ORM can improve performance (e.g. select_related() to reduce the number of queries in a loop, only() and defer() to reduce the number of columns in a query, and so on)[2]. In some cases this may even involve using raw SQL or caching heavily used queries.
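For anyone unclear on what select_related() actually saves, here's the N+1 query pattern it eliminates, sketched with sqlite3 so it runs standalone (table and column names are invented, and the query counter stands in for what the debug toolbar would show):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.execute("INSERT INTO author VALUES (1, 'alice'), (2, 'bob')")
conn.executemany("INSERT INTO post (author_id, title) VALUES (?, ?)",
                 [(1, 'first'), (2, 'second'), (1, 'third')])

queries = 0

def run(sql, args=()):
    global queries
    queries += 1
    return conn.execute(sql, args).fetchall()

# N+1: one query for the posts, then one per post for its author --
# this is what a naive ORM loop (post.author.name) does under the hood.
posts = run("SELECT id, author_id, title FROM post")
for _, author_id, _ in posts:
    run("SELECT name FROM author WHERE id = ?", (author_id,))
n_plus_one = queries

# The select_related() equivalent: a single JOIN fetches everything.
queries = 0
rows = run("SELECT post.title, author.name FROM post "
           "JOIN author ON author.id = post.author_id")
print(n_plus_one, queries)   # 4 1
```

Three posts cost four queries the naive way and one query with the join; with thousands of rows the gap is what shows up in your profiler.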
Look at using processes vs threads with mod_wsgi, which should help improve performance. Ultimately, however, you might find nginx a better option, particularly if you are using it for static media/load balancing anyway. I'd suggest uwsgi (I've also heard good things about gunicorn). Use supervisord or another process monitor to make sure the lights stay on.
"Is 12,000 requests per second terrible/good/great performance?"
I don't understand this type of benchmark. A 'request' is completely undefined. Serving 12k 5-byte static pages is quite different from serving 12k front pages with 25 db queries behind them. Is there an industry-standard interpretation of 'request' that I'm not aware of?
Here is the secret to getting the whole nginx/fastcgi vs. apache/mod_php thing to click in your head.
The trick is to think of it in two layers. The "spoonfeeding" layer and the "worker" layer.
In nginx/fastcgi, nginx is the spoonfeeding layer and the fastcgi worker processes are, you guessed it, the workers. Nginx effectively acts as a proxy, holding open the thousands of connections users have made, but only a small fraction of them are actually being worked on by the worker processes at any given moment.
With apache/mod_php there is no spoonfeeding layer. Every single apache child has the php interpreter loaded and acts as a worker process. When you compare your nginx and apache setups, think of the apache children as the fastcgi workers, not as nginx.
Most of the time, when people switch to nginx and claim amazing results, what got them those results was the addition of a spoonfeeding layer. Technically, you could have done that several other ways. Most higher-end sites that have had to care about this stuff for years let the load balancer handle the spoonfeeding layer, so the (web)appservers can focus on being appservers. It doesn't matter if it's a $120K Netscaler or a $0 copy of varnish or haproxy bound to localhost; the point is you never want a 1:1 relationship between "active but idle user connections" and "worker processes".
A core issue for modern websites is one of cache invalidation. The linked article essentially sidesteps the issue, with the in-memory speed of memcache obscuring the problem.
For traditional LAMP-style document-producing engines, cache invalidation strategies rely either on TTL (leading, as he says, to stale data) or on polling the source data (leading to an unavoidable performance hit, amortised over the improved speed of the cache).
Leaving aside TTLs, the key issue is that cache invalidation is driven by GET and not POST requests. I wrote a thesis proposal where part of the concept was to drive all cache invalidation from POSTs. New comment added to a story? A regeneration is queued up. New post on front page? A regeneration is queued up.
First, you reduce staleness by regenerating only when new data is added, and you improve performance by not needing to poll the source data for currency every time you touch the cache. In an ideal situation you could come close to raw HTTP serving speed.
You also allow some degree of dynamic responses to load. Under high rates of POSTs you can batch up regeneration events to prioritise the GETs.
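The POST-driven scheme above can be sketched as a regeneration queue that deduplicates pending pages, so a burst of writes to one story batches into a single rebuild (all names here are illustrative, not from any real implementation):

```python
from collections import OrderedDict

cache = {}
pending = OrderedDict()   # ordered set of page keys awaiting regeneration
rebuilds = 0

def render_page(page):
    global rebuilds
    rebuilds += 1         # pretend this is an expensive template + DB render
    return "rendered:%s" % page

def handle_post(page):
    # A write queues a rebuild; duplicate keys collapse, which is how a
    # burst of POSTs gets batched into one regeneration per page.
    pending[page] = True

def drain_queue():
    while pending:
        page, _ = pending.popitem(last=False)
        cache[page] = render_page(page)

def handle_get(page):
    # Reads never poll the source data: a hit is served straight from
    # the cache, a miss falls back to a one-off render.
    if page not in cache:
        cache[page] = render_page(page)
    return cache[page]

for _ in range(5):
    handle_post("story/42")   # five comments arrive in a burst
drain_queue()                 # one rebuild, not five
html = handle_get("story/42") # served from cache, no extra rebuild
print(rebuilds)               # 1
```

Prioritising GETs under load then just means letting the queue grow while the event loop keeps serving reads from the cache.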
However, I won't be pursuing that project -- I've been accepted for one I was more interested in.
For those interested in full page caching, smart cache invalidation, and a built-in crawler, check out Drupal's Boost module. http://drupal.org/project/boost
I'm the guy who has shaped this into the monster it has now become. The apache htaccess rules are still getting refined & it comes with lighttpd and nginx rules as well.
I've been working on taking this full circle; taking the monster that was created, keeping the core parts that work, and splitting the other parts into separate Drupal modules.
http://drupal.org/project/expire will contain the cache invalidation logic and I'll create a crawler once I come up with a better multi process crawler http://groups.drupal.org/node/126624
I think the reason that cache invalidation scheme is not in wider use is that it's not very generalizable, and as data model complexity increases things get out of control. If someone adds a new post, you may end up having to regenerate the post page, the pages before it and after it, an authors page, any number of tags pages, etc. So I think it works sometimes in specific instances, but not as a general caching strategy.
If it worked though, it would seem to be the most beneficial. Is anybody working on it?
I've been thinking along similar lines for my project, and the issue I'm running into conceptually is tracking WHICH cache records to invalidate upon POST. Say an HTML list, an HTML detail page, and a full composite object all contain references to the particular item you updated. Single items - the detail page and the object - are fairly easy to know about, but what about a list view and a grid view that also contain that item? What are your thoughts?
A dependency graph between database tables and pages, no? How to define 'page' depends on the rest of the application, I guess in most frameworks it would be 'module' + 'action'.
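That table-to-page dependency graph can be sketched directly (the tables, the page keys, and the mapping itself are made up for illustration):

```python
# Each cached page declares which tables it reads; a write to a table
# invalidates every dependent page and leaves the rest alone.
DEPENDS_ON = {
    "front_page":   {"posts", "comments"},
    "post/42":      {"posts", "comments", "tags"},
    "author/alice": {"posts", "authors"},
}

cache = {"front_page": "...", "post/42": "...", "author/alice": "..."}

def invalidate(table):
    stale = [page for page, tables in DEPENDS_ON.items() if table in tables]
    for page in stale:
        cache.pop(page, None)
    return stale

# A new comment touches only the comments table, so only pages that
# render comments are dropped; the author page survives.
dropped = invalidate("comments")
print(sorted(dropped))   # ['front_page', 'post/42']
```

In a framework you'd populate the mapping from each module+action's query log rather than by hand, which is also where the "data model complexity" objection above starts to bite.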
Seeing as nginx can route requests by their URI, you could have a /write prefix for anything that handles user input, which wouldn't be cached? Especially on sites that don't have masses of user writes (in which case you'd use some sort of intermediary queueing daemon that nginx could shove things to, which could slowly feed them into your backend).
A request to the /write prefix would immediately invalidate the cache for the changed pages, and thus they could easily be regenerated a single time and placed into the cache until the next change (or perhaps the pages could simply be invalidated once every 10 seconds or so - very little has to be truly realtime).
Yes, this is what I thought he was saying, but I was thrown off by his 'I think this is entirely new' comment in the beginning. Pre-generating pages when the data changes is not new, we did it in 2000. Maybe I didn't fully understand and there is some aspect of it that is new (for example we used disk-based caches, not memory, but that doesn't really change the idea imo).
For 4 years I tried to run my WordPress site off of Apache + PHP.
For the first 3 years it was Apache+Prefork+mod_php+WordPress (the default setup on RedHat-based or Debian-based systems for the longest time).
Any time I would get a story on Slashdot or Digg the site would die for at least half a day... god I hate it.
I went from a 1GB RAM VPS to a 12GB dedicated machine in 3.5 years trying to get it to stop crashing whenever I would get a flood of traffic and was never able to. I pushed back on the idea of being a Linux sys admin for so long because I didn't want that hassle, but alas, I had to bother with it.
Finally, at about the year 4 mark, I decided no REAL site on the planet was running with this configuration, since it didn't seem to matter what hardware you threw at it (yes, I tweaked the Apache setup/mod-list tirelessly to scale with the improved hardware). I finally started digging into how real human beings set up Apache and came across the argument for using the "MPM worker" as opposed to the default prefork worker.
Made sense to me; fewer 30+ MB processes running around answering questions.
After that change, it helped a little... I have no hard numbers on hand, but it felt like a small improvement.
I kept digging and soon ran across the once-hacky-but-now-officially-supported method of using Apache + a family of PHP VM threads, pre-launched and called via FastCGI to execute the .php pages from my WordPress site. The computer-science part of my brain loved this idea... and the Java-trained side of me suddenly realized that prior to this, with Prefork and mod_php, every time someone was connecting I was spinning up a new Apache thread and a new PHP VM every single time (please correct this if wrong... this is how I understood it).
With FastCGI I could have a family of say 20-some PHP VM threads living in harmony and responding to Apache constantly asking them questions.
After rolling that change out at about year 4, I noticed a big improvement; maybe about 50%.
At my next Slashdotting the server got REALLY slow, but hung in there; no crashes. I thought it was odd that all that hardware still couldn't serve things snappily... it seemed like every other day I was clicking a link off of the Hacker News or Reddit front page to some dude's personal blog that was responding very quickly to me, and I was positive these people weren't spending $300/mo like I was on dedicated hardware to run their blog.
So I kept digging.
As you guys probably know, when you start searching for what sucks about Apache two things come up more than any other: "use nginx" or "use lighttpd" -- I had read that early versions of lighttpd had some memory leak issues (I think long-since fixed) and had a handful of Ruby friends that loved nginx... so I decided to stay up all night one night and port the site over.
25mins later I was done.
Yea, so that was a lot easier than I expected. The only painful part was using some heavy-handed redirect logic to convert my WP-SuperCache rules over to nginx (the author wasn't supporting nginx yet, but I think he does now).
I would point out that the server load with nginx running with NO CACHING (WP-SuperCache disabled, all queries execute PHP and perform a MySQL query) was something like 1/4 what my Apache/MPM/FastCGI/PHP/WP-SuperCache-enabled setup was using.
Once I got WP-SuperCache up and running on nginx, the difference was stupid-big. The nginx/WP-SuperCache setup was using 1/8th or 1/10th the system resources that the Apache setup had been using.
Overall I couldn't be happier with nginx. I think there are probably people who live in oxygen-rich test chambers inside of military bunkers, who were bred to tweak Apache, and who can optimize it to have comparable performance, but that wasn't me. Out of the box nginx has been fantastic thus far.
And that is my little story related to this subject... for what it's worth.
> with Prefork and mod_php, every time someone was
> connecting I was spinning up a new Apache thread and a
> new PHP VM every single time (please correct this if
> wrong... this is how I understood it).
Since no one has yet... no, this isn't quite right. Apache in prefork mode maintains a pool of workers, each of which does contain essentially the full webserver and PHP system. These listen for requests and respond to one request per process at a time. Once a process finishes responding to a request, it starts listening for a new one again. Apache tries to keep at least a few idle processes on hand all the time, so clients don't have to wait, and it will fork new processes to handle more simultaneous connections up to a configured limit. If you get a sudden spike in traffic which then dies down, Apache will start killing excess processes. Processes live until killed as unnecessary, or after serving N requests, as configured.
That all sounds pretty reasonable, until you realize that every request, even for stylesheets, scripts, images, favicon.ico or robots.txt, is served by one of those fat apache+php processes, which can easily reach 10MB of memory usage each (or a lot more sometimes, depending on your modules). And if Keep-Alive is enabled, or a client is a bit slow, or some part of the Internet gets dodgy, a process can get tied up for a long time, requiring Apache to start more, increasing your memory usage and leading to thrashing and/or hitting the max child processes limit.
The default max is 256 child processes - easily enough to cause disk thrashing on a 1GB machine, and not enough to fully utilize a much larger machine.
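The back-of-envelope arithmetic, with the per-child size taken from the estimate in the parent comment:

```python
# Worst-case prefork memory footprint (per-child size is an assumption
# from the comment above, not a measured value).
children = 256        # Apache's default max child processes
mb_per_child = 10     # a typical fat apache+mod_php process
print(children * mb_per_child)   # 2560 MB -- well past RAM on a 1 GB box
```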
Nginx can have multiple workers, but the default number is one, because nginx doesn't need a separate worker to handle each request. Anything that happens on a given connection is an event (connected, received some data, finished receiving data, got a response from the backend, closed the connection), and nginx just responds to events in the order received across all connections. It doesn't have to hang out and wait for the client to finish sending its request, and doesn't have to wait for PHP to process the request; it just goes about handling the next event until those time-consuming things finish.
If you've looked at using PHP with nginx, you might be thinking of the fastcgi workers. Rather than including PHP in the web server process, fastcgi keeps a pool of PHP processes around, and whenever you get a PHP request, nginx passes it off to the fastcgi backend (then goes about other business until the backend responds). I don't know of any reason that PHP run this way should be faster than as an apache module, but the whole system ends up faster because the web server part is using a lot less memory and CPU. And depending on your site, there are some really low-hanging optimizations you can do with nginx. E.g., you need just a couple of configuration lines to cache all PHP GET requests for 5 min (unless the client has cookie X, if you want). Since nginx seriously will not bat an eye at serving a reddit front page's worth of static files, that means even your unoptimized WordPress site can stand up to the front page no problem. (Not verified from actually attempting it. But I did set up a simple PHP site on a 512 MB Linode that served 250k page views per day without excessive disk activity or the CPU usage crossing 10%.)
But so is apache + php + apc + memcached. It's easy to set up, and with minimal work to write a wrapper, you have a very effective web + caching stack. I don't understand why PHP + APC + memcached gets such short shrift.
I got someone to install nginx for me on a not-so-busy server. Twice it filled up /tmp and ran out of inodes, crashing the server. I'm in for peak traffic next month and am afraid it will all come crashing down again :) Oh, and any help you can throw at me, it's appreciated.
Like others already said, that'll be php creating session files, not Nginx.
Are you sure it ran out of inodes, and it isn't that /tmp is a ramdisk/tmpfs and just small? You can change the session path in your php.ini by changing/setting session.save_path.
I'm going to guess PHP sessions. Off the top of my head, it's the only /tmp-file-creating thing I can think of in common web stacks.
If that is the problem mise is facing, you can swap out the storage that PHP uses for sessions (to use something other than files), I'd point out the relevant docs but I have a meeting to get to.
I'll look into this. My host Servint says that my inode allocation is "generous", so I wonder if I'm the only person running into this problem with PHP+nginx.
I'm pretty happy with nginx, but I do mostly rails with that. For PHP, the general consensus seems to be fastcgi + lighttpd for high performance environments, but if you're just serving the same page, why not use apache + mod_cache? What improvement over mod_cache would wp-supercache get? You're just serving static files, so why not just add varnish or mod_cache for apache?
That said, serving small files from lighttpd is extremely fast. I set up one high-performance server for small files that served from a ramdisk on a linux machine. At 15,000 r/s with ab or siege I didn't max out a small instance on Amazon. I'm sceptical about correctly measuring performance at that level, though.
Lighttpd had a persistent bug where overloading FastCGI instance of PHP would eventually cause it to get "stuck" in a 500 error state, even after PHP had returned to idle.
My current setup involves nginx, wp-supercache, php-fpm, apc and mysql query caching. Plus an nginx rule that sets expires headers on all my static files.
It's taken me several years of learning, but now Wordpress doesn't run insanely slow.
Wow, I haven't run my own webserver in a while, but if what you say is true it's remarkable that Apache can be so bad.
If Apache really uses 7-9x more resources than nginx, that speaks more to Apache's awfulness than nginx's awesomeness. You can beat a worthy competitor by 50%, maybe even 100%, but if you're winning by 700% then your competitor is just bad.
> I would point out that the server load with nginx running with NO CACHING (WP-SuperCache disabled, all queries execute PHP and perform a MySQL query) was something like 1/4 what my Apache/MPM/FastCGI/PHP/WP-SuperCache-enabled setup was using.
This does not compute - no caching (the use of nginx is irrelevant) will always be slower than caching, especially SuperCache, which creates static files for the server to serve directly - even apache can serve those faster than the nginx+memcached of the article.
So I would have to say that his caching system under Apache wasn't working.
That would fit the numbers better.
It is true that nginx will perform better than a badly tuned Apache system in the event of a slashdotting, but only because it alleviates the slow client problem, i.e.:
You have 400 concurrent visitors, and each one takes up a thread on Apache, causing it to saturate, whereas nginx will buffer the request from the client, send it all in one go to the backend, buffer up the response, and free the backend for another request - an Apache thread would be unavailable for the entire duration of that request.
However you can get the same behaviour just reverse proxying from nginx to Apache.
This is all moot though; you can handle even a slashdotting on a $5/year VPS - just make a static version of the page being hit, and serve it directly until the load dies down. 99.9% of requests will be for that page only.
He might have had keep-alives enabled. Apache + keep-alives + slow clients is a recipe for disaster. Seems everyone learns this the hard way. I learned it the hard way in 1998 when the Starr Report was released and my web farm was running Apache on Ultra 2's.
Amusingly, patio11's "Memory-Constrained VPS" has the same amount of memory (512 MB) as those Ultra 2's I mentioned (I think; they may have actually only had 256 MB at the time).
The answer is to put something like Nginx in front of Apache. Let Nginx serve the static resources. Turn keep-alives off and let Apache serve the dynamic stuff. Also disable every Apache module whose function you don't know, and re-enable them as you come to understand what your site needs.
Apache isn't that bad. I haven't used nginx, so I can't speak to it, but the reason I haven't is that I can push 500+ Mbps out of a single box running Apache with just some basic configs. mod_php may be slow, but Apache itself is plenty fast.
Interesting, but let's be honest for a second: in 99.99% of cases, a classic Apache+mod_php will be well suited. You could count on your hands the number of sites that require a setup that needs to handle 12,000 req/sec and more.
If you need to speed things up and reduce the memory footprint, I'd recommend using APC or eAccelerator, and putting something like Varnish or Squid in front. As well as ditching stuff like WP or Drupal; they might be nice and easy to use, but the code behind them is just a disaster if you want something fast and optimized.
Use extensions like XHProf or Xdebug to profile your code, and find the bottlenecks and memory eating functions.
Stop blaming Apache/PHP; they both have their faults, but in most cases the fault is in the code.
I created a post here regarding a Varnish plugin we wrote for cPanel over a weekend codeathon (http://www.unixy.net/varnish). Based on benchmarks, Varnish+Apache came out ahead of Litespeed. But benchmarks are not always conclusive, since user traffic patterns are difficult to reproduce in a benchmark environment.
The feedback we've gotten so far on the plugin is very positive (google it if interested). Nginx can do caching, but not with the same flexibility as Varnish.
Varnish is good for caching any object; static or dynamic. The plugin we wrote uses Varnish to do a lot of dynamic page caching. Varnish is better at dynamic page caching than Nginx.
Litespeed has a caching mechanism too (introduced recently), but I think it requires an enterprise license for 2+ CPUs. Here are some benchmarks of Apache+Varnish vs Litespeed: http://unixy.net/apache-vs-litespeed
He's using the nginx full-page cache mechanism, backed by memcached. This way nginx does not touch PHP at all. It goes straight to memcached, which keeps data in RAM, so it serves pages in 7 ms.
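The write side of that setup implies the application renders each page once and stores the HTML under a key nginx's memcached module can derive from the URL. A sketch, with the key scheme assumed and a dict standing in for a real memcached client:

```python
# In-process stand-in for a memcached server; a real setup would use a
# memcached client library. The "fullpage:" key prefix is an assumption --
# it just has to match whatever key nginx is configured to look up.
memcached = {}

def render(path):
    # Stand-in for the expensive PHP/template render.
    return "<html><body>page for %s</body></html>" % path

def publish(path):
    # Store the finished HTML so the web server can serve it from RAM
    # without ever invoking the application.
    memcached["fullpage:" + path] = render(path)

publish("/index.html")
print("fullpage:/index.html" in memcached)   # True
```

On a GET, nginx then does the lookup itself and only falls through to the application on a miss.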
WP is trash. You can speed it up about 10 times with just a couple of query fixes and by removing the 100 extra calls to stripslashes you get when you pull a GET var 100 times.
Just wanted to say thanks for the great comments on this topic. I'm just working on optimizing page speed for a client, and there's a wealth of suggestions on this page.