Continuing the performance improvements after last week’s downtime. Today, I implemented some long-needed changes to reduce the number of UPDATEs hitting the database at any given moment.
Previously, every page load would immediately count the visitor and update the database. That worked perfectly fine when we were small, but at today’s normal traffic levels, and especially during spikes like the one we saw last week, it has become too much for the database to handle. And since we use database replication, the issue has become visible to users: transactions pile up and (I believe) the database servers drift out of sync with each other.
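For the curious, here’s a minimal sketch of the general idea: buffer the counts in memory and flush them in batches, so each page gets one UPDATE per flush window instead of one per hit. The names (`VisitCounter`, `flush_interval`) and the 60-second window are hypothetical, and the actual implementation differs in the details.

```python
import threading
import time
from collections import Counter

class VisitCounter:
    """Buffer page-view counts in memory and flush them in batches.

    Hypothetical sketch: instead of one UPDATE per page load, accumulate
    counts and issue a single UPDATE per page every flush_interval seconds.
    """

    def __init__(self, flush_interval=60):
        self._counts = Counter()
        self._lock = threading.Lock()
        self._interval = flush_interval
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def record(self, page_id):
        # Called on every page load; just an in-memory increment, no SQL.
        with self._lock:
            self._counts[page_id] += 1

    def _flush_loop(self):
        while True:
            time.sleep(self._interval)
            with self._lock:
                pending, self._counts = self._counts, Counter()
            for page_id, n in pending.items():
                # One UPDATE per page per interval instead of one per visit, e.g.:
                # execute("UPDATE pages SET views = views + %s WHERE id = %s", (n, page_id))
                pass
```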
With this change, many of those issues should go away. Some quick benchmarking showed that responses no longer pile up and gradually grind everything to a halt the way they did before: even with high concurrency and sustained requests, the slowest response topped out around 600ms. In my tests, the application now handles at least three times as many concurrent visitors as it could before this change.
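The benchmark was roughly this kind of thing: fire a sustained batch of concurrent requests and watch the worst-case latency. This is just an illustrative sketch, not the exact harness I used, and the URL and numbers are placeholders.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/"   # placeholder, not the real site
CONCURRENCY = 100                # placeholder concurrency level
REQUESTS = 2000                  # placeholder request count

def timed_get(_):
    # Time a single request end to end.
    start = time.monotonic()
    with urllib.request.urlopen(URL) as resp:
        resp.read()
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_get, range(REQUESTS)))

print(f"max {max(latencies) * 1000:.0f} ms, "
      f"mean {sum(latencies) / len(latencies) * 1000:.0f} ms")
```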
We generally see more traffic every day around 10-11am Eastern, so that should put this to the test tomorrow. But it’s looking good so far.
Thoughts? Discuss...