We have been having a lot of problems with our web systems at work lately. A couple of months ago we moved all of the servers to an outsourced hosting company with a much larger pipe to the Internet. Since then, our traffic has grown at least 10-fold. While this growth is good in the long run, it is putting a strain on systems that were not meant for that level of traffic. We planned quite a bit of scalability into our systems, but have not been able to put in many of the upgrades due to personnel and political issues (mainly a lack of approved monies). This weekend we will be putting in a new database server, running on Linux instead of Windows and upgraded to Oracle 10g.
The migration process itself should be fairly smooth (keep your fingers crossed). Unfortunately we don’t have the time to plan everything out properly, but each of the applications that connect to the database has only one connection string to change (or so the vendors tell us). Luckily, we still have the current database as a fallback if something goes awry.
Our next step is to look at caching at the XSLT and object layers on the application servers. Our Web Systems Manager has finished testing and debugging the XSLT caching, so we just have to deploy it. We’ll be waiting until after the database move to keep things clean.
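To give a rough idea of what object-layer caching means here, below is a minimal sketch in Python. It is not our actual code: the `TTLCache` class, the `get_or_load` method, and the loader function are all made-up names for illustration, and a fixed expiry time is only one of several possible eviction policies. The idea is simply that a repeated request for the same object is answered from memory instead of going back to the database.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (hypothetical, illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: go to the slow source (e.g. the database) and store it.
        self.misses += 1
        value = loader(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

With something like this in front of the database, the hit and miss counters give a quick read on how much load the cache is actually absorbing.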
Also of concern: which OS is going to be the best choice for the CDAs (content display applications), or application servers? We were running Windows, and that was OK until we migrated the servers. Right now we’re running Apple’s Mac OS X servers, which seem to be doing fairly well. The problem is that the traffic increase was so drastic that we may have to throw a large chunk of hardware at it, including more application servers. That probably means at least one more complete cluster, which in turn means another Apache server out front and a load balancer to send the traffic to the right cluster.
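As a back-of-the-napkin illustration of the load balancer's job, here is a small Python sketch. It assumes plain round-robin, first across clusters and then across the servers inside each cluster; the `ClusterBalancer` class and the server names are invented for the example, and a real balancer would also need health checks and session handling, which are left out.

```python
from itertools import cycle
from typing import Dict, List, Tuple


class ClusterBalancer:
    """Round-robin across clusters, then across servers in each (hypothetical sketch)."""

    def __init__(self, clusters: Dict[str, List[str]]) -> None:
        if not clusters or any(not servers for servers in clusters.values()):
            raise ValueError("every cluster needs at least one server")
        # Sort cluster names so the rotation order is predictable.
        self._cluster_names = cycle(sorted(clusters))
        self._servers = {name: cycle(servers) for name, servers in clusters.items()}

    def pick(self) -> Tuple[str, str]:
        """Return the (cluster, server) pair that should handle the next request."""
        name = next(self._cluster_names)
        return name, next(self._servers[name])


# Made-up topology: two clusters of application servers.
balancer = ClusterBalancer({
    "cluster-a": ["app-a1", "app-a2"],
    "cluster-b": ["app-b1"],
})
first_picks = [balancer.pick() for _ in range(4)]
```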
That is a lot of different problems to attack all at once, which is difficult even if too much traffic is the kind of problem most people want to have. My deepest concern is solving the problems well enough that we don’t lose that jump in traffic, with visitors turning elsewhere when we can’t serve their requests.