Scaling up is one thing. Scaling down can be an entirely different and more complex matter. Conversations about the cloud naturally turn to its limitless possibilities, but increasingly we are seeing the need to scale infrastructure back, too.
There are a number of reasons for this. They point to the need to treat data as a flowing stream, and how well that stream is managed has significant business implications.
TicketLeap came into Comic-Con International after doing a test run that showed how important it was to have the ability to scale up. But when it came down to the opening day, something happened. The ticket demand was equivalent to selling out the Super Bowl in a few hours.
TicketLeap had things ready to go with its cluster on Amazon Web Services. According to the TicketLeap blog, the company:
- Introduced an administrative “High Volume Mode” for events that allows TicketLeap to lock event data into cache. This includes data such as: event title/description/etc, ticket type title/description/etc. No need to fetch this data from the database unless needed.
- Optimized all queries throughout the checkout process reducing joins, reducing subqueries, and adding indexes where appropriate.
- Increased the `ulimit` on our webservers to handle more connections into our platform.
- Increased the number of fastcgi processes spawned for our application.
- Introduced a queueing system to throttle orders per second to manageable size.
- Ran extensive load testing on pure request load, pure checkout load, and a blend of request/checkout load. We used a tool called BrowserMob. It’s awesome and we highly recommend checking it out.
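To make the queueing item in the list above concrete, here is a minimal sketch of throttling orders to a fixed number per second. It is not TicketLeap's implementation, which the blog does not describe in detail. The class name `OrderThrottle`, its methods, and the one-second window are all assumptions made for illustration.

```python
import collections
import time


class OrderThrottle:
    """Hypothetical throttle: release at most `rate` orders per one-second window."""

    def __init__(self, rate, clock=time.monotonic):
        self.rate = rate                      # orders allowed per window
        self.clock = clock                    # injectable clock, handy for testing
        self.pending = collections.deque()    # orders waiting, in arrival order
        self.window_start = clock()
        self.admitted_in_window = 0

    def submit(self, order):
        """Queue an order; nothing is processed until drain() releases it."""
        self.pending.append(order)

    def drain(self):
        """Return the orders that may be processed right now."""
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new window has begun, so the per-second budget resets.
            self.window_start = now
            self.admitted_in_window = 0
        released = []
        while self.pending and self.admitted_in_window < self.rate:
            released.append(self.pending.popleft())
            self.admitted_in_window += 1
        return released
```

A worker would call `drain()` in a loop and hand the released orders to checkout. Everything else waits in the queue rather than piling onto the database at once.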
But the problem was not in scaling up. There were plenty of servers. In fact, there were too many servers. They could not handle the load because nearly all the connections to the MySQL database were tied up doing DNS resolution. TicketLeap scaled back the servers to open up lanes for the data to flow. This relieved the system, and tickets were once again being sold without the painful slowness. It turned out to be an error that in many ways TicketLeap could not control. TicketLeap goes into detail about the issue on its site.
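One simplified way to see why more servers can hurt is to treat database connections as a fixed budget. The sketch below assumes each application process holds one database connection. That assumption, the function names, and the numbers in the usage note are illustrative and do not come from TicketLeap. It also does not model the DNS resolution delay itself, only the ceiling that slow connections run into.

```python
def connections_demanded(web_servers, procs_per_server):
    """Connections the web tier can open, assuming one per application process."""
    return web_servers * procs_per_server


def max_safe_servers(db_max_connections, procs_per_server, reserved=0):
    """Largest web-server count whose processes fit within the database's limit.

    `reserved` is a hypothetical allowance for admin or monitoring connections.
    """
    usable = db_max_connections - reserved
    if usable <= 0 or procs_per_server <= 0:
        return 0
    return usable // procs_per_server
```

With made-up figures, 40 servers running 20 processes each could demand 800 connections. A database capped at 500 connections, with 20 held in reserve, has room for only 24 such servers. Past that point, adding servers adds contention rather than capacity.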
The sheer amount of data had to be controlled in order for Comic-Con to sell its tickets. The economic impact could have been considerable if the system had broken down. Comic-Con needed the broad capabilities of the cloud but also the ability to scale back so the client could be served.
There are plenty of lessons here. It takes careful planning to determine how an information architecture will function under such demands. Without that care, the data can cause a jam that has serious business implications.