Friday, January 15, 2010

Virtual Software Worker Bees -- Monte Carlo Methods

I had a problem to solve. I knew that 5,000 people were going to access a database once a day for a month, between about 8:00 AM and 9:00 PM. Every time someone accessed it, a software worker bee had to be ready to do some work in the database.
(Image: software worker bee, not exactly as illustrated.)

So the big question was: how many people would access the database simultaneously? How many virtual worker bees would I need to handle all of the requests? It's not an easy question to answer.

I had to make some basic assumptions. For example, the 5,000 requests would not be evenly spaced throughout the day. So the question boiled down to this: out of the 5,000, how many would try to access the database at the same time?

A single request takes about 45 seconds to process, but to be safe I assumed that each request takes a full minute. If all of the requests arrived one after another, a single worker bee could handle 60 requests an hour. But life doesn't work that way. I had to assume that people work during the day, so most of the requests would come when they were neither working nor otherwise occupied. I also had to assume that the two, three, or four busiest hours of the day would carry 75%, 80%, or 90% of the requests. So how do you solve a problem like this?
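Before any simulation, it helps to see the naive baseline. This is a minimal sketch using only the figures above (5,000 requests, a roughly 13-hour window from 8:00 AM to 9:00 PM, one minute per request); the function names are hypothetical. It computes how many bees would suffice if requests were spread perfectly evenly, which is the idealized situation the post says does not happen.

```python
import math

# Figures taken from the post
REQUESTS_PER_DAY = 5000
WINDOW_HOURS = 13          # roughly 8:00 AM to 9:00 PM
MINUTES_PER_REQUEST = 1    # 45 s rounded up to a full minute for safety


def requests_per_bee_per_hour(minutes_per_request=MINUTES_PER_REQUEST):
    """One worker bee handles requests back to back, so it clears 60 / duration per hour."""
    return 60 // minutes_per_request


def naive_bees(requests=REQUESTS_PER_DAY, hours=WINDOW_HOURS):
    """Bees needed if requests were spread perfectly evenly over the window (idealized lower bound)."""
    per_hour = requests / hours
    return math.ceil(per_hour / requests_per_bee_per_hour())


print(requests_per_bee_per_hour())  # 60
print(naive_bees())                 # 7
```

Perfectly even traffic would need only about 385 requests an hour, or 7 bees, which is why the clustering assumptions matter so much.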

The answer is with "Monte Carlo" methods. Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are useful for modeling phenomena with significant uncertainty in inputs. I wrote a computer model with the following screen:

[Screenshot: the simulation model's input screen]

So I ran several simulations. In the first, I assumed that 90% of the requests would come in the three busiest hours of the day. It didn't matter which three, because it was just an assumption. Since each request takes 60 seconds, each request competes for one of 60 one-minute slots in the hour. That gave me 4,500 requests over three hours, or 1,500 requests an hour, each competing for 60 time slots. Dividing 1,500 by 60 gives an average of 25 simultaneous requests, but the requests do not arrive evenly over the hour.
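The averaging step above can be worked out for all three scenarios in the post. This is a small sketch; the helper name `average_concurrency` is hypothetical, and it only reproduces the arithmetic (share of 5,000 requests, divided by busy hours, divided by 60 one-minute slots), not the random arrivals.

```python
# Scenarios from the post: (share of daily requests, number of busiest hours)
SCENARIOS = [(0.90, 3), (0.80, 4), (0.75, 2)]
REQUESTS_PER_DAY = 5000
SLOTS_PER_HOUR = 60  # one-minute slots, since each request is budgeted one minute


def average_concurrency(share, hours, requests=REQUESTS_PER_DAY):
    """Average number of requests landing in the same one-minute slot during the busy hours."""
    per_hour = requests * share / hours
    return per_hour / SLOTS_PER_HOUR


for share, hours in SCENARIOS:
    print(f"{share:.0%} over {hours} h: {average_concurrency(share, hours):.2f} per minute")
```

The 90% case gives the 25 per minute computed above, the 80% case about 16.7, and the 75%-over-two-hours case 31.25. These are only averages; they say nothing about bursts, which is what the random sampling is for.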

So I generate a random number between 1 and 60 to represent the minute in which a request arrives. I do this 1,500 times for each hour, for each of the three hours, for every day of the year (365 days). Then I compute the average, the variance, and the standard deviation of the results.
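Here is a minimal Monte Carlo sketch of that procedure, assuming the 90%-over-three-hours scenario. The post doesn't say exactly which quantity was averaged, so this version assumes the per-minute request counts and also tracks each day's busiest minute; `simulate` and its parameters are hypothetical names.

```python
import random
import statistics


def simulate(requests_per_hour=1500, busy_hours=3, days=365, slots=60, seed=None):
    """Each request picks a random minute (1..60) of a busy hour.

    Returns the request count of every simulated minute, plus the
    busiest single minute seen on each simulated day.
    """
    rng = random.Random(seed)
    minute_counts = []   # number of requests in every simulated minute
    daily_peaks = []     # busiest single minute on each simulated day
    for _ in range(days):
        day_peak = 0
        for _ in range(busy_hours):
            counts = [0] * slots
            for _ in range(requests_per_hour):
                counts[rng.randint(1, slots) - 1] += 1  # minute 1..60 -> index 0..59
            minute_counts.extend(counts)
            day_peak = max(day_peak, max(counts))
        daily_peaks.append(day_peak)
    return minute_counts, daily_peaks


counts, peaks = simulate(seed=1)
print("mean per minute:", statistics.mean(counts))
print("variance:", statistics.pvariance(counts))
print("std dev:", statistics.pstdev(counts))
print("average daily peak:", statistics.mean(peaks))
```

The mean per minute always comes out at exactly 25, because every hour's 1,500 requests are spread over 60 minutes; the variance, standard deviation, and daily peaks are what show how far the busy minutes stray above that.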

Then I repeat the experiment assuming that 80% of the requests come over four hours every day for a year. Then I rework the parameters so that 75% of the requests come over two hours. I want a worst-case scenario so that I will have enough worker bees.
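The scenario sweep can be sketched the same way. This assumes "worst case" means the single busiest minute seen across a simulated year of busy hours; the author may have used a different statistic (for example, the average plus a few standard deviations), so treat `peak_concurrency` and its defaults as hypothetical.

```python
import random

# The three scenarios described in the post: (share of 5,000 daily requests, busy hours)
SCENARIOS = {
    "90% over 3 h": (0.90, 3),
    "80% over 4 h": (0.80, 4),
    "75% over 2 h": (0.75, 2),
}


def peak_concurrency(share, hours, requests=5000, days=365, slots=60, seed=0):
    """Largest number of requests landing in one minute across all simulated busy hours."""
    rng = random.Random(seed)
    per_hour = round(requests * share / hours)
    peak = 0
    for _ in range(days * hours):
        counts = [0] * slots
        for _ in range(per_hour):
            counts[rng.randrange(slots)] += 1  # random minute of the hour
        peak = max(peak, max(counts))
    return peak


results = {name: peak_concurrency(*params) for name, params in SCENARIOS.items()}
worst = max(results, key=results.get)
for name, peak in results.items():
    print(f"{name}: busiest minute saw {peak} requests")
print("worst case:", worst)
```

Taking the maximum over every simulated hour is the most conservative reading of "enough worker bees"; sizing from the mean plus a multiple of the standard deviation would give a smaller number at the cost of occasionally queuing a request.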

Well, after it was all said and done, my computer models predicted that I would need the following number of virtual worker bees to handle 5,000 requests per day and process all of the simultaneous requests:

35
