One project consists of highly restricted video content, and we believe the demand will be low enough - a few dozen simultaneous video consumers at most - that we can stream the content quite comfortably from ICPSR. (ICPSR shares a 1 Gb/s network pipe with one of the other centers at ISR, and the bit-rate of each video is about 700 Kb/s.) A follow-on project consists of less restricted video content that we believe will have broad appeal. A key question for the IT director is whether the demand will be so high that it exceeds our capacity to deliver.
My colleagues are projecting that we will have peak simultaneous usage of 2000 video consumers. A little back-of-the-envelope math (total consumers x 700 Kb/s, or 2000 x 700 Kb/s = 1.4 Gb/s) makes it clear that our shared 1 Gb/s network pipe is too small; we'll need to move the content elsewhere for delivery, or split the load across several network locations to make delivery feasible. Unfortunately, this collection is quite large - 20 TB - and so making lots of copies to spread the delivery across lots of locations will be expensive.
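Here is that back-of-the-envelope check as a small Python sketch. The variable names are mine, and I'm assuming the pipe is 1 Gb/s in the decimal sense (1,000,000 Kb/s) and pretending we get all of it, which we don't since it's shared.

```python
# Figures from the post; the names are only illustrative.
PEAK_CONSUMERS = 2000           # projected peak simultaneous video consumers
BITRATE_KBPS = 700              # approximate bit-rate of each video, in Kb/s
LINK_CAPACITY_KBPS = 1_000_000  # assumption: 1 Gb/s = 1,000,000 Kb/s, all of it ours

# Bandwidth needed if every consumer is streaming at the peak moment.
peak_demand_kbps = PEAK_CONSUMERS * BITRATE_KBPS

# Largest whole number of simultaneous streams the pipe could carry.
max_consumers = LINK_CAPACITY_KBPS // BITRATE_KBPS

print(f"Peak demand: {peak_demand_kbps / 1_000_000:.1f} Gb/s")
print(f"Streams the pipe could carry: {max_consumers}")
print("Pipe is big enough" if peak_demand_kbps <= LINK_CAPACITY_KBPS else "Pipe is too small")
```

Even with the whole pipe to ourselves we would top out well below the projected peak, and in practice we share it.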
Another approach is to move the content into a content delivery network (CDN). In this scenario the CDN operator will charge us a fixed rate to store our content and a variable rate to stream our content. So how much will all this cost?
The storage is easy. We have 20 TB, so we can multiply that by the CDN's storage rate. The streaming costs are trickier, however. Typically one's costs are tied to the total number of bits streamed each month, but our only data point is the maximum number of simultaneous video consumers. So how do we calculate the expected cost?
We've been struggling with this for a while, and I don't know that we've hit upon a good solution. But we do have A solution. Here it is....
What if we were to graph the number of concurrent video consumers? And what if we assume that the graph will be a curve, a Gaussian curve in particular?
Our X-axis can measure time of day where each point is a single second in a 24-hour period. And we'll choose the starting point and ending point so that the maximum height falls in the exact middle of the graph.
If we calculate the area under the curve this will tell us the total number of consumer-seconds, and we can then multiply that by 700 Kb/s to calculate the total number of kilobits streamed in a 24-hour period. And we can divide by 8 x 1024 x 1024 if we want to turn kilobits into gigabytes, a standard unit of measurement for calculating streaming costs.
To calculate the area under the curve we need to know the maximum height (2000) and we need to estimate how "fat" or "thin" our curve will be. (This width plays the role of the standard deviation in a normal distribution.) So if our X-axis is seconds, we might pick something like 60 (for a very pointed curve) or 3600 (for a flatter curve). And if we call the height 'a' and the width 'c' our formula for measuring the area (consumer-seconds) is:
a x c x SQRT ( 2 x PI )
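That formula is the standard closed-form area under a Gaussian curve of height a and width c. As a sanity check, here is a rough Python sketch (function names are mine) that compares it against simply adding up the height of the curve at each of the 86,400 seconds in a day, with the peak placed in the middle of the day as described above.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def gaussian_area(a, c):
    """Closed-form area under a Gaussian of height a and width c."""
    return a * c * math.sqrt(2 * math.pi)

def summed_consumer_seconds(a, c, seconds=SECONDS_PER_DAY):
    """Add up the curve's height at every second, with the peak at mid-day."""
    b = seconds / 2  # position of the peak on the X-axis
    return sum(a * math.exp(-((t - b) ** 2) / (2 * c ** 2)) for t in range(seconds))

closed_form = gaussian_area(2000, 1800)
numeric = summed_consumer_seconds(2000, 1800)
print(f"Formula:  {closed_form:,.0f} consumer-seconds")
print(f"Summed:   {numeric:,.0f} consumer-seconds")
```

The two agree closely as long as the curve is narrow enough that its tails fit inside the 24-hour window; a very wide curve would spill past midnight and the formula would overestimate.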
For example, if I think I'll have a maximum of 2000 simultaneous consumers (a = 2000), and I think my curve will be medium width (c = 1800), and my video is 700 Kb/s, and my price to stream is $0.25/GB, then the area is about 9.02 million consumer-seconds, which works out to roughly 753 GB streamed and a daily cost of approx $188.
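Putting the whole calculation together, a minimal Python sketch might look like the following. The function and parameter names are made up for illustration, and it uses the same 8 x 1024 x 1024 conversion from kilobits to gigabytes described above; a CDN that bills in decimal gigabytes would give a slightly different number.

```python
import math

def daily_streaming_cost(peak_consumers, width_seconds, bitrate_kbps, price_per_gb):
    """Estimate one day's streaming cost from a Gaussian usage curve."""
    # Area under the curve: a x c x SQRT(2 x PI), in consumer-seconds.
    consumer_seconds = peak_consumers * width_seconds * math.sqrt(2 * math.pi)
    # Each consumer-second moves bitrate_kbps kilobits.
    kilobits = consumer_seconds * bitrate_kbps
    # Kilobits -> kilobytes (/8) -> megabytes (/1024) -> gigabytes (/1024).
    gigabytes = kilobits / (8 * 1024 * 1024)
    return gigabytes * price_per_gb

cost = daily_streaming_cost(peak_consumers=2000, width_seconds=1800,
                            bitrate_kbps=700, price_per_gb=0.25)
print(f"Estimated daily cost: ${cost:.2f}")

# The estimate scales linearly with the width, so the guess for c matters a lot.
for width in (60, 1800, 3600):
    print(width, round(daily_streaming_cost(2000, width, 700, 0.25), 2))
```

Because the cost is directly proportional to c, doubling the assumed width doubles the bill, which is worth keeping in mind given how rough our guess at the width is.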