Monday, April 16, 2012

The nature of ICPSR's holdings

At the end of 2011 ICPSR had about 9TB of content stored in Archival Storage.  This measurement includes everything we have collected over the past 50 years, including content which is not packaged into "studies" for dissemination, such as TIGER/Line files and data packaged for SDA.  This content is not compressed and contains many duplicates[1], so the 9TB figure should be considered an upper bound.

As we head into the start of Q2 in 2012 the quantity of content in Archival Storage has edged up just a little bit; it may be as much as 9.1TB now.  And I would guess that we have another 100GB or so of content in Ingest storage, making its way through the ICPSR data curation process.

The big news, though, is the amount of non-survey content one finds in Ingest storage:  7.4TB.  And growing.  Fast.

As video content from the Bill and Melinda Gates Foundation Measures of Effective Teaching project continues to arrive, it won't be much longer before the amount of video content equals the amount of survey data content.  By the end of the calendar year I expect that we will have more video than survey data.

Long-time ICPSR staff tell the story of how the 2000 Census doubled the size of ICPSR's holdings.  (I'll speculate that ICPSR went from about 3TB of content prior to the 2000 Census to about 6TB thereafter.)  In 2012-2013 ICPSR is likely to quadruple the size of its holdings, growing from about 9TB to nearly 40TB.
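
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two growth factors.  The variable names and the growth_factor helper are my own, made up for illustration, and the inputs are just the rough figures quoted above (the 3TB and 6TB numbers are speculation, and 40TB is a projection), so treat the results as ballpark values only.

```python
# Approximate sizes in TB, taken from the rough figures quoted in this post.
census_before_tb = 3.0   # speculative holdings before the 2000 Census
census_after_tb = 6.0    # speculative holdings after the 2000 Census
holdings_2011_tb = 9.0   # Archival Storage at the end of 2011 (upper bound)
projected_tb = 40.0      # rough projection for 2012-2013


def growth_factor(before_tb: float, after_tb: float) -> float:
    """Return how many times larger the holdings become (hypothetical helper)."""
    if before_tb <= 0:
        raise ValueError("starting size must be positive")
    return after_tb / before_tb


census_growth = growth_factor(census_before_tb, census_after_tb)
projected_growth = growth_factor(holdings_2011_tb, projected_tb)

print(f"2000 Census: {census_growth:.1f}x")
print(f"2012-2013 projection: {projected_growth:.1f}x")
```

With these inputs the 2000 Census works out to 2.0x and the 2012-2013 projection to about 4.4x, so "quadruple" is, if anything, slightly conservative.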

