Wednesday, May 30, 2012

FLAME, no, not that one

Asmat Noori, who leads the IT Operations team at ICPSR, pointed me to this the other day:

Unlike our FLAME, which we hope will be a benevolent collection of built and borrowed systems, this:

Flame appears poised to go down in history as the third major cyber weapon uncovered after Stuxnet and its data-stealing cousin Duqu, named after the Star Wars villain.

So I think it wins the award for the Dangerous-but-kind-of-cool Flame.

Monday, May 28, 2012

Seven tips to survive Going Google

Are you Going Google on your campus?

Michigan is going Google in 2012.  A few of us migrated from existing IT systems at the University of Michigan in January, and so have been living the Google dream while still needing to work with colleagues who use legacy systems for email, documents, calendars, etc.  I've seen some of the complaints people have had as they have moved to Google, and I can also see some problems ahead.  You can save yourself and your colleagues hours of frustration by following a few simple rules.  And so I present the tech@icpsr survival guide to Going Google.

One, stop organizing your email.  You don't need to spend your time that way any longer.  The only reasons you needed to do that in the old world were that you had a low storage quota (and so kept moving folders off the mail server and onto local storage) and that you had a bad search.  Now you have plenty of storage and a great search.

Two, stop asking people when they are available via email.  Look at the shared calendar.  If you cannot see the person's calendar, tell them to fix the access controls.  And if they won't, then make them schedule the meeting instead.

Three, never, ever download a Google Doc and start editing it in Microsoft Office.  Once you move the document out of Google Docs and into Office you break sharing, introduce odd formatting, make it difficult or impossible to fold the changes back into Google Docs, and commit other crimes against documents.

Four, do not send documents as attachments.  Make a Google Doc.  Share it with your collaborators or readers.  Do not fill up their Gmail allocation with your documents.

Five, use Chat for the quick stuff.  Got a quick question?  Need a real-time response?  Stop using email and use Chat.  Got a long question that doesn't need a real-time response?  Use email.  Long chats are just as bad as 4-minute voice mails.

Six, stop doing THAT in email.  If you find yourself encountering barrier after barrier trying to execute a business process via Gmail, it is likely that email is simply the wrong solution.  Need a shared archive of email?  Use a Google Group.  Need a help desk, ticketing, or request system?  Use Footprints or JIRA or any one of many open source or hosted solutions.  Need a place to share and edit a catalog of information?  Use a Google Site.  Many of the Gmail-related headaches I've seen on campus are caused when people are trying to use email as a substitute for a more complex business process.

Seven, get a personal email address NOW.  My experience is that it is always risky to rely upon an employer or a telecom to supply your email.  People at UMich who were using a single email address for both personal and work use are now in a pinch.  That address is necessarily becoming a work-only address, and they are scrambling.  Don't wait: go get a personal Gmail or Yahoo or Hotmail or other email account today.

Friday, May 25, 2012

After the zombie apocalypse

Came across this in a post on Boing Boing:

 I would describe this as mixing the darkness (sound, tone, background) of some of Tim Burton's movies with a TV procedural like CSI or Law and Order with a George Romero zombie flick. Pretty nicely done 17-minute short.

From the Vimeo site:

The zombie apocalypse happened -- and we won.
But though society has recovered, the threat of infection is always there -- and Los Angeles coroner Tommy Rossman is the man they call when things go wrong.

Wednesday, May 23, 2012

How many bits of video will I stream?

We have a couple of video preservation and access projects for the Bill and Melinda Gates Foundation.

One project consists of highly restricted video content, and we believe the demand will be low enough (dozens or fewer simultaneous video consumers) that we can stream the content quite comfortably from ICPSR.  (ICPSR shares a 1 Gb/s network pipe with one of the other centers at ISR, and the bit-rate of each video is about 700 Kb/s.)  A follow-on project consists of less restricted video content that we believe will have broad appeal.  A key question for the IT director is whether demand will be so high that it exceeds our capacity to deliver.

My colleagues are projecting that we will have peak simultaneous usage of 2000 video consumers.  A little back-of-the-envelope math (2000 consumers x 700 Kb/s = 1.4 Gb/s) makes it clear that our network pipe is too small.  We'll need to move the content elsewhere for delivery, or split the load across several network locations to make delivery feasible.  Unfortunately this collection is quite large (20 TB), and so making lots of copies to spread the delivery across lots of locations will be expensive.
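The back-of-the-envelope check above can be written out in a few lines of Python. This is a minimal sketch with a few assumptions. It uses decimal units (1 Gb/s = 1,000,000 Kb/s, which is how network links are usually rated). It also ignores protocol overhead and the other center's share of the pipe, and both of those would only make the shortfall worse.

```python
"""Back-of-the-envelope check: can a 1 Gb/s pipe carry the projected peak audience?"""

# Figures from the post; decimal units (1 Gb/s = 1,000,000 Kb/s) are an assumption.
LINK_KBPS = 1_000_000   # 1 Gb/s pipe, shared with another ISR center
VIDEO_KBPS = 700        # approximate bit-rate of each video stream
PEAK_VIEWERS = 2000     # projected peak number of simultaneous viewers


def required_kbps(viewers: int, bitrate_kbps: int = VIDEO_KBPS) -> int:
    """Aggregate bandwidth needed if every viewer streams at the same moment."""
    return viewers * bitrate_kbps


def max_viewers(link_kbps: int = LINK_KBPS, bitrate_kbps: int = VIDEO_KBPS) -> int:
    """Largest audience the link could carry with nothing else on it."""
    return link_kbps // bitrate_kbps


if __name__ == "__main__":
    need = required_kbps(PEAK_VIEWERS)
    print(f"Peak demand:  {need / 1_000_000:.2f} Gb/s")
    print(f"Link ceiling: {max_viewers()} simultaneous viewers")
    print("Pipe is too small" if need > LINK_KBPS else "Pipe is big enough")
```

With the post's numbers, peak demand comes to 1.40 Gb/s. The pipe tops out at about 1,428 viewers even before the other center's traffic is counted.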

Another approach is to move the content into a content delivery network (CDN).  In this scenario the CDN operator will charge us a fixed rate to store our content and a variable rate to stream our content.  So how much will all this cost?

The storage is easy: we have 20 TB, so the storage cost is a simple calculation.  The streaming costs are trickier, however.  Typically one's costs are tied to the total number of bits streamed each month, but our only data point is the maximum number of simultaneous video consumers.  So how do we calculate the expected cost?

We've been struggling with this for a while, and I don't know that we've hit upon a good solution.  But we do have A solution.  Here it is....

What if we were to graph the number of concurrent video consumers?  And what if we assume that the graph will be a curve, a Gaussian curve in particular?

Source: NIST
Our Y-axis can measure the total number of simultaneous video consumers at a given point in time.  We have our maximum height value (2000) as one data point.

Our X-axis can measure time of day where each point is a single second in a 24-hour period.  And we'll choose the starting point and ending point so that the maximum height falls in the exact middle of the graph.

If we calculate the area under the curve, that tells us the total number of consumer-seconds.  We can then multiply that by 700 Kb/s to calculate the total number of kilobits streamed in a 24-hour period.  And we can divide by 8 x 1024 x 1024 to turn kilobits into gigabytes, a standard unit of measurement for calculating streaming costs.

To calculate the area under the curve we need to know the maximum height (2000), and we need to estimate how "fat" or "thin" our curve will be.  (This width is the standard deviation of the curve, measured in the same units as the X-axis.)  So if our X-axis is seconds, we might pick something like 60 (for a very pointed curve) or 3600 (for a flatter curve).  And if we call the height 'a' and the width 'c', our formula for the area (consumer-seconds) is:

a x c x SQRT ( 2 x PI )
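For readers who want to check the formula, it is the standard integral of a Gaussian. The curve is written here with peak height a, width (standard deviation) c, and center b at the busiest second of the day. The symbol r, introduced here for convenience, is the per-stream bit-rate in Kb/s (700 in our case):

```latex
f(t) = a \exp\!\left(-\frac{(t-b)^2}{2c^2}\right), \qquad
\int_{-\infty}^{\infty} f(t)\,dt = a\,c\,\sqrt{2\pi}

\text{GB per day} \approx \frac{a\,c\,\sqrt{2\pi}\; r}{8 \times 1024 \times 1024}
```

Integrating over all time rather than just the 24-hour window is a simplifying assumption. With c at most a few thousand seconds and the peak at midday, the edges of the day are a dozen or more standard deviations away, so the difference is negligible.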

We can then use the fixed bit-rate to turn consumer-seconds into GBs.  And if we have a per-GB price, we can turn that into a total daily cost.  I made a little calculator at Zoho to help with this.  (Note that you must use the Tab key to move through the form.  Hitting the Enter key or using the Submit button stores the information in a throw-away table at Zoho and clears the form.)

For example, if I think I'll have a maximum of 2000 simultaneous consumers (a = 2000), and I think my curve will be medium width (c = 1800), and my video is 700 Kb/s, and my price to stream is $0.25/GB, then my daily cost will be approx $188.
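The calculation behind the Zoho form can be reproduced in Python. This is a sketch, not the calculator itself. The function names are hypothetical, and the price is treated as a flat per-GB rate, although real CDN pricing is often tiered. The second-by-second sum is only there to check that fitting the curve into a single day barely changes the closed-form area.

```python
"""Estimate daily CDN streaming cost from a Gaussian model of concurrent viewers."""
import math

SECONDS_PER_DAY = 24 * 60 * 60


def consumer_seconds(peak: float, width_s: float) -> float:
    """Closed-form area under a*exp(-(t-b)**2 / (2*c**2)), i.e. a * c * sqrt(2*pi)."""
    return peak * width_s * math.sqrt(2 * math.pi)


def consumer_seconds_in_day(peak: float, width_s: float) -> float:
    """Same area, summed second by second over one day with the peak at midday.

    Used only to confirm that clipping the curve to 24 hours changes little.
    """
    mid = SECONDS_PER_DAY / 2
    return sum(
        peak * math.exp(-((t - mid) ** 2) / (2 * width_s ** 2))
        for t in range(SECONDS_PER_DAY)
    )


def daily_gigabytes(peak: float, width_s: float, bitrate_kbps: float = 700) -> float:
    """Kilobits streamed per day, converted with 8 x 1024 x 1024 as in the post."""
    kilobits = consumer_seconds(peak, width_s) * bitrate_kbps
    return kilobits / (8 * 1024 * 1024)


def daily_cost(peak: float, width_s: float,
               bitrate_kbps: float = 700, price_per_gb: float = 0.25) -> float:
    """Daily streaming bill at a flat per-GB price (tiered pricing not modeled)."""
    return daily_gigabytes(peak, width_s, bitrate_kbps) * price_per_gb


if __name__ == "__main__":
    a, c = 2000, 1800
    print(f"Consumer-seconds: {consumer_seconds(a, c):,.0f}")
    print(f"Summed over 24 h: {consumer_seconds_in_day(a, c):,.0f}")
    print(f"GB per day:       {daily_gigabytes(a, c):,.1f}")
    print(f"Cost per day:     ${daily_cost(a, c):,.2f}")
```

The inputs are a = 2000, c = 1800, 700 Kb/s, and $0.25/GB. With those, this works out to roughly 753 GB and about $188 per day, which agrees with the figure above.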

Friday, May 18, 2012

Data Fetish

David Rosenthal has another good post on digital preservation where he quotes a gent who has coined the term "data fetish" for those who think that data should never be thrown away.  Well worth reading, but I'll quote one piece here:
So, we're going to have to throw stuff away. Even if we believe keeping stuff is really cheap, its still too expensive. The bad news is that deciding what to keep and what to throw away isn't free either.
This is so very true.

When my team inherited the responsibility for Archival Storage at ICPSR in 2005-2006, we made a choice to throw away a ton of stuff.  Paper hardcopies of electronic files.  Magnetic tapes whose contents had been copied successfully to spinning disk.  Old office supplies.  Leases to storage lockers. .... After the smoke cleared we had about 5 TB of content moved from a large array of tape types to spinning disk.  And this formed the core of ICPSR's Archival Storage.

I think a key question for every archive to answer is "Why are we keeping this?"

If the answer is "We're not sure" (even with some oblique wording), then it is time for that item to be discarded.

Wednesday, May 16, 2012

Job posting: Cloud Sourcing Manager (UMich)

The University of Michigan has posted a cloud-oriented position.  This is good news.  (The entire position is included in-line at the end of this post; the U-M job system does not retain job descriptions once the posting date has expired.)  However, based on my reading of the job description, it has a couple of serious flaws.

One, this position is funded for a single year.  Why?  Is this because the cloud is a passing fad?  Or because this position will be able to achieve complete success in a single year?  Why wouldn't this position be funded indefinitely?

[ Edit - I chatted with the hiring manager - Bill Wrobleski @ ITS - and he confirmed that the job actually has funding for at least two years.  This is very, very good news, and shows a much bigger commitment.  I would also highly recommend Bill as a colleague. ]

Two, the job posting itself sends mixed messages.  For example, from the Job Summary:
This solution oriented, flexible and proactive manager will be responsible for working with key process stakeholders such as Procurement Services, Office of General Counsel and IT governance groups to establish effective, secure and sustainable cloud sourcing processes for IT-related activities.
OK, so this is going to be a pretty slow-moving effort, working with some of the most careful, cautious organizations.  Work will be in baby steps.  Right?  However, look at the Responsibilities:
Accelerate the adoption of cloud solutions by removing barriers, streamlining processes and supporting the move to cloud solutions.
Creative problem solving and strong strategic thinking skills.  
So work with lots of stakeholders AND accelerate the adoption of cloud solutions? How? (And this second thing looks more like a Required Qualification than a Responsibility.)

So in the spirit of offering constructive criticism, here is my unsolicited, unofficial, revised Job Summary:
Information and Technology Service (ITS) is seeking an experienced Cloud Sourcing Manager to promote a "cloud first" culture at the University of Michigan. This person will be responsible for making it easier to purchase and use enterprise-class cloud services (e.g., Amazon Web Services) by collaborating with U-M Procurement Services, Office of General Counsel and IT governance groups. This person will also be responsible for lowering barriers to using consumer-class cloud services (e.g., Dropbox, Lastpass) by conducting regular demonstrations and informational sessions (town halls, brown bags) to bring awareness of cloud resources to the entire U-M community.
NOTE: this position is funded indefinitely.
So now the person has the opportunity to accelerate cloud use (where it makes sense) while s/he works through the bureaucracy of the U-M to make bigger, systemic changes.  I like this more already!

And now for my unsolicited, unofficial, revised Responsibilities.  First, I should note that I like all of these from the original posting:
  • Maintain a broad understanding of cloud marketplace including Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) offerings. 
  • Manage relationships with cloud vendors including establishing service level standards and reporting. 
  • Participate on and/or lead project teams that are leveraging new cloud-based products. 
  • With appropriate stakeholders, support contract negotiation with cloud providers. 
  • Track the use of cloud services and identify trends and investment opportunities. 
  • Work cooperatively with higher education consortiums [sic] to provide and leverage expertise and combined purchasing power.
Some in the original posting are OK too, but strike me as details of the position (i.e., you need to work with other teams in ITS; and you should make sure that the cloud stuff can work with our existing IT infrastructure). I would not include them any more than I would include responsibilities like attending meetings, reading email, and brushing teeth.

And here are the items I would add.  Mine are really specific:
  • Negotiate an invoicing mechanism with Amazon Web Services (AWS) such that the U-M receives preferential rates based on aggregate campus usage, not individual department usage
  • Using AWS Direct Connect, establish a private, secure, low-cost network connection between the U-M campus and the AWS cloud
  • Using AWS Virtual Private Cloud, establish a private, secure, virtual data center in the AWS cloud for use by researchers and IT staff at the U-M
  • Organize a campus-wide "Cloud Day" event at U-M similar to events focused on IT security and cyberinfrastructure, where speakers would present information about real-life use-cases using cloud capabilities to solve real problems
  • At least once per month conduct an informational session for some department, institute or similar organization on campus, demonstrating a consumer-grade cloud product, noting its pros and cons, and how researchers and staff may use it safely
I should note that the AWS-related items would also help me out a lot.

And as promised, the original posting:

Cloud Sourcing Manager

How to Apply

A cover letter and resume are required; the cover letter must be PAGE 1 of your resume. The letter should:

(1) specifically outline the reasons for your interest in the position;
(2) outline your particular skills and experience that directly relate to this position; and
(3) include your current or ending salary.

Starting salary may vary depending on qualifications and experience of the selected candidate.

Job Summary

Information and Technology Service (ITS) is seeking an experienced Cloud Sourcing Manager to oversee cloud computing activities at the University of Michigan. This solution oriented, flexible and proactive manager will be responsible for working with key process stakeholders such as Procurement Services, Office of General Counsel and IT governance groups to establish effective, secure and sustainable cloud sourcing processes for IT-related activities.

NOTE: this position is funded for 1-year.

Responsibilities*

*Maintain a broad understanding of cloud marketplace including Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) offerings.
*Manage relationships with cloud vendors including establishing service level standards and reporting.
*Participate on and/or lead project teams that are leveraging new cloud-based products.
*With appropriate stakeholders, support contract negotiation with cloud providers.
*Serve as a single-point-of-contact for cloud-related issues for the campus community.
*Oversee the creation and maintenance of business processes that support the acquisition, operation and retirement of cloud services such as templates and instructional material.
*Work with Information and Infrastructure Assurance (IIA) to encourage and ensure the use of cloud services follow appropriate security, compliance and privacy standards.
*Accelerate the adoption of cloud solutions by removing barriers, streamlining processes and supporting the move to cloud solutions.
*Work with architecture and technical groups to establish technical integration support to allow cloud services to leverage standard directory, authentication and other interfaces.
*Track the use of cloud services and identify trends and investment opportunities.
*Work cooperatively with higher education consortiums to provide and leverage expertise and combined purchasing power.
*Demonstrate effective staff leadership and development.
*Creative problem solving and strong strategic thinking skills.

Required Qualifications*

*Bachelor's degree in Computer Science, business or related field or an equivalent combination of education and experience.
*Extensive experience with IaaS, PaaS and SaaS services and implementations.
*Broad knowledge of cloud computing marketplace.
*8 years of progressive IT managerial experience.
*Proven project management experience to meet customer expectations and mitigate risk.
*Superb verbal and written communication skills.
*Demonstrated supervisory experience to include: recruiting, mentoring, staff development, performance management, leadership, and/or team building.
*Ability to interact successfully with a wide range of people including faculty, executive leadership, technical staff and other campus groups.
*Strong organizational skills and the ability to successfully complete multiple tasks within established and changing deadlines.

Desired Qualifications*

*Masters Degree in business, technology or related field.
*Experience leading IT efforts in a higher education institution.
*Significant experience managing vendor relationships.
*Experience negotiating contracts and managing ongoing service levels.

Additional Information

The University of Michigan was featured as one of the "Great Colleges to Work For" in the 2011 Chronicle of Higher Education.

U-M EEO/AA Statement

The University of Michigan is an equal opportunity/affirmative action employer.

Monday, May 14, 2012

Why ask why? (Seth Godin)

Seth's Blog typically has short posts, but provocative and interesting messages behind them.  I thought this one was particularly good, and I'll include just a couple of his here (click the link to read 'em all):

  • Why is that our goal?
  • Why is this our policy?
  • Why are we having this meeting?

And I'll add a few more of my own particular to ICPSR:

  • Why do we count downloads?
  • Why is this the way we curate data and documentation?
  • Why do we conduct disclosure reviews?
  • Why are grant budgets very detailed and specific, but the deliverables are completely flexible?

I think "Why?" is a great question.

Friday, May 11, 2012

Zombies as a pedagogical tool

This is what we should be doing with the OLC:

Analyze this dataset to find out who is the most likely to contract the virus.

Use GIS to plot a path to safety, bypassing the areas most likely to be zombiefied.

And maybe some funding via Kickstarter to boot?

Monday, May 7, 2012

OpenSSL FIPS 140-2 and RHEL 5

Since I often use the Internet to find guidance and answers to questions, I thought I'd add my own small contribution back to the community.

We found ourselves needing to build a FIPS 140-2 compliant version of the OpenSSL openssl command-line utility recently.  A good starting point is the OpenSSL FIPS 140-2 User Guide.  It contains instructions for where to find OpenSSL source, and importantly, instructions for verifying the integrity of the distribution, and this is a necessary component of building a FIPS 140-2 openssl.

I worked through the guide up to about page 23.  However, when I reached section 4.2.1, things started to go wrong for me.  I was able to run config with no problem, but the make failed with an error about not having a target for fipscanister.o.

I then found a very helpful bit of advice in a Google Group post about OpenSSL and FIPS 140-2.  But that advice didn't quite match our environment (it was Ubuntu, and we run RHEL).  So here are my directions for building an openssl utility on RHEL 5.

First, we downloaded openssl-fips-1.2.3 and verified its integrity using the directions from the guide above.  Here are the results in a little sandbox:

batch-bryan:; pwd
batch-bryan:; ls -R
lib/  src/



Now we head into the src directory and unpack the tarball:

batch-bryan:; cd src
batch-bryan:; tar xf openssl-fips-1.2.3.tar

And then configure things to build the FIPS canister and utility:

batch-bryan:; cd openssl-fips-1.2.3
batch-bryan:; ./config fipscanisterbuild --prefix="/tmp/openssl-fips"

Now to build and install:

batch-bryan:; make
batch-bryan:; make install

And there it is:

batch-bryan:; ls /tmp/openssl-fips/lib
engines/             fips_premain.c
fipscanister.o       fips_premain.c.sha1**
fipscanister.o.sha1  libcrypto.a          libssl.a             pkgconfig/

And there is the utility:

batch-bryan:; cd /tmp/openssl-fips/bin
batch-bryan:; ls
c_rehash*  openssl*
batch-bryan:; setenv OPENSSL_FIPS 1
batch-bryan:; ./openssl version
OpenSSL FIPS Object Module v1.2

But if we try the stock one:

batch-bryan:; /usr/bin/openssl version
13789:error:2D06C06E:FIPS routines:FIPS_mode_set:fingerprint does not match:fips.c:493:
batch-bryan:; unsetenv OPENSSL_FIPS
batch-bryan:; /usr/bin/openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

Not so much.

To build and install the FIPS 140-2 compliant version of OpenSSL in a more "real" location than a sandbox, just change the --prefix value in the config invocation above.

Friday, May 4, 2012

April 2012 deposits at ICPSR

Chart?  Chart.

# of files# of depositsFile format
22text/plain; charset=unknown
37135text/plain; charset=us-ascii
1046text/plain; charset=utf-8

Things look pretty light in terms of the usual formats, but a handful of big, big deposits stick out.

DICOM - Digital Imaging and Communications in Medicine - is a format used for medical imaging, and we received well over 150k such files last month plus an equally large batch in the more prosaic JPEG format.  I'm not sure how we'll be curating and delivering this content.....

Related to this same set of deposits is a large number of Windows executable files (EXE and DLL file extensions) which will also be an interesting challenge for delivery.
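The chart counts each format two ways: how many files arrived in that format, and how many separate deposits contained at least one such file. The post doesn't say how the chart is generated. The format strings look like MIME types with a charset, similar to what the Unix `file` utility reports with its `--mime` option. Here is a hypothetical sketch of the tally. It assumes a list of (deposit ID, MIME type) pairs is already in hand, and the sample records are invented.

```python
"""Tally deposited files by format, counting both files and distinct deposits."""
from collections import defaultdict


def tally_formats(files):
    """files: iterable of (deposit_id, mime_type) pairs, one pair per file.

    Returns {mime_type: (number_of_files, number_of_deposits)}.
    """
    file_counts = defaultdict(int)
    deposits_by_format = defaultdict(set)
    for deposit_id, mime_type in files:
        file_counts[mime_type] += 1
        deposits_by_format[mime_type].add(deposit_id)
    return {
        mime: (file_counts[mime], len(deposits_by_format[mime]))
        for mime in file_counts
    }


def print_table(tally):
    """Print rows in the chart's column order, largest file count first."""
    print(f"{'# of files':>10}  {'# of deposits':>13}  File format")
    for mime, (n_files, n_deposits) in sorted(tally.items(), key=lambda kv: -kv[1][0]):
        print(f"{n_files:>10}  {n_deposits:>13}  {mime}")


if __name__ == "__main__":
    # Made-up sample records for illustration only; not ICPSR's real data.
    sample = [
        ("d1", "text/plain; charset=us-ascii"),
        ("d1", "text/plain; charset=us-ascii"),
        ("d2", "text/plain; charset=us-ascii"),
        ("d2", "application/dicom"),
        ("d2", "application/dicom"),
        ("d3", "image/jpeg"),
    ]
    print_table(tally_formats(sample))
```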

Wednesday, May 2, 2012

April 2012 Web availability

April was a much, much better month for our systems:

The real game changer seemed to be disabling the transparent hugepage system on our RHEL 6 systems.  Once we did that, our fortunes changed for the better.  And so we sing:

The main culprits behind the small amount of downtime we had in April were three things.  First, a misfire during an attempt to introduce yet another rewrite rule to our Apache httpd config (which is always risky).  Second, a filesystem filling up on the production web server, which tanked the search engine for nearly thirty minutes.  Third, a brief outage with Janrain's Engage service, which allows people to use their Facebook ID or Google ID to sign in to the ICPSR web site.
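For anyone wanting to check the same setting, the active transparent hugepage mode can be read from sysfs. This is a minimal read-only sketch, not the change we made. On RHEL 6 the file is `/sys/kernel/mm/redhat_transparent_hugepage/enabled`, while upstream kernels use `/sys/kernel/mm/transparent_hugepage/enabled`. In either file, the active choice is the one shown in square brackets (for example `always madvise [never]`).

```python
"""Report the active transparent hugepage (THP) mode; read-only."""
from pathlib import Path

# RHEL 6 uses a Red Hat-specific directory name; upstream kernels use the second path.
CANDIDATES = [
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
]


def active_mode(contents: str) -> str:
    """Return the bracketed (active) choice, e.g. 'never' from 'always madvise [never]'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {contents!r}")


def thp_status():
    """Return (path, mode) for the first THP control file found, else (None, None)."""
    for path in CANDIDATES:
        if path.exists():
            return path, active_mode(path.read_text())
    return None, None


if __name__ == "__main__":
    path, mode = thp_status()
    if path is None:
        print("transparent hugepage control file not found")
    else:
        print(f"{path}: {mode}")
```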