Friday, July 19, 2013

Clients v. customers | services v. products

Seth Godin has another excellent post.  This one notes the distinction between customers (who decide whether or not to buy your product) and clients (who pay you to make things for them).
In the context of ICPSR I think we have a product we call "ICPSR membership."  Customers buy it (or not), and if they do, they receive a reasonably well defined set of services, largely centered around the ability to access high quality datasets and documentation.  We have many hundreds of customers for this product.  I think our Summer Program is also a product, and that too has many hundreds of customers.

We also have a smaller number, perhaps a dozen or so, of clients.  In the best case we have a handful of clients who all pay us to perform a similar set of tasks for them: curate their datasets and documentation, preserve the curated artifacts, and publish the content on a specially "skinned" version of the ICPSR web site for all the world to see.  Adding more clients who want us to do this kind of work benefits all of the other clients, and, often, our customers too.

And like any organization which draws much of its revenue from contract work for clients, we also have those that push us in new, different directions, sometimes for the better, and sometimes for the worse.  The trick, of course, is not to try to head off in too many different directions at once.  And to favor those clients who pull us in better, not worse, directions.


Wednesday, July 17, 2013

ICPSR Web Availability - 2012-2013

Here are the final numbers for ICPSR's web site availability over our last fiscal year:

The year did not start off so well, and we reached the nadir quickly.  August 2012 was our worst period of availability in a very bad year for us overall.  January, March, and June 2012 also had very poor numbers.

The main antagonist we faced was a new and unusual problem with our Oracle database server.  For many years we had exported the content for backup purposes each evening, and it worked well for a decade.  However, suddenly in 2012 we began to experience an outage just AFTER each export.  Despite intensive analysis by ourselves and local Oracle experts, we never could isolate the root cause of the failure.

We eventually "solved" the problem by exporting our database only once per week v. once per day.  That left us more exposed to loss, of course, but it seemed to limit the outages to once per week v. once per day.

We then replaced the hardware with a new machine with a bit more processor and memory, but with blindingly fast solid-state drives. With the new machine deployed we returned to our daily export schedule, and the machine -- and our web availability -- have been in pretty good shape ever since. The machine went into service in April 2012, and the chart above makes it clear that life has been a little less hectic for our on-call engineer since then.

Friday, June 21, 2013

A tiny wishlist for Amazon Web Services' Route 53

We've been using the DNS hosting service, Route 53, from Amazon Web Services (AWS). The default port for a DNS server is UDP (and TCP) 53, and I've always presumed that this was the answer to the question:  Why did Amazon name its DNS service Route 53?

In general I like the Route 53 service pretty well.  It's smart how the DNS servers listed for a Hosted Zone (the term AWS uses for a domain hosted in Route 53) reside in different top-level domains, like ORG, NET, COM, and even CO.UK. The UI in the AWS Management Console is fine for managing small zones that contain just a handful of records.

There's one feature that I wish Route 53 had, though, and it would be particularly useful, I think, to research organizations in higher education.

In our grants and contracts there is often a commitment to build, deploy, and operate some technology deliverable.  Often the technology is a web portal of some sort, and the investigator is keen to register a new domain.  This leads to an initial registration of something like:

WhizBangProject.org

The domain may have only the smallest number of records:  an SOA and NS records, of course, and then perhaps an MX record routing mail to a central server, and an A record pointing to the IPv4 address of the web portal.

Soon, though, the researcher may decide to register the same name in different top-level domains, and we have:

WhizBangProject.net
WhizBangProject.com
WhizBangProject.info

joining the mix.  These domains have EXACTLY the same records as the first one, and so if one is running his/her own DNS service, one can configure the DNS server to use the same zone file when loading all of the domains.  This is nice - one file with one set of records to manage for many different domains.

However, it is often the case that the investigator discovers that the original name is not satisfactory, and so we then register an alternate name in several domains:

CoolBeansResearch.org
CoolBeansResearch.net
CoolBeansResearch.com
CoolBeansResearch.info

and maybe a slight variant too:

Cool-Beans-Research.org
Cool-Beans-Research.net
Cool-Beans-Research.com
Cool-Beans-Research.info

In a world where one runs one's own DNS server, the additional domains are not much extra work.  Like the original solution where we pointed the new domains at the same zone file, we can just point these new domains at that same zone file.
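
As a rough sketch of the shared-zone-file idea, assuming BIND as the DNS server (the approach above doesn't depend on a particular server) and a hypothetical zone file path, a small shell function can print one "zone" stanza per domain, all loading the same file.  The function name and path are made up for illustration, and the shared zone file itself would need to use relative names (and @ for the apex) so that it works under every domain.

```shell
# Hypothetical location of the single zone file shared by all domains.
SHARED_ZONE_FILE="/etc/bind/zones/shared-project.zone"

# Print a BIND named.conf "zone" stanza for each domain given as an argument.
# Every stanza points at the same zone file, so there is one set of records
# to maintain no matter how many domains are registered.
emit_zone_stanzas() {
    for domain in "$@"; do
        # DNS names are case-insensitive; lower-case them for tidiness.
        name=$(printf '%s' "$domain" | tr '[:upper:]' '[:lower:]')
        printf 'zone "%s" {\n    type master;\n    file "%s";\n};\n' \
            "$name" "$SHARED_ZONE_FILE"
    done
}
```

Calling emit_zone_stanzas WhizBangProject.org CoolBeansResearch.org Cool-Beans-Research.org would print three stanzas that differ only in the zone name.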

I wish Route 53 would let me create a collection of what they call Record Sets, and then apply those same records to an arbitrary set of what they call Hosted Zones.  If the SOA and NS Record Sets were unique to each Hosted Zone, that would be OK; it is really the other records - the ones we add ourselves in Route 53 - that we would want to share across all of the Hosted Zones.
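
Until something like that exists, one possible workaround is to script it.  The sketch below is only an illustration, not a Route 53 feature:  it builds the same change batch for each domain and hands it to the AWS CLI's change-resource-record-sets command.  The function names, the single A record, the TTL of 300, and the sample address are all assumptions made for the example; a real shared set would also carry the MX and any other records.

```shell
# Print a Route 53 change batch (JSON) that upserts one A record at the apex
# of a domain.  $1 = domain, $2 = IPv4 address, $3 = TTL in seconds.
make_change_batch() {
    printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s.","Type":"A","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
        "$1" "$3" "$2"
}

# Apply that batch to one Hosted Zone.  $1 = hosted zone ID, $2 = domain,
# $3 = IPv4 address.  Needs the AWS CLI and credentials, so it is only
# defined here, not called.
apply_shared_record() {
    aws route53 change-resource-record-sets \
        --hosted-zone-id "$1" \
        --change-batch "$(make_change_batch "$2" "$3" 300)"
}
```

Looping apply_shared_record over a list of zone ID and domain pairs would push the same record into every Hosted Zone, which is roughly what a shared Record Set would do in one step.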

Wednesday, June 19, 2013

EMC transfer_support_materials fix for anonymous ftp

Last month I posted about an issue we have been having with our EMC NS 120 NAS.  To re-cap briefly...  When the NS 120 discovers a problem, one action it often will take is to collect up a bunch of diagnostic information, Zip it up, and then use anonymous ftp to transfer it to EMC.  A shell script under the /nas/tools directory called transfer_support_materials does the dirty work. The problem we have been experiencing is rooted in this script; it would fault when trying to transfer the Zip file.

The sequence of ftp commands inside the script is simple:

  1. Connect to ftp.emc.com
  2. Log in using the user name anonymous and a password unique to the NS 120
  3. Change directory to /incoming/APMxxxxxxxxxxx (where the string of x's is replaced with the NS 120 serial number)
  4. Transfer the Zip file

The script would always fail at step #3 with the message: File unavailable.

The root of the problem is that the transfer_support_materials script expects the directory to exist, but it doesn't.

At first I thought that the problem was with the EMC anonymous ftp server.  I opened several SRs trying to get someone to create the directory.  None of the SRs ever reached a satisfactory closure, and I was left with the impression:  "Of course the directory doesn't exist; we delete them after a couple of days automatically."

So.....  The tool to transfer diagnostics expects the directory to exist, and the business process at EMC deletes the directory as a routine matter.

At the suggestion of one of my colleagues, I ran ftp by hand, and discovered that it would happily let me create the directory.  That is, I could manually do this:

  1. Connect to ftp.emc.com
  2. Log in using the same credentials as the NS 120
  3. mkdir /incoming/APMxxxxxxxxxxx
  4. cd /incoming/APMxxxxxxxxxxx
  5. Transfer the Zip file

I decided to tweak transfer_support_materials, adding this new element to the existing sequence of ftp commands.  The change is really simple.  This:


#do transfer
LFTPCOMMANDFILE="open -u ${username},${password} $HostName;cd $remote_name;rm -f ${newfile##*/};put $newfile"


becomes this:

#do transfer
LFTPCOMMANDFILE="open -u ${username},${password} $HostName;mkdir $remote_name;cd $remote_name;rm -f ${newfile##*/};put $newfile"


I ran a quick test of the script after this change, and voilà, it works again.
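
For anyone who wants to try the command sequence outside of transfer_support_materials, here is a minimal sketch that assembles the same string, mkdir included.  The function name and argument order are my own; only the command sequence comes from the script.

```shell
# Build the lftp command string used for the upload, including the added
# mkdir step.
# $1 = user name, $2 = password, $3 = ftp host,
# $4 = remote directory, $5 = local path of the Zip file
build_lftp_commands() {
    # ${5##*/} strips the local directory, leaving just the file name, which
    # is what the script removes on the server before the put.
    printf 'open -u %s,%s %s;mkdir %s;cd %s;rm -f %s;put %s\n' \
        "$1" "$2" "$3" "$4" "$4" "${5##*/}" "$5"
}
```

The result could be run with something like lftp -c "$(build_lftp_commands ...)", though I have not checked how the script itself invokes lftp.  On later runs the mkdir will presumably report that the directory already exists; I am assuming lftp carries on to the next command in that case, which is worth verifying on your own system.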

Wednesday, May 1, 2013

EMC anonymous ftp service and transfer_support_materials

I have not seen notes about this in forums and boards, and so thought I would pass this along to others who may be using EMC gear.

About a month ago we had a small problem with one of our NS 120 Celerra NAS units.  (It may have been soft errors on one of its disk drives.)  The Celerra detected the problem, and went to do its usual thing:  collect logs and other analytics, and then copy them to EMC's anonymous ftp site.  Our Celerra uses a utility under /nas/tools called transfer_support_materials to do this.  We noticed that when the Celerra tried to transfer the support materials, that too failed, and this generated an additional series of critical errors.

We logged into the Celerra's control station and ran transfer_support_materials by hand.  And we saw a message like this:

[nasadmin@controller tools]$ /nas/tools/transfer_support_materials -uploadlog
transfer_support_materials[12057]: The transfer script has started.
PING ftp.emc.com (168.159.219.138) 56(84) bytes of data.
From 12.249.233.6 icmp_seq=0 Packet filtered

--- ftp.emc.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
, pipe 2
cd: Access failed: 550 Requested action not taken. File unavailable. (/incoming/APM00000000000)
`/nas/var/emcsupport/support_materials_APM00000000000.130407_1351.zip' at 65536 (0%) 49.1K/s eta:5m [Connection idle]

I've replaced our Celerra's serial number with the string "00000000000".

We then ran ftp by hand to see if we could replicate the error:

[nasadmin@controller tools]$ ftp ftp.emc.com
Connected to ftp.emc.com.
220-Proceeding further constitutes acknowledgement
to EMC Acceptable Use and Customer Security policies.
Anonymous uploads are immediately moved to a secure server accessible only
within EMC networks.
File downloads from ftp.emc.com are restricted to selected /pub directories, via
temporary secure accounts or via specific permanent secure accounts only.
Anonymous users please login with anonymous and email address as your password
See Powerlink emc278739 for upload instructions.
EMC staff: please refer to current services, FAQ and Best Practices documents at
http://one.emc.com/clearspace/community/active/css/projects/ftp-service
Please email all questions and concerns to ftpquestions@emc.com
220 Please reference the FTP Acceptable Use policy: http://itcentral.corp.emc.com/Policies/AcceptableUse.pdf
534 Command denied.
534 Command denied.
KERBEROS_V4 rejected as an authentication type
Name (ftp.emc.com:nasadmin): anonymous
331 User name okay, need password.
Password:
230 User logged in, proceed.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /incoming/APM00000000000
550 Requested action not taken. File unavailable.
ftp>

So, the problem was that the directory that holds our support materials (/incoming/APM<serialnum>) was missing or had its mode set to something that disallowed access.

We contacted EMC, and some days later they confirmed that the problem was indeed that the directory was missing, and that they had recreated it.  We then ran ftp by hand to confirm that everything was working again, and it was.  That was good news, but when we tried the same thing on our second NS 120 Celerra, we discovered that it too was missing its "support directory" on the ftp server.  So we added that trouble report to our service request, and some days later, EMC confirmed that it too had been missing, and then again recreated it.  In speaking with EMC, it is a bit unclear whether this problem is particular to us or more widespread.

The upshot of the story is that if you too run a Celerra or other product that sends support materials to EMC via anonymous ftp, this might be a good day to test out transfer_support_materials to make sure that your "support directory" is intact.  If so, that's great, but if it is missing, you may want to open a service request with EMC soon so that they can recreate the directory for you.  Better to have it in place before your system needs to send support materials, but is not able to do so.

I should note that we're still happy overall with EMC; in fact, we've just purchased the first three nodes of a new Isilon storage system from them.  So the intent here isn't to excoriate them over the missing ftp directory; it was easy to reproduce the problem and to correct it.  But we did wish that we had been able to learn about the problem prior to the disk failure so that it could have been corrected earlier, not when the Celerra was trying to report a disk failure.
 


Monday, April 29, 2013

ICPSR launches Measures of Effective Teaching web site

Some of my colleagues, including ICPSR Director George Alter, gave a demo of one of our newest Web sites and collections at the American Educational Research Association 2013 annual meeting on Sunday.
MET LDB web site
My team has built the video portal portion of the system.  The portal enables a researcher to play a list of videos that s/he has selected to view based on an analysis of the associated quantitative data and tagging data.  Access to the video and datasets is restricted and requires one to complete a data use agreement via ICPSR's web-based request system.

We're grateful for the support we've received from the Bill and Melinda Gates Foundation to make all of this possible.

Monday, April 22, 2013

Qualys browser checker

Ever since the recent craziness with vulnerabilities in Java plugins, I've been making a conscious effort to use Qualys's browser checker - https://browsercheck.qualys.com/ - on a routine basis both at home and at the office.

Installing the tool in your browser is very easy, and the service is free and painless to use.  I have been using it both to determine if my current browser and plugins are up to date, and also to identify plugins that are installed and enabled, but which I don't really need or use (e.g., Silverlight, which I often disable for long stretches at a time).

Qualys generates a nice report to let you know if everything is up-to-date.

Thursday, April 18, 2013

Web availability at ICPSR - March 2013

ICPSR's content delivery system showed very high availability in March 2013:  a bit over 99.95% uptime.  We had only two problems in March.  One was a power outage that affected our headquarters on the University of Michigan campus, and we experienced a small amount of downtime as we moved service to our replica in Amazon's cloud.  The second was a 21-minute outage due to a continuing -- but now solved, we think -- problem with exporting content from our Oracle database server.
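
To put the percentage in perspective, here is a small sketch of the arithmetic, assuming availability is simply the fraction of minutes in the month that the site was up (the function name is mine, and our monitoring may count things a little differently).  A 31-day month has 44,640 minutes, so 99.95% uptime leaves room for only about 22 minutes of downtime.

```shell
# Convert minutes of downtime in a month into percent availability.
# $1 = minutes of downtime, $2 = number of days in the month
availability_pct() {
    awk -v down="$1" -v days="$2" 'BEGIN {
        total = days * 24 * 60                     # minutes in the month
        printf "%.3f\n", 100 * (1 - down / total)  # percent of minutes up
    }'
}
```

For example, availability_pct 21 31 covers just the 21-minute database outage in a 31-day month.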

Here are the overall numbers for ICPSR's 2012-2013 fiscal year:



We replaced our aging Oracle database server with a new machine which has twice the memory, twice the computing power, and perhaps most impressively, has 300 times the disk I/O speed(!).  The new machine has an array of solid-state drives (SSDs), and we use this for all of our database storage.  (The operating system resides on conventional disk drive technology.)