Wednesday, May 1, 2013

EMC anonymous ftp service and transfer_support_materials

I have not seen notes about this in forums and boards, and so thought I would pass this along to others who may be using EMC gear.

About a month ago we had a small problem with one of our NS 120 Celerra NAS units.  (It may have been soft errors on one of its disk drives.)  The Celerra detected the problem, and went to do its usual thing:  collect logs and other analytics, and then copy them to EMC's anonymous ftp site.  Our Celerra uses a utility called /nas/tools/transfer_support_materials to do this.  We noticed that when the Celerra tried to transfer the support materials, that too failed, and the failure generated an additional series of critical errors.

We logged into the Celerra's control station and ran transfer_support_materials by hand.  And we saw a message like this:

[nasadmin@controller tools]$ /nas/tools/transfer_support_materials -uploadlog
transfer_support_materials[12057]: The transfer script has started.
PING ftp.emc.com (168.159.219.138) 56(84) bytes of data.
From 12.249.233.6 icmp_seq=0 Packet filtered

--- ftp.emc.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
, pipe 2
cd: Access failed: 550 Requested action not taken. File unavailable. (/incoming/APM00000000000)
`/nas/var/emcsupport/support_materials_APM00000000000.130407_1351.zip' at 65536 (0%) 49.1K/s eta:5m [Connection idle]

I've replaced our Celerra's serial number with the string "00000000000".

We then ran ftp by hand to see if we could replicate the error:

[nasadmin@controller tools]$ ftp ftp.emc.com
Connected to ftp.emc.com.
220-Proceeding further constitutes acknowledgement
to EMC Acceptable Use and Customer Security policies.
Anonymous uploads are immediately moved to a secure server accessible only
within EMC networks.
File downloads from ftp.emc.com are restricted to selected /pub directories, via
temporary secure accounts or via specific permanent secure accounts only.
Anonymous users please login with anonymous and email address as your password
See Powerlink emc278739 for upload instructions.
EMC staff: please refer to current services, FAQ and Best Practices documents at
http://one.emc.com/clearspace/community/active/css/projects/ftp-service
Please email all questions and concerns to ftpquestions@emc.com
220 Please reference the FTP Acceptable Use policy: http://itcentral.corp.emc.com/Policies/AcceptableUse.pdf
534 Command denied.
534 Command denied.
KERBEROS_V4 rejected as an authentication type
Name (ftp.emc.com:nasadmin): anonymous
331 User name okay, need password.
Password:
230 User logged in, proceed.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /incoming/APM00000000000
550 Requested action not taken. File unavailable.
ftp>

So, the problem was that the directory that holds our support materials (/incoming/APM<serialnum>) was missing or had its mode set to something that disallowed access.

We contacted EMC, and some days later they confirmed that the problem was indeed that the directory was missing, and that they had recreated it.  We then ran ftp by hand to confirm that everything was working again, and it was.  That was good news, but when we tried the same thing on our second NS 120 Celerra, we discovered that it too was missing its "support directory" on the ftp server.  So we added that trouble report to our service request, and some days later, EMC confirmed that this directory had also been missing, and recreated it as well.  From our conversations with EMC, it is still unclear whether this problem is particular to us or more widespread.

The upshot of the story is that if you too run a Celerra or other product that sends support materials to EMC via anonymous ftp, this might be a good day to test out transfer_support_materials to make sure that your "support directory" is intact.  If so, that's great, but if it is missing, you may want to open a service request with EMC soon so that they can recreate the directory for you.  Better to have it in place before your system needs to send support materials, but is not able to do so.
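If you would rather not wait for transfer_support_materials to fail, the manual ftp check above can be scripted.  Here is a minimal sketch in Python using the standard ftplib module.  It is not an EMC tool; the function names (support_dir_exists, check_server) are my own, and it assumes the same layout we saw in the transcript:  an anonymous login and a directory under /incoming named after the unit (APM plus the serial number).

```python
import ftplib


def support_dir_exists(ftp, dir_name):
    """Return True if the server lets us cd into /incoming/<dir_name>.

    `ftp` is any object with an ftplib.FTP-style cwd() method.
    `dir_name` is the directory name seen in the transcript,
    e.g. "APM00000000000" (assumed layout, not an EMC-documented API).
    """
    try:
        ftp.cwd("/incoming/" + dir_name)
        return True
    except ftplib.error_perm:
        # Permanent 5xx reply, such as
        # "550 Requested action not taken. File unavailable."
        return False


def check_server(host, dir_name, email):
    """Log in anonymously (email address as password) and run the check."""
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login("anonymous", email)
        return support_dir_exists(ftp, dir_name)
```

Calling check_server("ftp.emc.com", "APM00000000000", "you@example.com") with your own directory name and email address would repeat the cd test from the ftp session above.  A False result only means the cd was refused; as in our case, it cannot tell you whether the directory is missing or just has restrictive permissions.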

I should note that we're still happy overall with EMC; in fact, we've just purchased the first three nodes of a new Isilon storage system from them.  So the intent here isn't to excoriate them over the missing ftp directory; it was easy to reproduce the problem and to correct it.  But we did wish that we had been able to learn about the problem prior to the disk failure so that it could have been corrected earlier, not when the Celerra was trying to report a disk failure.
 


Monday, April 29, 2013

ICPSR launches Measures of Effective Teaching web site

Some of my colleagues, including ICPSR Director George Alter, gave a demo of one of our newest Web sites and collections at the American Educational Research Association 2013 annual meeting on Sunday.
[Image: MET LDB web site]
My team has built the video portal portion of the system.  The portal enables a researcher to play a list of videos that s/he has selected to view based on an analysis of the associated quantitative data and tagging data.  Access to the video and datasets is restricted and requires one to complete a data use agreement via ICPSR's web-based request system.

We're grateful for the support we've received from the Bill and Melinda Gates Foundation to make all of this possible.

Monday, April 22, 2013

Qualys browser checker

Ever since the recent craziness with vulnerabilities in Java plugins, I've been making a conscious effort to use Qualys's browser checker - https://browsercheck.qualys.com/ - on a routine basis both at home and at the office.

Installing the tool in your browser is very easy, and the service is free and painless to use.  I have been using it both to determine if my current browser and plugins are up to date, and also to identify plugins that are installed and enabled, but which I don't really need or use (e.g., Silverlight, which I often disable for long stretches at a time).

Qualys generates a nice report to let you know if everything is up-to-date.

Thursday, April 18, 2013

Web availability at ICPSR - March 2013

ICPSR's content delivery system showed very high availability in March 2013:  a bit over 99.95% uptime.  We had only two problems in March.  One was a power outage that affected our headquarters on the University of Michigan campus, and we experienced a small amount of downtime as we moved service to our replica in Amazon's cloud.  The second was a 21-minute outage due to a continuing -- but now solved, we think -- problem with exporting content from our Oracle database server.
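For readers who want to see how an outage translates into an availability figure, here is a small Python sketch of the arithmetic.  It assumes availability is simply uptime minutes divided by total minutes in the month, which may not match exactly how our monitoring computes it, and the function names are just for illustration.  Counting only the 21-minute outage over the 31 days of March gives roughly 99.953%; the short downtime from the power outage is not included because I haven't stated its length here.

```python
def availability_percent(downtime_minutes, days_in_month):
    """Percentage of the month the service was up (simple minute-based model)."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1 - downtime_minutes / total_minutes)


def downtime_budget_minutes(target_percent, days_in_month):
    """Minutes of downtime allowed in a month while still meeting the target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # The 21-minute database outage alone, over the 31 days of March.
    print(round(availability_percent(21, 31), 3))
    # Downtime that a 99.95% target leaves room for in a 31-day month.
    print(round(downtime_budget_minutes(99.95, 31), 2))
```

By this model a 99.95% month of 31 days leaves room for about 22.3 minutes of downtime in total.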

Here are the overall numbers for ICPSR's 2012-2013 fiscal year:


[Chart: ICPSR web availability, fiscal year 2012-2013]

We replaced our aging Oracle database server with a new machine which has twice the memory, twice the computing power, and, perhaps most impressively, 300 times the disk I/O speed(!).  The new machine has an array of solid-state drives (SSDs), and we use it for all of our database storage.  (The operating system resides on conventional disk drive technology.)

Friday, November 16, 2012

Web availability at ICPSR - October 2012

October was a very good month for system uptime - over 99.9% availability:

[Chart: ICPSR web availability, October 2012]

That's good news after a much rougher September.  So far things look good this month, although a number of very short-lived outages have already pushed us below 99.9% for the month.

Wednesday, November 14, 2012

A commentary on MOOCs from Clay Shirky

Some of my colleagues - past and present - are attending classes in Massive Open Online Courses (MOOCs).  I've been following their stories and also columnists who have been talking about MOOCs and education.  It is a very interesting time.

Clay Shirky has a long post (Napster, Udacity, and the Academy) about MOOCs that is well worth reading.  Some highlights:

The recording industry concluded this new audio format would be no threat, because quality mattered most. Who would listen to an MP3 when they could buy a better-sounding CD at the record store? Then Napster launched, and quickly became the fastest-growing piece of software in history. The industry sued Napster and won, and it collapsed even more suddenly than it had arisen.

If Napster had only been about free access, control of legal distribution of music would then have returned the record labels. That’s not what happened. Instead, Pandora happened. Last.fm happened. Spotify happened. ITunes happened. Amazon began selling songs in the hated MP3 format.

and

It’s been interesting watching this unfold in music, books, newspapers, TV, but nothing has ever been as interesting to me as watching it happen in my own backyard. Higher education is now being disrupted; our MP3 is the massive open online course (or MOOC), and our Napster is Udacity, the education startup.

We have several advantages over the recording industry, of course. We are decentralized and mostly non-profit. We employ lots of smart people. We have previous examples to learn from, and our core competence is learning from the past. And armed with these advantages, we’re probably going to screw this up as badly as the music people did.

Monday, November 12, 2012

Nick Carr, MOOCs, and ethics

In his post The ethics of MOOC research, Nick Carr describes a note he received from a colleague in academia who comments on the research agenda of Massive Open Online Courses:
The MOOCs’ research agenda seems entirely wholesome. But it does raise some tricky ethical issues, as a correspondent from academia pointed out to me after my article appeared. “At most institutions,” he wrote, the kind of behavioral research the MOOCs are doing “would qualify as research on human subjects, and it would have to be approved and monitored by an institutional review board, yet I have heard nothing about that being the case with this new adventure in technology.” Universities are, for good reason, very careful about regulating, approving, and monitoring biological and behavioral research involving human subjects. In addition to the general ethical issues raised by such studies, there are strict federal regulations governing them. I am no expert on this subject, but my quick reading of some of the federal regulations suggests that certain kinds of purely pedagogical research are exempt from the government rules, and it may well be that the bulk of the MOOC research falls into that category.
Given the intense energy ICPSR has been putting into its systems for protecting confidential research data and facilitating requests for using such data, I found this very interesting.

I see parallels here with collecting and using personal information.  If one conducts a survey and asks personal questions of well-consented adults, the results might one day become an interesting, restricted-use dataset.  But if the same information is harvested from freely and openly available blogs, tweets, and wall posts, would it also become restricted-use data?