About a month ago we had a small problem with one of our NS 120 Celerra NAS units. (It may have been soft errors on one of its disk drives.) The Celerra detected the problem and went to do its usual thing: collect logs and other analytics, and then copy them to EMC's anonymous ftp site. Our Celerra uses a utility called /nas/tools/transfer_support_materials to do this. We noticed that when the Celerra tried to transfer the support materials, that failed too, which generated an additional series of critical errors.
We logged into the Celerra's control station and ran transfer_support_materials by hand, and we saw output like this:
[nasadmin@controller tools]$ /nas/tools/transfer_support_materials -uploadlog
transfer_support_materials[12057]: The transfer script has started.
PING ftp.emc.com (168.159.219.138) 56(84) bytes of data.
From 12.249.233.6 icmp_seq=0 Packet filtered
--- ftp.emc.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
, pipe 2
cd: Access failed: 550 Requested action not taken. File unavailable. (/incoming/APM00000000000)
`/nas/var/emcsupport/support_materials_APM00000000000.130407_1351.zip' at 65536 (0%) 49.1K/s eta:5m [Connection idle]
I've replaced our Celerra's serial number with the string "00000000000".
We then ran ftp by hand to see if we could replicate the error:
Connected to ftp.emc.com.
220-Proceeding further constitutes acknowledgement
to EMC Acceptable Use and Customer Security policies.
Anonymous uploads are immediately moved to a secure server accessible only
within EMC networks.
File downloads from ftp.emc.com are restricted to selected /pub directories, via
temporary secure accounts or via specific permanent secure accounts only.
Anonymous users please login with anonymous and email address as your password
See Powerlink emc278739 for upload instructions.
EMC staff: please refer to current services, FAQ and Best Practices documents at
http://one.emc.com/clearspace/community/active/css/projects/ftp-service
Please email all questions and concerns to ftpquestions@emc.com
220 Please reference the FTP Acceptable Use policy: http://itcentral.corp.emc.com/Policies/AcceptableUse.pdf
534 Command denied.
534 Command denied.
KERBEROS_V4 rejected as an authentication type
Name (ftp.emc.com:nasadmin): anonymous
331 User name okay, need password.
Password:
230 User logged in, proceed.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /incoming/APM00000000000
550 Requested action not taken. File unavailable.
ftp>
So the problem was that the directory that holds our support materials (/incoming/APM<serialnum>) was either missing or had its mode set to something that disallowed access.
We contacted EMC, and some days later they confirmed that the directory was indeed missing and that they had recreated it. We then ran ftp by hand to confirm that everything was working again, and it was. That was good news, but when we tried the same thing on our second NS 120 Celerra, we discovered that it too was missing its "support directory" on the ftp server. So we added that to our service request, and some days later EMC confirmed that this directory had also been missing and that they had recreated it as well. From our conversations with EMC, it is still unclear whether this problem is particular to us or more widespread.
The upshot of the story is that if you too run a Celerra or another product that sends support materials to EMC via anonymous ftp, this might be a good day to test transfer_support_materials and make sure that your "support directory" is intact. If it is, that's great. If it is missing, you may want to open a service request with EMC soon so that they can recreate it for you. It is better to have the directory in place before your system needs to send support materials than to find out then that it cannot.
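If you would rather script the check than run ftp by hand, here is a rough sketch in Python using the standard ftplib module. It repeats the anonymous login and the cd from the session above and reports whether the server refuses the directory. The function names, the placeholder password, and the 30-second timeout are my own inventions for illustration; the host name and the /incoming/APM<serialnum> layout are simply what we saw in our transcript, and I can't promise they apply to your site or product.

```python
import ftplib


def support_dir_exists(ftp, path):
    """Return True if the server lets us cd into path, False on a 5xx refusal.

    `ftp` is a logged-in ftplib.FTP connection (or anything with a
    compatible cwd() method).
    """
    try:
        ftp.cwd(path)
    except ftplib.error_perm:
        # Permanent (5xx) reply, such as the
        # "550 Requested action not taken" we saw above.
        return False
    return True


def check_support_dir(serial, host="ftp.emc.com",
                      password="nasadmin@example.com"):
    """Log in anonymously and test for /incoming/APM<serial>.

    Assumptions: the host and directory layout match our transcript;
    the password is a placeholder email address, not a real one.
    """
    path = "/incoming/APM" + serial
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login("anonymous", password)
        return support_dir_exists(ftp, path)
```

Calling check_support_dir("00000000000") with your own serial number in place of the zeros should return False in the situation we hit. Note that, like the manual test, it cannot tell a missing directory apart from one whose mode forbids access.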
I should note that we're still happy overall with EMC; in fact, we've just purchased the first three nodes of a new Isilon storage system from them. So the intent here isn't to excoriate them over the missing ftp directory; it was easy to reproduce the problem and to correct it. But we do wish that we had learned about the problem before the disk trouble, so that it could have been corrected earlier rather than at the moment the Celerra was trying to report a disk failure.