Monday, October 15, 2012

Artificial Intelligence as defined by Nick Carr

Nick Carr has a short post here marking the occasion of Facebook's one billionth member.  He goes on to talk a bit about some work at Google on neural nets, but then includes this gem on artificial intelligence:
Forget the Turing Test. We’ll know that computers are really smart when computers start getting bored. If you assign a computer a profoundly tedious task like spotting potential house numbers in video images, and then you come back a couple of hours later and find that the computer is checking its Facebook feed or surfing porn, then you’ll know that artificial intelligence has truly arrived.
It's a short post and a good read.

Wednesday, October 10, 2012

September 2012 deposits at ICPSR

The numbers from September are in:


# of files | # of deposits | File format
291F 0x07 video/h264
14517application/msword
51application/octet-stream
29211application/pdf
145application/vnd.ms-excel
11application/vnd.ms-powerpoint
1201application/x-arcview
311application/x-dbase
11application/x-rar
246application/x-sas
146application/x-spss
135application/x-stata
55application/x-zip
11image/jpeg
321image/x-3ds
702multipart/appledouble
103text/plain; charset=unknown
6218text/plain; charset=us-ascii
21text/rtf
292text/xml

Interesting month in that we have the usual stuff in the usual quantities, but we also have a large number of unusual formats hitting the doorstep, such as ArcView and Apple Double.  And we also have a usual format in an unusually high quantity (MS Word).

Friday, October 5, 2012

September 2012 web availability

September was an OK, but not great, month for web availability:


We eliminated one frequent, but short-lived source of downtime when we stopped exporting the content of our Oracle database nightly.  We are now doing it only on the weekend, and while that adds some risk, we're gaining significant uptime.  (For some reason that we do not understand, our Oracle instance stops answering queries for 15-20 minutes about ten minutes AFTER the export completes.)  We have a new server racked and ready to install, and we're hoping that a fast new machine with solid-state drives will solve the problem for us.

We did run into some trouble mid-month when some routine maintenance went awry, and we had to fail over to our replica over the weekend of September 15 and 16.  Total downtime was about 90 minutes over the course of the weekend, but the replica kept the problem from clobbering our service completely.

After that we had pretty smooth sailing, with just 16 minutes of downtime for the rest of the month.
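
For a rough sense of scale, here is a small Python sketch that turns minutes of downtime into an availability percentage. It counts only the two figures quoted above (about 90 minutes and 16 minutes) and leaves out the earlier export-related outages, which are not quantified here, so the result is an upper bound rather than the month's real number. The function name is made up for illustration.

```python
def availability_pct(downtime_minutes, days_in_month):
    """Percentage of the month during which the service was up."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# September has 30 days; 90 and 16 are the two downtime figures quoted above.
# Downtime from the nightly Oracle exports early in the month is NOT included.
print(round(availability_pct(90 + 16, 30), 2))
```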

Wednesday, October 3, 2012

Setting up Kaltura - part VI

We'll focus on the Kaltura Drop Folder feature today.  The Drop Folder offers a mechanism whereby an enterprise can bulk upload content without human intervention.  In principle this is an excellent way for a library or archive to ingest many objects into Kaltura without some poor archivist performing individual (or group) uploads via a web GUI.  In practice the mechanism works smoothly when things are going well, but it can be a little difficult to diagnose problems when things go awry.

For example, here's a sample display from the Drop Folders panel from our Kaltura Management Console (KMC), which serves as an all-in-one dashboard for managing content:


According to this display we have just ingested three items:  an XML file and two video files.  In this particular case the XML file contained all of the metadata for the two video files, and contained instructions that told Kaltura that these were new items to ingest.  The Status field shows a value of Done, and the Error Description field is empty.  This seems good.

We can also see status information if we navigate to the Upload Control panel and select the Bulk Upload view.  Here we see similar info:

Again, this seems like good news.  The Notification column shows a value of Finished successfully.  Hooray!

But not so fast, my friend....

If we examine one of the video files under the Content panel (Entries tab), we see that none of the extended metadata is present.  We can see the Custom Data fields, but they are all empty.  Hmm, what happened?

If we navigate back to the Upload Control display, the last column offers some possible help:


There is an Action available to download a log file.  That sounds promising.  Let's do that.

The log file is in XML format, and if we open it up in a good browser or text editor or XML editor, we find XML that looks very much like the ingest XML we used in the Drop Folder.  And if we scroll all the way down to the bottom, we find this snippet:

<item><result><errorDescription>customDataItems failed: invalid metadata data: Element 'METXVideoSubmissionElectronicBoardUsed': [facet 'enumeration'] The value ' ' is not an element of the set {'Y', 'N'}. at line 87 Element 'METXVideoSubmissionElectronicBoardUsed': ' ' is not a valid value of the local atomic type. at line 87 </errorDescription></result>
This is telling us that we messed up one of the metadata fields.  If we look at the original ingest XML and find the statement that is supposed to be setting METXVideoSubmissionElectronicBoardUsed, sure enough, there is no value.  (The error occurs on line 296, not 87, which is a bit confusing.)

So the good news is that if we notice the error, we can find a log that will point us at the error.  But detecting the error is a little tricky, and it is easy to see how this would be difficult if we were ingesting, say, 100 items at a time.  So this is not awful, but is also not quite as nice as we might like.
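
One way to make this less painful for a large batch is to scan the downloaded log with a few lines of Python rather than scrolling through it by hand. This is only a sketch: it assumes the log is well-formed XML, that errors always show up in an errorDescription element as in the snippet above, and that the elements are not namespaced. The mrss and channel wrapper elements in the sample are an assumption for illustration, and the function name is our own.

```python
import xml.etree.ElementTree as ET


def find_ingest_errors(log_xml):
    """Return the text of every non-empty <errorDescription> in a log string."""
    root = ET.fromstring(log_xml)
    errors = []
    # iter() walks the whole tree, so the exact nesting above
    # <errorDescription> does not matter.
    for node in root.iter("errorDescription"):
        text = (node.text or "").strip()
        if text:
            errors.append(text)
    return errors


# Made-up sample log; the outer <mrss>/<channel> elements are assumed.
sample_log = """
<mrss>
  <channel>
    <item><result><errorDescription>customDataItems failed: invalid metadata data</errorDescription></result></item>
    <item><result></result></item>
  </channel>
</mrss>
"""

for message in find_ingest_errors(sample_log):
    print(message)
```

An empty list would mean that no item in the log reported an error, at least under the assumptions above.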

Suggestions:
  1. If the XML contains Custom Data, and the Custom Data has errors, but the video still ingests, perhaps show a Status of something like "Done with errors" (in the Drop Folders display) or "Finished with Custom Data errors" (in the Bulk Upload Log display).
  2. Make the diagnostic message (errorDescription) available without needing to download a file.  This could appear in a new column, or perhaps in a text pop-up.
  3. If N - 1 elements of the Custom Data are good, but one is bad, it would be nice if the other N - 1 Custom Data fields were still set.  That would make it possible to correct the error manually in the KMC rather than copying fresh XML into the Drop Folder.
  4. Suppress the line numbers since they are relative to the log file XML, not the original XML.
Again, overall the Drop Folder feature is very nice, and we will indeed use it to ingest the 20,000-30,000 video files in our collection.  But since it is likely that we will sometimes make a mistake within the XML (say, forgetting to escape a certain character), it would be great if the KMC would make it easier to detect and diagnose mistakes.
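
In the meantime, a pre-flight check on our side could catch the two kinds of mistake mentioned in this post before the XML ever reaches the Drop Folder. The Python sketch below is a hypothetical helper, not anything Kaltura provides. It checks that the XML is well-formed (which catches an unescaped character) and that enumerated fields hold an allowed value. The only rule in the table is the Y/N set taken from the error message quoted earlier; the metadata and title element names in the samples are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> allowed values. The single entry
# comes from the error message quoted earlier in this post.
ALLOWED_VALUES = {
    "METXVideoSubmissionElectronicBoardUsed": {"Y", "N"},
}


def preflight(ingest_xml):
    """Return a list of problems found in an ingest XML string (empty if none)."""
    try:
        root = ET.fromstring(ingest_xml)
    except ET.ParseError as exc:
        # Catches an unescaped '&' or '<', among other well-formedness errors.
        return ["not well-formed: {}".format(exc)]

    problems = []
    for tag, allowed in ALLOWED_VALUES.items():
        for node in root.iter(tag):
            value = node.text or ""
            if value not in allowed:
                problems.append(
                    "{}: {!r} is not one of {}".format(tag, value, sorted(allowed))
                )
    return problems


# Made-up samples; the wrapper element names are illustrative only.
good = "<metadata><METXVideoSubmissionElectronicBoardUsed>Y</METXVideoSubmissionElectronicBoardUsed></metadata>"
blank = "<metadata><METXVideoSubmissionElectronicBoardUsed> </METXVideoSubmissionElectronicBoardUsed></metadata>"
unescaped = "<metadata><title>Q & A</title></metadata>"

for sample in (good, blank, unescaped):
    print(preflight(sample))
```

A fuller version would validate against the actual metadata schema instead of a hand-maintained table, but even this much would have flagged the blank value that tripped us up.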

Monday, October 1, 2012

ICPSR Director of Curation Services

We have interviewed all of the candidates.
See http://dilbert.com/strips/comic/2011-10-30 for the full cartoon
I had the opportunity to meet with several of the candidates, and we have several excellent ones.  With a little bit of luck we should be able to announce who will be filling the position sometime soon.

And then s/he can explain exactly what Curation Services are. :-)