In this post we will look at the XML we use to ingest content into Kaltura through its Drop Folder mechanism. To recap an
earlier post about the Drop Folder, ours is a subdirectory under the home directory of the 'kaltura' user. We provisioned this account on a special-purpose machine that Kaltura accesses via
sftp to fetch content without human intervention.
Kaltura has a pretty nice guide to building the XML, which looks an awful lot like Media RSS. They also make some short examples available, but we always find it useful to have a real-world example. Here's ours.
Preface: Our stuff is a little unusual. That is:
- We always have pairs of videos, one classroom and one blackboard
- We have lots of metadata that applies to both videos, so it gets repeated in the XML. I have excised much of it for this post
- All of the metadata is fake; it is not real metadata about an actual classroom video
- You can find a copy of the XML that we diagram below at this URL http://goo.gl/OxbyU
- Our use of Kaltura is in support of the Measures of Effective Teaching (Extension) project, and so there are many references to 'metext' in the metadata
- We will be generating the XML for ingest programmatically
Here goes.
<?xml version="1.0"?>
<mrss xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="ingestion.xsd">
<channel>
<item>
<!-- These do not change -->
<action>add</action>
<type>1</type>
<!-- This changes for each video -->
<referenceId>12345-Board-video-rendition-MP4-H264v-Standard.mp4</referenceId>
<!-- This does not change -->
<userId>metext</userId>
<!-- This changes for each video -->
<name>RCN 12345 board video</name>
<!-- This changes for each video -->
<description>RCN 12345 board video</description>
We have the usual XML stuff at the beginning, and then the start of the Media RSS.
Action = add for new content.
Type = 1 for video content.
ReferenceID is the name of the original file.
UserID is the pseudo-user who will be linked to the content in Kaltura.
Name and Description are exposed as base metadata in Kaltura.
<!-- Always assign two tags, one called metext and the other board or classroom -->
<tags>
<tag>metext</tag>
<tag>board</tag>
</tags>
<!-- This does not change -->
<categories>
<category>metext</category>
</categories>
<!-- This does not change -->
<media>
<mediaType>1</mediaType>
</media>
We tag everything with the project name and indicate whether it is a video of the blackboard or the classroom.
Kaltura uses Categories as a main way to browse and find content. We treat a category as an "is in collection" sort of attribute.
MediaType = 1 for video.
<!-- This changes for each video -->
<contentAssets>
<content>
<dropFolderFileContentResource filePath="12345-Board-video-rendition-MP4-H264v-Standard.mov"/>
</content>
</contentAssets>
This tells Kaltura that the video file is in the Drop Folder along with the XML.
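Because the XML only names the video file, it is worth checking that every referenced file really is sitting next to the XML before Kaltura picks it up. Here is a minimal sketch of such a check. The function name `missing_media_files` is made up for this post, and the assumption that `filePath` is relative to the directory holding the XML reflects how our Drop Folder is laid out, not a documented Kaltura rule.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_media_files(xml_path):
    """Return the filePath values that have no matching file beside the XML.

    Hypothetical helper: assumes each filePath is relative to the
    directory that contains the ingest XML (our Drop Folder layout).
    """
    xml_path = Path(xml_path)
    root = ET.parse(xml_path).getroot()
    missing = []
    # The ingest XML has no default namespace, so plain tag names match.
    for resource in root.iter("dropFolderFileContentResource"):
        file_path = resource.get("filePath")
        if file_path is None or not (xml_path.parent / file_path).is_file():
            missing.append(file_path)
    return missing
```

An empty list means every video named in the XML is present; anything else is a list of the names that still need to be copied over.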
Now for our project-specific metadata, which fits into a Kaltura structure called Custom Data:
<!-- This changes for each video -->
<customDataItems>
<customData metadataProfileId="22971">
<xmlData>
<metadata>
<METXDistrictDistrictName>Ann Arbor</METXDistrictDistrictName>
<METXDistrictDistrictNum>20</METXDistrictDistrictNum>
<METXSchoolSchoolName>Huron High School</METXSchoolSchoolName>
<METXSchoolSchoolMETXID>33</METXSchoolSchoolMETXID>
<!-- Kaltura wants the date to be in xs:long format, that is, seconds from the epoch -->
<METXVideoSubmissionCaptureDate>1344440267</METXVideoSubmissionCaptureDate>
</metadata>
</xmlData>
</customData>
</customDataItems>
The metadataProfileId attribute value is from our Kaltura KMC. I had to create the Custom Data schema first, and then reference it in the ingest XML here.
Most of the metadata fields are simple strings or strings from a controlled vocabulary. We do have one date item, and sadly Kaltura expects it to be in a difficult-to-use format, seconds since the epoch.
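Converting to and from seconds since the epoch is at least easy to script. The sketch below uses only the Python standard library. The helper names are my own, and reading the sample value as UTC is an assumption; use whatever time zone your capture dates are actually recorded in.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Return whole seconds since 1970-01-01 UTC for an aware datetime."""
    if moment.tzinfo is None:
        # A naive datetime would silently be read as local time.
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def from_epoch_seconds(seconds):
    """Return the UTC datetime for a seconds-since-epoch value."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The capture date used in the example XML, interpreted as UTC.
capture = datetime(2012, 8, 8, 15, 37, 47, tzinfo=timezone.utc)
print(to_epoch_seconds(capture))        # 1344440267
print(from_epoch_seconds(1344440267))   # 2012-08-08 15:37:47+00:00
```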
After this section of the XML is a closing tag for item, and then the whole thing repeats with only minor variation for the classroom video. I'll include it below for completeness.
</item>
<item>
<!-- These do not change -->
<action>add</action>
<type>1</type>
<!-- This changes for each video -->
<referenceId>12345-Classroom-video-rendition-MP4-H264v-Standard.mp4</referenceId>
<!-- This does not change -->
<userId>metext</userId>
<!-- This changes for each video -->
<name>RCN 12345 classroom video</name>
<!-- This changes for each video -->
<description>RCN 12345 classroom video</description>
<!-- This changes for each video -->
<tags>
<tag>metext</tag>
<tag>classroom</tag>
</tags>
<!-- This does not change -->
<categories>
<category>metext</category>
</categories>
<!-- This does not change -->
<media>
<mediaType>1</mediaType>
</media>
<!-- This changes for each video -->
<contentAssets>
<content>
<dropFolderFileContentResource filePath="12345-Classroom-video-rendition-MP4-H264v-Standard.mov"/>
</content>
</contentAssets>
<!-- This changes for each video -->
<customDataItems>
<customData metadataProfileId="22971">
<xmlData>
<metadata>
<METXDistrictDistrictName>Ann Arbor</METXDistrictDistrictName>
<METXDistrictDistrictNum>20</METXDistrictDistrictNum>
<METXSchoolSchoolName>Huron High School</METXSchoolSchoolName>
<METXSchoolSchoolMETXID>33</METXSchoolSchoolMETXID>
<!-- Kaltura wants the date to be in xs:long format, that is, seconds from the epoch -->
<METXVideoSubmissionCaptureDate>1344440267</METXVideoSubmissionCaptureDate>
</metadata>
</xmlData>
</customData>
</customDataItems>
</item>
</channel>
</mrss>
And that's it.
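Since the two items differ only in a handful of fields, generating the XML programmatically is mostly a matter of filling in a template twice. What follows is a rough sketch using Python's xml.etree.ElementTree, not our production code. The function names, the file-naming pattern (copied from the example above, including the .mp4 referenceId and the .mov filePath), and the default profile ID are all assumptions for illustration. The sketch also leaves out the unused xmlns:xsd declaration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)


def _child(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    element = ET.SubElement(parent, tag, attrs)
    element.text = text
    return element


def build_item(rcn, view, metadata, profile_id="22971"):
    """Build one <item>; view is 'board' or 'classroom' (hypothetical helper)."""
    # File naming mirrors the example in this post; adjust to your own files.
    stem = f"{rcn}-{view.capitalize()}-video-rendition-MP4-H264v-Standard"
    title = f"RCN {rcn} {view} video"

    item = ET.Element("item")
    _child(item, "action", "add")
    _child(item, "type", "1")
    _child(item, "referenceId", stem + ".mp4")
    _child(item, "userId", "metext")
    _child(item, "name", title)
    _child(item, "description", title)

    tags = _child(item, "tags")
    for tag in ("metext", view):
        _child(tags, "tag", tag)
    _child(_child(item, "categories"), "category", "metext")
    _child(_child(item, "media"), "mediaType", "1")

    content = _child(_child(item, "contentAssets"), "content")
    _child(content, "dropFolderFileContentResource", filePath=stem + ".mov")

    custom = _child(_child(item, "customDataItems"), "customData",
                    metadataProfileId=profile_id)
    fields = _child(_child(custom, "xmlData"), "metadata")
    for name, value in metadata.items():
        _child(fields, name, str(value))
    return item


def build_ingest_xml(rcn, metadata, views=("board", "classroom")):
    """Return the ingest XML as a string, one <item> per view."""
    root = ET.Element("mrss", {f"{{{XSI}}}noNamespaceSchemaLocation": "ingestion.xsd"})
    channel = ET.SubElement(root, "channel")
    for view in views:
        channel.append(build_item(rcn, view, metadata))
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Calling `build_ingest_xml("12345", {"METXDistrictDistrictName": "Ann Arbor", "METXVideoSubmissionCaptureDate": 1344440267})` yields a document with the same two-item shape as the one above, with the shared metadata repeated in each item.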