Resource discovery

From Bjoern Hassler

Revision as of 12:35, 14 September 2009 by Bjoern

How do you let people know what stuff you have and where your stuff is?

Most people sensibly share information about their podcasts through rss or atom feeds. The page "Using media rss to syndicate and share media" discusses what a relatively minimal, yet complete, (media)rss feed should look like; the same applies equally well to atom.

Here we ask how you find the feed in the first place.

1 rss auto-discovery

Individual feeds can be discovered via html. For instance, look at this page. It has human-readable, clickable subscription links (for itunes, miro, basic rss), but there's also a statement in the <head> of the document:

 <link href="http://podcast.open.ac.uk/feeds/k216-applied-social-practice/rss2.xml" 
       rel="alternate" type="application/rss+xml" title="Applied social work practice" />

This is rss auto-discovery, which "is a technique that makes it possible for browsers and other software to automatically find a site's RSS feed."

To build the Steeple podcast portal, we can thus scrape the OU site, pulling out all rss feeds, and then ingesting those feeds.
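As a rough illustration of the scraping step, here is a minimal Python sketch that pulls auto-discovery links out of a page's html. It uses only the standard library; the names `FeedLinkParser` and `discover_feeds` are made up for this example, and treating `application/atom+xml` as a feed type alongside rss is an assumption. It is not the code used for the Steeple portal.

```python
from html.parser import HTMLParser

# Assumption: atom feeds are advertised the same way as rss feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> elements that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []  # list of (href, title) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            self.feeds.append((a["href"], a.get("title", "")))


def discover_feeds(html_text):
    """Return the (href, title) pairs of all feeds advertised in html_text."""
    parser = FeedLinkParser()
    parser.feed(html_text)
    return parser.feeds


sample = '''<html><head>
<link href="http://podcast.open.ac.uk/feeds/k216-applied-social-practice/rss2.xml"
      rel="alternate" type="application/rss+xml" title="Applied social work practice" />
</head><body></body></html>'''

for href, title in discover_feeds(sample):
    print(title, href)
```

A real scraper would also need to fetch each page and resolve relative href values against the page url; both are left out here.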

2 A feed o' feeds

The other way to do this is to create a sort of index feed (a "feed of feeds"). We look at two examples.

2.1 Oxford podcast opml

For instance, to pull the Oxford items into the Steeple podcast portal, we use an opml feed made available by Oxford. The feed looks like this:

<?xml version="1.0" encoding="utf-8"?>
<opml version="2.0">
 <head>
   <title>Podcasts from the University of Oxford</title>
   <ownerName>OXITEMS, University of Oxford</ownerName>
   <ownerEmail>oxitems@oucs.ox.ac.uk</ownerEmail>
 </head>
 <body>
   <outline 
     text="Foundation for Law, Justice and Society" 
     description="Podcasts from the Foundation for Law, Justice and Society, an independent institution ..." 
     htmlUrl="http://www.fljs.org/" 
     title="Foundation for Law, Justice and Society" 
     type="rss" 
     version="RSS2" 
     xmlUrl="http://rss.oucs.ox.ac.uk/socleg/fljs-audio/rss20.xml" 
   />
   <outline 
     text="The Credit Crunch and Global Recession" 
     description="A podcast series about the credit crunch and global recession featuring Oxford academics. ..." 
     htmlUrl="http://www.econ.ox.ac.uk/" 
     title="The Credit Crunch and Global Recession" 
     type="rss" 
     version="RSS2" 
     xmlUrl="http://rss.oucs.ox.ac.uk/econ/credit-crunch-audio/rss20.xml" 
   />
   ...
 </body>
</opml>

This allows us to discover the Oxford feeds systematically. However, the opml feed isn't publicly linked, and there is no cross-institutional agreement on where such a feed should be located. This is discussed further below.
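To show how such an opml file can be turned into a list of feeds to ingest, here is a small Python sketch using the standard library. The function name `feeds_from_opml` is invented for this example, and the sample is an abbreviated copy of the Oxford file above.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return (title, xmlUrl) for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            # Fall back to the text attribute when title is absent.
            title = outline.get("title") or outline.get("text", "")
            feeds.append((title, url))
    return feeds


sample_opml = """<opml version="2.0">
 <head><title>Podcasts from the University of Oxford</title></head>
 <body>
   <outline text="Foundation for Law, Justice and Society"
     title="Foundation for Law, Justice and Society" type="rss"
     xmlUrl="http://rss.oucs.ox.ac.uk/socleg/fljs-audio/rss20.xml" />
   <outline text="The Credit Crunch and Global Recession"
     title="The Credit Crunch and Global Recession" type="rss"
     xmlUrl="http://rss.oucs.ox.ac.uk/econ/credit-crunch-audio/rss20.xml" />
 </body>
</opml>"""

for title, url in feeds_from_opml(sample_opml):
    print(title, url)
```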

2.2 OCW opml

A second example, this time not for video but for OCW (OpenCourseWare).

<opml version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
       <head>
               <title>OpenCourseWare Consortium Member Course Feeds</title>
                <dateCreated>Tue, 03 Feb 2009 13:51:00 +0100</dateCreated>
       </head>
       <body>
       <outline 
         type="rss"  
         title="Weber State University" text="Weber State University" 
         xmlUrl="http://ocw.weber.edu/rss_all"/>
       <outline 
         type="rss"  
         title="University of California, Irvine" 
         text="University of California, Irvine" 
         xmlUrl="http://ocw.uci.edu/courses/rss.xml"/>
...
      <outline 
         type="rss"  
         title="The Open University" 
         text="The Open University" 
         xmlUrl="http://openlearn.open.ac.uk/file.php/1/learningspace.xml"/>
...

For reference, see these links

3 Looking at the feeds themselves

Both of those "feeds o' feeds" point at further rss feeds. What do these feeds look like?

3.1 Oxford podcast, Open University podcast

We won't go into this here, as the podcast format is discussed in "Using media rss to syndicate and share media".

3.2 2nd example: UC Irvine OCW feed

Let us look at the OCW case:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>UC Irvine, OpenCourseWare</title>
<link>http://ocw.uci.edu/courses/</link>
<description>Open Courseware from the University of California, Irvine</description>
<language>en-us</language>
<copyright>Copyright 2008, UC Irvine Extension</copyright>
<webMaster>cwcurtis@uci.edu</webMaster>
<pubDate>Tue, 29 Aug 2007 14:53:11 PDT</pubDate>
<lastBuildDate>Fri, 5 Dec 2008 14:31:19 PDT</lastBuildDate>
<category>UC Irvine Extension</category>
<generator>In house</generator> 
       <item>
               <title>Physics 21: Science from Superheroes to Global Warming</title>
               <link>http://ocw.uci.edu/courses/physics_21/</link>
               <description>Have you ever wondered if Superman could really fly? What was 
                            Spiderman's spidey sense? How did Wonder ... </description>
               <pubDate>Fri, 5 Dec 2008 14:31:19 PDT</pubDate>
       </item>
...

3.3 3rd example: MIT OCW feed

Full example (with dc elements) here: MIT OCW feed, source http://ocw.mit.edu/OcwWeb/rss/all/mit-allcourses.xml

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="../../style/rss10.xsl"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns="http://purl.org/rss/1.0/" 
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm">
    <title>MIT OpenCourseWare: All Courses</title>
    <description>All courses in all departments from MIT OpenCourseWare, provider of free and open MIT course materials.</description> 
    <link>http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm</link>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/CourseHome/index.htm" />
    ...
</items>
</channel>
<item rdf:about="http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/CourseHome/index.htm">
<title>5.60 Thermodynamics &amp; Kinetics, Spring 2008 (MIT)</title>
<description>This subject deals primarily with equilibrium properties of macroscopic systems, ... .</description>
<link>http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/CourseHome/index.htm</link>
</item>
...
</rdf:RDF>
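The UC Irvine feed is rss 2.0 while the MIT feed is rss 1.0 (RDF), so an ingester has to cope with both layouts. The following Python sketch, with the invented name `feed_items`, reads title and link from either kind. The two samples are cut-down versions of the feeds above; the sketch ignores the dc elements and everything else.

```python
import xml.etree.ElementTree as ET

RSS1 = "{http://purl.org/rss/1.0/}"  # default namespace of rss 1.0 feeds


def feed_items(feed_xml):
    """Return (title, link) pairs from an rss 2.0 or rss 1.0 (RDF) document."""
    root = ET.fromstring(feed_xml)
    # rss 2.0: un-namespaced <item> elements inside <channel>.
    # rss 1.0: namespaced <item> elements that are children of <rdf:RDF>.
    ns = RSS1 if root.tag.endswith("}RDF") else ""
    items = []
    for item in root.iter(ns + "item"):
        title = item.findtext(ns + "title", default="")
        link = item.findtext(ns + "link", default="")
        items.append((title.strip(), link.strip()))
    return items


rss2_sample = """<rss version="2.0"><channel>
<title>UC Irvine, OpenCourseWare</title>
<link>http://ocw.uci.edu/courses/</link>
<item>
  <title>Physics 21: Science from Superheroes to Global Warming</title>
  <link>http://ocw.uci.edu/courses/physics_21/</link>
</item>
</channel></rss>"""

rss1_sample = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm">
  <title>MIT OpenCourseWare: All Courses</title>
  <link>http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm</link>
</channel>
<item rdf:about="http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/CourseHome/index.htm">
  <title>5.60 Thermodynamics &amp; Kinetics, Spring 2008 (MIT)</title>
  <link>http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/CourseHome/index.htm</link>
</item>
</rdf:RDF>"""

print(feed_items(rss2_sample))
print(feed_items(rss1_sample))
```

Note that the ampersand in the MIT title has to be written as `&amp;` for the document to parse as xml.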


4 Discussion

4.1 Standardisation of the "feed o' feeds"

  • Opml is widely used by rss readers for import/export of feed collections.
  • Opml is very loosely typed - we'd need a dtd for our specific case.
  • Wouldn't atom be better than opml?

4.2 Standardisation of resource feeds

OCWC has rss guidance at http://www.ocwconsortium.org/share/best-practices-rss.html, but it falls short for the video world; see Using media rss to syndicate and share media.

4.3 Retrieval of metadata vs. retrieval of metadata+resource: Podcast feeds vs OCW rss feed

An important difference is this:

  • In the podcast case (Oxford, OU), the rss links directly to assets (e.g. mp3, mp4, pdf files). Hence the podcast feed does not just deliver the metadata, but can also deliver the learning resources.
  • In the OCW case, the rss feed points at web pages. Because that information alone gives no systematic way of downloading a web page with all its resources, the feed only communicates the metadata.
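The distinction can be checked mechanically. In rss 2.0, a direct link to an asset is normally carried by an `<enclosure>` element, so the Python sketch below sorts items into those that deliver assets and those that only carry metadata. The function name `split_items` is invented, and the podcast item in the sample is hypothetical (the example.org urls are placeholders, not real Oxford or OU data).

```python
import xml.etree.ElementTree as ET


def split_items(rss2_xml):
    """Split rss 2.0 items into (with_assets, metadata_only)."""
    root = ET.fromstring(rss2_xml)
    with_assets, metadata_only = [], []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Asset urls are taken from <enclosure url="..."> elements.
        urls = [e.get("url") for e in item.findall("enclosure") if e.get("url")]
        if urls:
            with_assets.append((title, urls))
        else:
            metadata_only.append((title, item.findtext("link", default="")))
    return with_assets, metadata_only


sample_feed = """<rss version="2.0"><channel>
<title>Mixed example feed</title>
<item>
  <title>Hypothetical podcast episode</title>
  <link>http://example.org/episode1.html</link>
  <enclosure url="http://example.org/episode1.mp3" length="1234" type="audio/mpeg"/>
</item>
<item>
  <title>Physics 21: Science from Superheroes to Global Warming</title>
  <link>http://ocw.uci.edu/courses/physics_21/</link>
</item>
</channel></rss>"""

assets, metadata = split_items(sample_feed)
print("delivers resources:", assets)
print("metadata only:", metadata)
```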

So what is missing in the OCW case is a mechanism for then getting the resource. This could be done in a variety of ways:

  • There could be a further rss/atom feed that has all the various resources linked from it (see next section for an example).
  • There could be a link to an html5 manifest, allowing the systematic offlining of html resources.
  • There could be a reference to an rsync server that allows materials to be retrieved via rsync.
  • There might not be a further link, but it is ensured that a "wget -pk" retrieves all relevant information in usable form.

It's important that there is a way of determining which resources need to be updated. There are further thoughts on this at OER_4_Low_Bandwidth, which looks at the delivery of "resource alternatives" in OER content, and which in a way draws an analogy between the <media:group> element in media rss (see Using media rss to syndicate and share media) and delivering non-video resources with alternatives.

4.4 A 4th example: OU OpenLearn

Open University OpenLearn feed structure.

Going back to our earlier discussion of the OCWC opml feed, we give a 4th example, and look at the OpenLearn feeds. The full discussion for OU feeds is here: /OpenLearn, and gives snippets of the feeds.

In terms of working with the OCWC opml: The OCWC opml links to the learningspace.xml feed, which points back at html pages (just like all the other institutions' "master rss" feeds).

The OpenLearn case is interesting because there is an opml file that links to rss feeds that have content in them. So if you have this opml file, you could download the OU content this way. If the OCWC opml linked to the OpenLearn opml, then it would be possible to download the whole of OpenLearn without going via a web page.
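One way an opml file could link to another is the `include` outline type described in the OPML 2.0 specification; whether OCWC or OpenLearn would adopt it is purely an assumption here. Under that assumption, the Python sketch below walks from a top-level opml file down to all the rss feeds it reaches. The name `collect_feeds`, the example.org urls and both sample documents are hypothetical, and `fetch` is any function that returns the opml text for a url.

```python
import xml.etree.ElementTree as ET


def collect_feeds(url, fetch, seen=None):
    """Follow opml inclusions starting at url; return every xmlUrl found."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against opml files that include each other
        return []
    seen.add(url)
    feeds = []
    for outline in ET.fromstring(fetch(url)).iter("outline"):
        if outline.get("xmlUrl"):
            feeds.append(outline.get("xmlUrl"))
        elif outline.get("type") == "include" and outline.get("url"):
            # Assumption: nested opml files are referenced with type="include".
            feeds.extend(collect_feeds(outline.get("url"), fetch, seen))
    return feeds


# Hypothetical documents standing in for the OCWC and OpenLearn opml files.
DOCS = {
    "http://example.org/ocwc.opml": """<opml version="2.0"><body>
      <outline type="rss" text="Weber State University" xmlUrl="http://ocw.weber.edu/rss_all"/>
      <outline type="include" text="OpenLearn" url="http://example.org/openlearn.opml"/>
    </body></opml>""",
    "http://example.org/openlearn.opml": """<opml version="2.0"><body>
      <outline type="rss" text="Unit 1" xmlUrl="http://example.org/openlearn/unit1.xml"/>
      <outline type="rss" text="Unit 2" xmlUrl="http://example.org/openlearn/unit2.xml"/>
    </body></opml>""",
}

print(collect_feeds("http://example.org/ocwc.opml", DOCS.__getitem__))
```

Here `fetch` is a dictionary lookup so the example runs offline; in practice it could be a thin wrapper around `urllib.request.urlopen`.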

5 Suggestion

Metafeed formats