Multi-lingual Literals in RDF

As a non-native English speaker, I’m glad to see that both XML and RDF support language “tagging” of literals, which avoids the blind assumption that everything will be in English. Apparently the concept doesn’t get much use, though: I have yet to see any tools that support multiple languages at the application level (with the possible exception of foaf-a-matic, which I translated into Danish, but it doesn’t do it at the vocabulary level).

Since I sometimes do development in Danish, but also want to integrate with the rest of the world, I have begun creating vocabularies with labels in at least Danish and English. I have also set up partial HTML presentation in multiple languages: the syntax parts are completed, and the content negotiation setup should be working. When you dereference e.g. the namespace URI for the label vocabulary, http://purl.org/net/vocab/2004/03/label, you should get the English HTML version, unless your browser is set up to accept Danish [da] as mine is, in which case you should get the Danish version. If your user agent sends an Accept: header containing application/rdf+xml, you should get the RDF/XML version. This method of operation isn’t completely defined yet, but the W3C TAG is working hard on the issue, httpRange-14.

Now, how to decide which literal to use when multiple are present?
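
To make the question concrete, here is a small made-up RDF/XML fragment of the kind I have in mind. The property URI and the label texts are invented for illustration and are not taken from any of my actual vocabularies:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- Hypothetical property: URI and texts are made up for this example -->
  <rdf:Property rdf:about="http://example.org/vocab#title">
    <!-- Two labels for the same property, told apart only by xml:lang -->
    <rdfs:label xml:lang="en">title</rdfs:label>
    <rdfs:label xml:lang="da">titel</rdfs:label>
    <rdfs:comment xml:lang="en">The title of a work.</rdfs:comment>
    <rdfs:comment xml:lang="da">Titlen på et værk.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>
```

A presentation layer that just prints every rdfs:label would show both “title” and “titel” here, so it has to pick one.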

For my vocabulary transformation, I’m currently going with the following algorithm, but I’m looking for input on improving it:

  1. Pick the one in the preferred language, if one is indicated and such a value is present.
  2. Otherwise pick the one in English (or without a language), i.e. if no language is preferred or no value exists in the preferred language.
  3. Otherwise show them all, indicating the actual language of each.

Or, expressed as an XSLT template, which gets invoked for every occurrence of a literal ($lang holds the preferred language):

<xsl:template mode="literal-value" match="*"> 
  <xsl:variable name="this" select="concat(namespace-uri(),local-name())"/> 
  <xsl:choose> 
    <!-- Just one value or value in the preferred language --> 
    <xsl:when test="count(../*[concat(namespace-uri(),local-name())=$this])=1 
        or starts-with(@xml:lang,$lang)"> 
      <xsl:value-of select="."/> 
    </xsl:when> 
    <!-- A value in the preferred language is present, but it's not this one: output nothing -->
    <xsl:when test="../*[concat(namespace-uri(),local-name())=$this and starts-with(@xml:lang,$lang)]">
    </xsl:when>
    <!-- Fall back to English (or an untagged value) if present -->
    <xsl:when test="starts-with(@xml:lang,'en') or not(@xml:lang)">
      <xsl:value-of select="."/>
    </xsl:when>
    <!-- An English or untagged value is present, but it's not this one: output nothing -->
    <xsl:when test="../*[concat(namespace-uri(),local-name())=$this and (starts-with(@xml:lang,'en') or not(@xml:lang))]">
    </xsl:when>
    <!-- Multiple values present, but not one that is in the preferred language or English --> 
    <xsl:otherwise> 
      <span lang="{@xml:lang}"> 
        <xsl:value-of select="."/> 
      </span> 
      <xsl:text> [</xsl:text> 
      <xsl:value-of select="@xml:lang"/> 
      <xsl:text>]</xsl:text> 
      <xsl:if test="following-sibling::*[concat(namespace-uri(),local-name())=$this]"> 
        <xsl:text>, </xsl:text> 
      </xsl:if> 
    </xsl:otherwise> 
  </xsl:choose>
</xsl:template> 

It’s a little hairy, and not very efficient, since every value scans its siblings again to find out whether a better candidate exists. The choice of which value to output would likely be better made in the select attribute of the invoking template.
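
As a rough, untested sketch of that idea, the decision could be made once per property in the caller instead of once per value. The template name show-literals, its $values parameter, the default value of $lang and the rdfs:label caller are all assumptions made up for this example, and unlike the template above it only takes the first match when several values share a language:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <!-- Preferred language; assumed to be passed in as a stylesheet parameter -->
  <xsl:param name="lang" select="'da'"/>

  <!-- Hypothetical named template: decides once which of $values to output -->
  <xsl:template name="show-literals">
    <!-- Defaults to an empty node-set so the predicates below stay valid -->
    <xsl:param name="values" select="/.."/>
    <!-- Values in the preferred language (none if $lang is empty) -->
    <xsl:variable name="preferred"
        select="$values[string($lang) and starts-with(@xml:lang, $lang)]"/>
    <!-- English or untagged values, used as the fallback -->
    <xsl:variable name="fallback"
        select="$values[starts-with(@xml:lang, 'en') or not(@xml:lang)]"/>
    <xsl:choose>
      <!-- Just one value: nothing to choose between -->
      <xsl:when test="count($values) = 1">
        <xsl:value-of select="$values"/>
      </xsl:when>
      <xsl:when test="$preferred">
        <xsl:value-of select="$preferred[1]"/>
      </xsl:when>
      <xsl:when test="$fallback">
        <xsl:value-of select="$fallback[1]"/>
      </xsl:when>
      <!-- No preferred or English value: show all, tagged with their language -->
      <xsl:otherwise>
        <xsl:for-each select="$values">
          <span lang="{@xml:lang}">
            <xsl:value-of select="."/>
          </span>
          <xsl:text> [</xsl:text>
          <xsl:value-of select="@xml:lang"/>
          <xsl:text>]</xsl:text>
          <xsl:if test="position() != last()">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Example caller: hands all labels of a resource to the template above -->
  <xsl:template match="*[rdfs:label]">
    <xsl:call-template name="show-literals">
      <xsl:with-param name="values" select="rdfs:label"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The sibling scan now happens once per property, and the per-value template is no longer needed for the selection itself.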
