Phil Dawes' Stuff - Latest Comments in URIs make metadata complicated

Re: URIs make metadata complicated

Ed Davies — Thu, 22 Sep 2005 07:51:09 -0000

> Do URIs actually require more upfront thought than other schemes, though?
> More than localized, context bound schemes - yes.

But the really cute thing about URIs is that they form a sort of federation of separate, localized, context-bound schemes. Each URIRef carries around inside itself both the global name of the local scheme and the name within that scheme, all in a reasonably familiar, readable and compact sequence of characters. So no, I don't think URIRefs require more upfront thought than localized schemes, other than the trivial issue of deciding on the first part of the URI: the prefix that makes names in the scheme globally unique.
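A minimal Python sketch of that "scheme name plus local name" structure. The split rule (last `#`, else last `/`) is a common convention assumed here, not something URIs mandate, and `split_uriref` is a hypothetical helper name.

```python
from typing import Tuple


def split_uriref(uriref: str) -> Tuple[str, str]:
    """Split a URIRef into (namespace, local name).

    Assumption: the local name is whatever follows the last '#',
    or failing that the last '/'. This is a convention, not a rule.
    """
    for sep in ("#", "/"):
        head, found, tail = uriref.rpartition(sep)
        if found and tail:
            return head + sep, tail
    return "", uriref  # no recognisable split point


# The namespace part is the globally unique name of the local scheme;
# the local part is the name within that scheme.
namespace, local = split_uriref("http://xmlns.com/foaf/0.1/name")
print(namespace, local)
```

Everything a localized scheme would need to decide (the local names) is still there; the only extra decision is the namespace prefix.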

Re: URIs make metadata complicated

PhilDawes — Thu, 22 Sep 2005 02:56:58 -0000

Joshua Tauberer writes:

> using URIs collaboratively and successfully requires a non-trivial amount of upfront thought, documentation and proactive consensus building.

> Every naming scheme is going to be like that, to some degree. Do URIs actually require more upfront thought than other schemes, though?

More than localized, context bound schemes - yes. E.g.

PhilDawes name "Phil Dawes"
PhilDawes email pdawes@users.sf.net

didn't require much thought, because it is bound to the scope of this blog comment. It's a bit throwaway, but you still understand what I mean to some degree because you understand something of the context under which I wrote it.

Re: URIs make metadata complicated

PhilDawes — Thu, 22 Sep 2005 02:48:02 -0000

Joshua Tauberer writes:

> (1) URIs don’t allow you to use existing identity schemes.

> Exactly because to do so would be ambiguous. How do you know what identity scheme is being used?

Context tells you this.

> (4) URIs require a level of precision in ‘meaning’ that is hard to attain. URIs are globally scoped, which means they need to mean the same thing in any context.

> If this weren’t the case, no two RDF documents could ever be merged because you would never know if the authors intended their nodes to denote the same thing. But, like it was pointed out, it’s not necessarily a problem if this doesn’t occur in practice.

I think when it doesn't happen in practice it's because the people doing the merging know something of the context under which the document was written. You need this anyway - otherwise how do you know that the author of the RDF graph is a reliable source, or even competent in RDF?

Besides - I think this problem does happen in practice.

Re: URIs make metadata complicated

Joshua Tauberer — Wed, 21 Sep 2005 06:48:28 -0000

Each of the problems you point out is really by design:

> (1) URIs don’t allow you to use existing identity schemes.
Exactly because to do so would be ambiguous. How do you know what identity scheme is being used? You could, say, prefix it with the name of the scheme (e.g. myscheme:12345) -- but then you have to unambiguously identify the scheme name. If the scheme name is unambiguous, then you have a URI anyway.
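As a rough Python illustration of that argument: once an identifier carries an unambiguous scheme prefix, it already has the shape of a URI. The pattern below follows the RFC 3986 grammar for a scheme; `split_scheme` is a hypothetical helper, and `myscheme` is just the placeholder from the comment above.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(.*)$", re.DOTALL)


def split_scheme(identifier):
    """Return (scheme, rest), or (None, identifier) if no scheme is present."""
    match = _SCHEME.match(identifier)
    if match is None:
        return None, identifier
    # Schemes are case-insensitive, so normalise to lower case.
    return match.group(1).lower(), match.group(2)


# A bare identifier says nothing about which identity scheme it belongs to.
print(split_scheme("12345"))
# With the prefix, the scheme is explicit and the whole string parses as a URI.
print(split_scheme("myscheme:12345"))
```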

> (2) HTTP URIs have a load of implicit baggage
It's not a requirement that people use HTTP URIs. I'd be all for throwing away these, but that doesn't mean throwing away the entire URI concept.

> (3) URIs are URLs
Aren't URLs URIs? Same as 2.

> (4) URIs require a level of precision in ‘meaning’ that is hard to attain. URIs are globally scoped, which means they need to mean the same thing in any context.
If this weren't the case, no two RDF documents could ever be merged because you would never know if the authors intended their nodes to denote the same thing. But, like it was pointed out, it's not necessarily a problem if this doesn't occur in practice.

> using URIs collaboratively and successfully requires a non-trivial amount of upfront thought, documentation and proactive consensus building.
Every naming scheme is going to be like that, to some degree. Do URIs actually require more upfront thought than other schemes, though?

Re: URIs make metadata complicated

Ed Davies — Wed, 21 Sep 2005 06:21:54 -0000

> In theory, when you merge data, you determine that the same URI has different referents via logical inconsistencies; in practice you have domain experts and data modellers analyse the data (just like you do with relational database integrations).

Surely, if two or more datasets use the same URI to denote different resources then at least one of them is simply wrong - it is not using the URI in the way that the URI's original minter intended. In practice, you need to have your domain experts fix up the data before the merge.
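One way to picture "detecting different referents via logical inconsistencies" is the Python sketch below. It assumes triples are plain tuples and that the caller supplies a set of properties known to be single-valued; the graph names, the `ex:birthYear` property and the example URI are all invented for illustration.

```python
from collections import defaultdict


def find_conflicts(graphs, functional_properties):
    """Report subjects that get different values for a single-valued property.

    graphs: {graph_name: iterable of (subject, predicate, object)}
    Returns {(subject, predicate): {value: [graph names asserting it]}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for name, triples in graphs.items():
        for s, p, o in triples:
            if p in functional_properties:
                seen[(s, p)][o].append(name)
    # More than one distinct value is a hint that the URI is being
    # used for different things (or that one dataset is simply wrong).
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


graphs = {
    "a": [("http://example.org/id/42", "ex:birthYear", "1975")],
    "b": [("http://example.org/id/42", "ex:birthYear", "1931")],
}
conflicts = find_conflicts(graphs, {"ex:birthYear"})
print(conflicts)
```

A check like this only flags candidates; deciding which dataset misuses the URI is still the job of the domain experts mentioned above.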

Re: URIs make metadata complicated

Ed Davies — Wed, 21 Sep 2005 06:11:49 -0000

(3) and partly (2): Don't point a gun at a person unless you mean to kill them. Don't point an HTTP URL at a resource unless you mean to retrieve it (or otherwise access it using the Hypertext Transfer Protocol). For "real-world" things use tag: or similar URIs.

This will make the distinction clearer to people and will also avoid wasted network traffic when attempts are made to retrieve the resource.
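For readers who haven't met tag: URIs, a small Python sketch of minting one in the RFC 4151 shape `tag:authority,date:specific`. The helper name and the example authority are assumptions; the sketch checks only the shape of the date and does no percent-encoding of the specific part.

```python
import re

# RFC 4151 allows a date of the form YYYY, YYYY-MM or YYYY-MM-DD.
_TAG_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def mint_tag_uri(authority, date, specific):
    """Build a tag: URI from a DNS name or email address, a date and a local name.

    Only the date shape is validated; the caller is trusted to pass an
    authority they controlled on that date and a URI-safe specific part.
    """
    if not _TAG_DATE.match(date):
        raise ValueError("date must be YYYY, YYYY-MM or YYYY-MM-DD: %r" % date)
    return "tag:%s,%s:%s" % (authority, date, specific)


# Nothing about this identifier suggests that it can be fetched over HTTP.
print(mint_tag_uri("example.org", "2005-09-21", "people/ed"))
```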

I realise I'm in a minority with respect to this opinion on the use of HTTP URLs but I've yet to see a coherent argument against it.

Re: URIs make metadata complicated

PhilDawes — Tue, 20 Sep 2005 19:01:00 -0000

Bill de hOra writes:
> Strictly speaking, (4):

> “URIs are globally scoped, which means they need to mean the same thing in any context.”

> isn’t true, for RDF. URIs don’t have meaning; they have denotations. Denotations are assigned (“distributed”) and that can be done in a local scope. In theory, when you merge data, you determine that the same URI has different referents via logical inconsistencies; in practice you have domain experts and data modellers analyse the data (just like you do with relational database integrations).

Ok - that makes sense (although I haven't read that anywhere before - but then I'm starting to fall behind with the literature ;-) ).

Which means that there's probably a lot of scope for simplifying RDF - you can't throw a baby out with the bathwater if it wasn't in the bath to begin with.

Re: URIs make metadata complicated

PhilDawes — Tue, 20 Sep 2005 18:28:30 -0000

Hi Jimmy,

Jimmy Cerra writes:

> Why can't you use blank nodes if you can't use URI References? Resources don't need to be named, and sometimes (like in a database-like environment) most resources will be unnamed.

> If you are willing to step up to OWL, then with inverse-functional properties you can still identify things with a “public key” like structure. However, you can do that anyway with any practical RDF application too.

Actually I attempted to follow this approach at work for a while (à la FOAF), and was indeed willing to step up to OWL - my veudas triplestore supported inverse-functional properties for this reason (via a forward-chaining reasoner, e.g. see circa Sep 2004 if you're interested!).
It did make things complicated though - IFP smushing was slow, and unless you're going to give people cookie-cutter examples, they really do need to understand IFPs.

e.g. people don't naturally write:

<project>
   <name>My Application</name>
   <maintainer>
        <foaf:Person>
             <foaf:mbox>foo@example.com</foaf:mbox>
        </foaf:Person>
   </maintainer>
</project>

Unfortunately cookie-cutter examples kind-of miss the point - you might as well be translating people's data into RDF for them. The real goal for me at work was that people could come up with their own data (from their own systems) that could be aggregated and merged usefully, otherwise it's not really worth the trouble.
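For anyone unfamiliar with the term, "IFP smushing" means merging nodes that share a value for an inverse-functional property such as foaf:mbox. The Python sketch below uses a small union-find over tuple triples; it is an illustration under those assumptions, not the forward-chaining approach the triplestore mentioned above actually used, and the node labels are invented.

```python
def smush(triples, ifps):
    """Merge subjects that share a value for any inverse-functional property.

    triples: iterable of (subject, predicate, object) tuples
    ifps:    set of predicates treated as inverse-functional
    Returns a set of triples rewritten to use one representative per node.
    """
    triples = list(triples)
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller label as representative.
            parent[max(ra, rb)] = min(ra, rb)

    first_subject = {}
    for s, p, o in triples:
        if p in ifps:
            if (p, o) in first_subject:
                union(s, first_subject[(p, o)])
            else:
                first_subject[(p, o)] = s

    # Rewrite subjects always, and objects only if they are known nodes.
    return {(find(s), p, find(o) if o in parent else o) for s, p, o in triples}


data = [
    ("_:a", "foaf:mbox", "mailto:foo@example.com"),
    ("_:a", "foaf:name", "Foo"),
    ("_:b", "foaf:mbox", "mailto:foo@example.com"),
    ("_:proj", "maintainer", "_:b"),
]
merged = smush(data, {"foaf:mbox"})
```

Here `_:a` and `_:b` collapse into one node because they share a mailbox, so the project's maintainer ends up pointing at the node that also carries the name.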

...
> Also, the only requirement is that URI References are semantically uniform across the graphs you use them in. Problems happen when you merge graphs that have different semantics with the same URI Reference, but sometimes the types of graphs merged are small and manageable.

> If you have to merge with large numbers of graphs or with the whole Semantic Web for all of eternity, then I can see where minting URI References is a problem. But that is a social problem with naming itself and not RDF.

I think it's a problem with globally scoped naming - the RDF model doesn't allow for any skewing of meaning with context. You can't change society, and global adoption is one of the aims of the semantic web.

To be honest I think this sort-of illustrates a wider point - if you're just going to work on small manageable sets of data then why bother with complex URI and RDF machinery that inhibits adoption? It strikes me as quite ironic that the very RDF machinery that was intended to facilitate this large-scale aggregation of data actually ends up inhibiting it.

Re: URIs make metadata complicated

Bill de hOra — Tue, 20 Sep 2005 17:58:36 -0000

Strictly speaking, (4):

"URIs are globally scoped, which means they need to mean the same thing in any context."

isn't true, for RDF. URIs don't have meaning; they have denotations. Denotations are assigned ("distributed") and that can be done in a local scope. In theory, when you merge data, you determine that the same URI has different referents via logical inconsistencies; in practice you have domain experts and data modellers analyse the data (just like you do with relational database integrations).

For me, you left out the most important thing, which is that lots of URIs in the same place are hard to read. QNames win the readability argument.
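A short Python sketch of the readability point, assuming a simple prefix table. The `ex` prefix and its namespace are invented, and `expand`/`compact` are hypothetical helpers rather than any particular RDF library's API.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/terms#",  # made-up namespace for illustration
}


def expand(qname, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI using the prefix table."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in %r" % qname)
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Shorten a URI to a QName if a known namespace matches, else return it as is."""
    matches = [p for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri
    best = max(matches, key=lambda p: len(prefixes[p]))  # longest namespace wins
    return best + ":" + uri[len(prefixes[best]):]


print(compact("http://xmlns.com/foaf/0.1/mbox"))
print(expand("foaf:name"))
```

The QName is only shorthand: it is readable precisely because the prefix table carries the global part somewhere else.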

Re: URIs make metadata complicated

Jimmy Cerra — Tue, 20 Sep 2005 16:05:08 -0000

I've come to a few different conclusions regarding the above assertions:

(1) Why can't you use blank nodes if you can't use URI References? Resources don't need to be named, and sometimes (like in a database-like environment) most resources will be unnamed.

If you are willing to step up to OWL, then with inverse-functional properties you can still identify things with a "public key" like structure. However, you can do that anyway with any practical RDF application too.

Also, minting URI References is easy. Here's a URIRef: "data:Jimmy_Cerra". It is a little different from English, but we are working with computer languages, not English. Would those people complain about writing their words in a language like Japanese? So are their expectations reasonable?

Also, the only requirement is that URI References are semantically uniform across the graphs you use them in. Problems happen when you merge graphs that have different semantics with the same URI Reference, but sometimes the types of graphs merged are small and manageable.

(2) Yes, and no. I've come to the conclusion that the only way to understand the semantics of anything is to ask the author (i.e. human documentation). There is no way to do so via computers. This has been the case since the dawn of internet time (from the RFC specs to XHTML to the Atom Publication Format).

That's one reason nobody likes DTDs, RELAX NG, XML Schema, OWL (sometimes), and others to specify semantics. You can't do so completely for most non-trivial applications, and all those validation technologies are only hints. That's also why everyone loves XML Schema Datatypes: those elements specify semantics rather than provide a framework for specifying semantics.

(3) Just because some people get confused doesn't mean that everyone does. I understand the differences, as do the people I explain them to. Should we throw away calculus because some people don't understand it?

(4) See (1).

I used to be really bugged by those problems... but I think I've found enlightenment. The best way to write semantic web software is to assume, like Socrates and Descartes, that "To know that you do not know is true wisdom". I.e. assume the semantics of nothing in any context, and look it up or ask the URI owner.

Re: URIs make metadata complicated

Laurent Szyster — Tue, 20 Sep 2005 14:50:42 -0000

"Any others?"

Yes:

http://laurentszyster.be/bl...

of course ;-)

Public Names provide a data model that:

1. Captures simple text articulation as unique
sets of strings in a single semantic field,
for instance (with CRLF added):

17:
6:Public,
5:Names,
,
15:
4:data,
5:model,
,
1:a,
7:provide,
4:that,

2. Allows a simple computer system to validate
a string of bytes as an *unambiguous* text
articulation, for instance:

5:Dawes,4:Phil,

and use them as Unique Resource Identifier
with the required properties for a semantic
application.

Kind Regards,
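The examples in the comment above have the shape of netstrings (decimal length, colon, payload, trailing comma). Assuming that is the encoding, here is a minimal Python sketch of encoding and validating such strings. It covers only the framing; whatever ordering or uniqueness rule makes a Public Name canonical is not reproduced, and the function names are hypothetical.

```python
def encode(data: bytes) -> bytes:
    """Frame a byte string as length ':' payload ','."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_all(buffer: bytes):
    """Split a buffer of concatenated netstrings into payloads.

    Raises ValueError if the buffer is not a well-formed sequence.
    """
    items = []
    pos = 0
    while pos < len(buffer):
        colon = buffer.find(b":", pos)
        if colon == -1:
            raise ValueError("missing ':' after length")
        length_field = buffer[pos:colon]
        if not length_field.isdigit():
            raise ValueError("length is not a decimal number")
        start = colon + 1
        end = start + int(length_field)
        if buffer[end:end + 1] != b",":
            raise ValueError("payload is not terminated by ','")
        items.append(buffer[start:end])
        pos = end + 1
    return items


# Flat case, as in the second example of the comment.
flat = encode(b"Dawes") + encode(b"Phil")
# Nested case: a netstring whose payload is itself a sequence of netstrings.
nested = encode(encode(b"Public") + encode(b"Names"))
```

Because every length is explicit, a validator never has to guess where one name ends and the next begins, which is the "unambiguous" property the comment refers to.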