danbri's foaf stories

A quick link roundup:

From ‘Google OAuth & Federated Login Research‘:

“The following provides some guidelines for the user interface define of becoming an OAuth service provider”

Detailed notes on UI issues, with screenshots and links to related work (opensocial etc.).

Myspace’s OAuth Testing tool:

The MySpace OAuth tool creates examples to show external developers the correct format for constructing HTTP requests signed according to OAuth specifications

Google’s OAuth playground tool (link):

… to help developers cure their OAuth woes. You can use the Playground to help debug problems, check your own implementation, or experiment with the Google Data APIs.

If anyone figures out how to post files to Blogger via their AtomPub/OAuth API, please post a writeup! We should be able to use it to post RDFa/FOAF etc hopefully…

Yahoo’s OpenID usability research. Really good to see this made public, I hope others do likewise. There’s a summary page and a full report in PDF, “Yahoo! OpenID: One Key, Many Doors“.

Finally, what looks like an excellent set of introductory posts on OAuth: a Beginner’s Guide to OAuth from Eran Hammer-Lahav.

Dries Buytaert  on ‘Drupal, the Semantic Web, and search’.

Discusses RDFa, SearchMonkey and more. Great stuff! Excerpt:

This kind of technology is not limited to global search. On a social networking site built with Drupal, it opens up the possibility to do all sorts of deep social searches - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches.

Meanwhile, PHP Shindig (Apache’s OpenSocial widget container) has now been integrated into Drupal. Currently version 5 but things are moving to get it working in 6.x too. See docs (this link has some issues) and the announcement/discussion from shindig-dev. Also great news…

Maybe this is a record for delayed blogging. Nine and a half years late, here’s a writeup I found (on a corpsed hard-drive) of an image metadata workshop held by NISO, in Washington. I wrote it up for the JISC JIDI project at ILRT, who funded my trip. I’m sure they won’t mind it being shared here. Eric Miller and Paul Miller were at the workshop too; I remember working on the old Dublin Core RDF spec with them. NISO’s Workshop Report is also available from their site.

In April 1999, NISO organised a two-day invitational meeting whose aim was to gather requirements for technical metadata for digital images. The meeting addressed both architectural issues, and began the work of defining and categorising actual data elements for representing technical information about images. A report has recently been published on the NISO web service providing preliminary overviews of the agreed elements, including an comprehensive overview [report] of the two day meeting. This report for the JIDI group does not attempt to reproduce this work, but instead reports on a number of issues relating to the aims of JISC-funded services such as JIDI and TASI.

The workshop wrestled with a number of problems familiar to many from the non-imaging metadata community. In particular, it reaffirmed the conclusions drawn at the end of the earlier Dublin Core and Images workshop, namely that the requirements (and challenges) of image-oriented metadata applications were in many respects close to those of the bibliographic and text-oriented areas. Where digital imaging applications had specific requirements, these were in many cases related to the high mobility of digital image objects, and the frequency with which these objects were subjected to tranformations (eg. format conversion, editing).

A major concern which arose many times during the meeting was that of metadata loss through transformation. Many major software tools (eg. Photoshop) destroy or scramble embedded metadata. Conversion and editing of digital image objects tends, with currently available software, to damage embedded metadata. Given the high mobility of digital images, it is nevertheless appealling to explore models whereby metadata can travel with the objects from application to application. The meeting explored a number of frameworks whereby this could be achieved, including the use of a ‘container format‘ which might encapsulate both an image and accompanying metadata. The Java ‘JAR’ archive approach, which combines multiple files plus a metadata ‘manifest’ into a single portable object, was raised as a possible approach. A variant of this model was also discussed which used XML/RDF manifest files within a .ZIP or .JAR container to bundle both image data and metadata into a single transportable object. Although there was some interest in these approaches, no consensus was reached on the appropriate way forward, nor on whether this was a specifically image-related challenge or a general problem for the industry which might benefit from a generalised solution.

There was some consensus that digital signatures would need to be deployed over both image metadata, and the image data itself, for applications (eg. JIDI collections) which require some degree of quality assurance regarding the transformational processes that image collections have undergone. For digital signature technology to be applied to such content a canonicalisation algorithm is necessary. RDF and XML were raised as possible solutions in this area, although it was noted that the Signed XML initiative within W3C was not yet underway, and tha t the RDF working groups had deferred work on canonicalisation of RDF metadata until the issues had been more fully explored by the Signed XML group. Although digital signature technology can be applied to any content for which a text-based representation can be derrived, the issue of canonicalisation. Broadly, this means that applications which use digital signatures to make trust decisions, and which transform images, will ultimately benefit from a higher level of abstraction concerning ‘what it is that has been signed’. Current technology allows applications to think of simple textual files (eg. containing metadata) as being verifiable using digital signatures over that content. Future work will allow applications to instead make trust decisions relating to the logical ‘assertions’ represented in the files. The implication here is that trust-based metadata applications are at an early stage, but that we can anticipate within perhaps 2 years there being greater infrastructural support for applications which reason usefully about embedded metadata, eg. concerning the provenence, intellectual property rights and transformational history of some image object.

Why dig this up? Partly because I think a lot of practical work fed into the RDF design and it’s adoption eg. in Dublin Core, and this history is poorly documented. Also to kick myself for making prediction (” anticipate within perhaps 2 years”, phoey :) and because I’m looking again at signed RDF, largely in a FOAF/SPARQL context. Also I’ve just added a ‘foaf4lib’ category to the blog, as an offering to the code4lib aggregator and to accompany the new foaf4lib mailing list we have in the FOAF project.

Keeping my notes findable: Rmutt on MacOSX will build cleanly if you change the CC line in the Makefile to be:

CC = gcc -arch x86_64

That is all. For those who haven’t tried it, Rmutt is a grammar-based nonsense generator along the lines of the Dada Engine (see also Web version). The Dada Engine powers the near-legendary Postmodernism Generator, though I’m lately thinking about putting these kinds of tools into a hosted (openid/wiki etc) environment where people could collaboratively build up shared grammars.

Just stumbled across this, after meeting Karen Coyle here at DC2008. A nice account of why we might care about linking information (and linking data):

Much like people, the real meaningfulness of information is how it interacts with others of its kind. Information that is alone or out of context is inert and cannot reach its potential. Usability is the key to information value, but usability can rarely be applied to any individual information unit. When libraries buy or gather bits of information in the form of books or journals or web sites, they do so with the express goal of making all of the information in their collection usable in the context of the library. A modern scientific treatise should be bolstered by the classical thinking that made it possible. Works promoting one political or moral point of view should sit on the shelf beside those promoting the opposite view. There are millions of invisible hyperlinks between these works that can be discovered by alert readers following Myst-like clues buried deep in the texts.

Meanwhile, Cat’s Cradle and Bokononism are now on my reading list…

A Pew Research Center survey released a few days ago found that only half of Americans correctly know that Mr. Obama is a Christian. Meanwhile, 13 percent of registered voters say that he is a Muslim, compared with 12 percent in June and 10 percent in March.

More ominously, a rising share — now 16 percent — say they aren’t sure about his religion because they’ve heard “different things” about it.

When I’ve traveled around the country, particularly to my childhood home in rural Oregon, I’ve been struck by the number of people who ask something like: That Obama — is he really a Christian? Isn’t he a Muslim or something? Didn’t he take his oath of office on the Koran?

It was in the NYTimes, so it must be true. Will the last one to leave the Web please turn off the lights.

The are some interesting things going on at Mozilla Labs. Yesterday, Ubiquity was all over the mailing lists. You can think of it as “what the Humanized folks did next”, or as a commandline for the Web, or as a Webbier sibling to QuickSilver, the MacOSX utility. I prefer to think of it as the Mozilla add-on that distracted me all day. Ubiquity continues Mozilla’s exploration of the potential UI uses of its “awesome bar” (aka Location bar). Ubiquity is invoked on my Mac with alt-space, at which point it’ll enthusiastically try to autocomplete a verb-centric Webby task from whatever I type. It does this by consulting a pile of built-in and community-provided Javacript functions, which have access to the Web, your browser (hello, widget security fans)… and it also has access to UI, in terms of an overlaid preview window, as well as a context menu that can actually be genuinely contextual, ie. potentially sensitive to microformat and RDFa markup.

So it might help to think of ubiquity as a cross between The Hobbit, GreaseMonkeyBookmarklets, and Mozilla’s earlier forms of packaged addon. Ok, well it’s not very Hobbit, I just wanted an excuse for this screen grab. But it is about natural language interfaces to complex Webby datasources and services.

The basic idea here is that commands (triggered by some keyword) can be published in the Web as links to simple Javascript files that can be single-click added (without need for browser restart) by anyone trusting enough to add the code to their browser. Social/trust layers to help people avoid bad addons are in the works too.

I spent yesterday playing. There are some rough edges, but this is fun stuff for sure. The emphasis is on verbs, hence on doing, rather than solely on lookups, query and data access. Coupled with the dependency on third party Javascript, this is going to need some serious security attention. But but but… it’s so much fun to use and develop for. Something will shake out security-wise. Even if Ubiquity commands are only shared amongst trusting power users who have signed each other’s PGP keys, I think it’ll still have an important niche.

What did I make? A kind of stalk-a-tron, FOAF lookup tool. It currently only consults Google’s Social Graph API, an experimental service built from all the public FOAF and XFN on the Web plus some logic to figure out which account pages are held by the same person. My current demo simply retrieves associated URLs and photos, and displays them overlaid on the current page. If you can’t get it working via the Ubiquity auto-subscribe feature, try adding it by pasting the raw Javascript into the command-editor screen. See also the ‘sindice-term‘ lookup tool from Michael Hausenblas. It should be fun seeing how efforts like Bengee’s SPARQLScript work can be plugged in here, too.

The ultimate absurdity is now staring us in the face: a universal library of two volumes, one containing a single dot and the other a dash. Persistent repetition and alternation of the two is sufficient, we well know, for spelling out any and every truth. The miracle of the finite but universal library is a mere inflation of the miracle of binary notation: everything worth saying, and everything else as well, can be said with two characters. It is a letdown befitting the Wizard of Oz, but it has been a boon to computers.

– Willard van Orman Quine on the Universal Library

(via Borges’ Library of Babel indirectly found via Dan Connolly’s RDFization of the animals quote)

This somehow reminded me of a couple other links I found earlier on Turing Machines built in Conway’s game of Life: one from Paul Rendell, another from Paul Chapman. These machines also have a kind of strange beauty

The laconi.ca microblogging platform is as open as you could hope for. That elusive trinity: open source; open standards; and open content.

The project is led by Evan Prodromou (evan) of Wikitravel fame, whose company just launched identi.ca, “an open microblogging service” built with Laconica. These are fast gaining feature-parity with twitter; yesterday we got a “replies” tab; this morning I woke to find “search” working. Plenty of interesting people have  signed up and grabbed usernames. Twitter-compatible tools are emerging.

At first glance this might look the typical “clone” efforts that spring up whenever a much-loved site gets overloaded. Identi.ca’s success is certainly related to the scaling problems at Twitter, but it’s much more important than that. Looking at FriendFeed comments about identi.ca has sometimes been a little depressing: there is too often a jaded, selfish “why is this worth my attention?” tone. But they’re missing something. Dave Winer wrote a “how to think about identi.ca” post recently; worth a read, as is the ever-wise Edd Dumbill on “Why identica is important”. This project deserves your attention if you value Twitter, or if you care about a standards-based decentralised Social Web.

I have a testbed copy at foaf2foaf.org (I’ve been collecting notes for Laconica installations at Dreamhost). It is also federated. While there is support for XMPP (an IM interface) the main federation mechanism is based on HTTP and OAuth, using the openmicroblogging.org spec. Laconica supports OpenID so you can play  without needing another password. But the OpenID usage can also help with federation and account matching across the network.

Laconica (and the identi.ca install) support FOAF by providing a FOAF files  - data that is being indexed already by Google’s Social Graph API. For eg. see  my identi.ca FOAF; and a search of Google SGAPI for my identi.ca account.  It is in PHP (and MySQL) - hacking on FOAF consumer code using ARC is a natural step. If anyone is interested to help with that, talk to me and to Evan (and to Bengee of course).

Laconica encourages everyone to apply a clear license to their microblogged posts; the initial install suggests Creative Commons Attribution 3. Other options will be added. This is important, both to ensure the integrity of this a system where posts can be reliably federated, but also as part of a general drift towards the opening up of the Web.

Imagine you are, for example, a major media content owner, with tens of thousands of audio, video, or document files. You want to know what the public are saying about your stuff, in all these scattered distributed Social Web systems. That is just about do-able. But then you want to know what you can do with these aggregated comments. Can you include them on your site? Horrible problem! Who really wrote them? What rights have they granted? The OpenID/CC combination suggests a path by which comments can find their way back to the original publishers of the content being discussed.

I’ve been posting a fair bit lately about OAuth, which I suspect may be even more important than OpenID over the next couple of years. OAuth is an under-appreciated technology piece, so I’m glad to see it being used nicely for Laconica. Laconica installations allow you to subscribe to an account from another account elsewhere in the Web. For example, if I am logged into my testbed site at http://foaf2foaf.org/bandri and I visit http://identi.ca/libby, I’ll get an option to (remote-)subscribe. There are bugs and usability problems as of right now, but the approach makes sense: by providing the url of the remote account, identi.ca can bounce me over to foaf2foaf which will ask “really want to subscribe to Libby? [y/n]“, setting up API permissioning for cross-site data flow behind the scenes.

I doubt that the openmicroblogging spec will be the last word on this kind of syndication / federation. But it is progress, practical and moving fast. A close cousin of this design is the work from the SMOB (Semantic Microblogging) project, who use SIOC, FOAF and HTTP. I’m happy to see a conversation already underway about bridging those systems.

Do please consider supporting the project. And a special note for Semantic Web (over)enthusiasts: don’t just show up and demand new RDF-related features. Either build them yourself or dive into the project as a whole. Have a nose around the buglist. There is of course plenty of scope for semwebbery, but I suggest a first priority ought to be to help the project reach a point of general usability and adoption. I’ve nothing against Twitter just as I had nothing at all against Six Apart and Movable Type, back before they opensourced. On the contrary, Movable Type was a great product from great people. But the freedoms and flexibility that opensource buys us are hard to ignore. And so I use Wordpress now, having migrated like countless others. My suspicion is we’re at a “Wordpress/MovableType” moment here with Identica/Laconica and Twitter, and that of all the platforms jostling to be the “new twitter”, this one is most deserving of success. With opensource, Laconica can be the new Laconica…

You can follow me here identi.ca/danbri

(a clock showing no time)

From my Skype logs [2008-06-19 Dan Brickley: 18:24:23]

So I had a drunken dream about online microcurrencies last night. Also about cats and water-slides but that’s another story. Idea was of a karma donation system based on one-off assignments from person to person of specified chunks of their lifetime; ‘giving the time of day’. you’re allowed to give any time of day taken from those days you’ve been alive so far. they’re not directly redistributable, nor necessarily related to what happened during the specified time. there’s no central banker, beyond the notion of ‘the public record’. The system naturally favours the old/experienced, but if someone gets drunk and gives all their time/karma to a porn site, at least in the morning they’ll have another 24h ‘in the bank’. Or they could retract/deny the gift, although doing so a lot would also be visible in the public record and doing so excessively would make one look a bit sketchy, one’s time gifts seem less valuable etc. Anchoring to a real world ‘good’ (time) is supposed to provide some control against runaway inflation, as is non-redistributability, but also the time thing is nice for visualizations and explanation. I’m not really sure if it makes sense but thought i’d write it down before i forget the idea…

One idea would be for the time gifts to be redeemable, but that i think pushes the metaphor too far into being a real currency for a fictional world where hourly rates are flattened. Some Lets schemes probably work that way I guess…

So I’ve been meaning to write this up, but in the absense of having done so, here’s the idea as it first struck me. I had been thinking a bit about online reputation services, and the kinds of information they might aggregate. Garlik’s QDOS and FOAF experiments being a good example of this kind of evidence aggregation. As OpenID, FOAF, microformats etc. take hold, I really think we’ll see a massive parting of waves, red sea style, with the “public record” on one side, and “private stuff” on the other.

And in the public record, we’ll be attaching information about the things we make and do to well-known identifiers for people (and their semi-detached aliases). Various websites have rating and karma mechanisms, but it is far from clear how they’ll look when shared in the public Web. Nor whether something robust and not-too-gameable will come out of it. There are certainly various modelling idioms (eg. advogato do their internal calculations, and then put everyone in one of several broad-brush groups; here’s my advogato FOAF). See also my previous notes on representing expertise.

Now in some IRC channels, there are bots where you can dish out credit by typing things like:

edd++ # xtechy

…and have a bot add up the credits, as well as the comments. In small IRC communities these aren’t gamed except for fun. So I’ve been thinking: how can these kinds of habits ever work in the wider Web, where people are spread across Web sites (but nevetherless identifiable with OpenID and FOAF). How could it not turn hideous? What limited resource do we each have a supply of? No, not kidneys. And in a hungover stupor I came to think that “the time of day” could be such a resource. It’s really just a metaphor, and I’m not sure at all that the quantifiable nature is a benefit. But I also quite like that we each have a neverending supply of the stuff, and that even a fleeting moment can count.

Update: here’s a post from Simon Lucy which has a very similar direction (it was Simon I was drinking with the night before writing this). Excerpt:

And what do you do with your positive balance? Need you do anything? I imagine those that care will publish their balance or compare it with others in similar way to company cars or hi fi tvs. There will always be envy and jealousy.

But no one can steal your balance, misuse it.

So who wants to host the Ego Bank.

The main difference compared to my suggested scheme, is just the ‘the Web’ and the public record it carries, are the “ego bank”, creating a playground for aggregators of karma, credibility and reputation information. “The time of day” would just be one such category of information…

A brief update on the YouTube/Viacom privacy disaster.

From Ellen Nakashima in the Washington Post:

Yesterday, lawyers for Google said they would not appeal the ruling. They sent Viacom a letter requesting that the company allow YouTube to redact user names and IP addresses from the data.

“We are pleased the court put some limits on discovery, including refusing to allow Viacom to access users’ private videos and our search technology,” Google senior litigation counsel Catherine Lacavera said in a statement. “We are disappointed the court granted Viacom’s overreaching demand for viewing history. We will ask Viacom to respect users’ privacy and allow us to anonymize the logs before producing them under the court’s order.”

I’m pleased to read that Google are trying to keep identifying information out of this (vast) dataset.

Viacom claim to want this data to “measure the popularity of copyrighted video against non-copyrighted video” (in the words of the Washington Post article; I don’t have a direct quote handy).

If that is the case, I suggest their needs could be met with a practical compromise. Google should make a public domain data dump summarising the (already public) favouriting history of each video (with or without reference to users, whose identifiers could be scrambled/obscured). This addresses directly the Viacom demand while sticking to the principle of relying on the public record to answer Viacom’s query. Only if the public record is incapable of answering Viacom’s (seemingly reasonable) request should users private behaviour logs be even considered. Google should also make use of their own Social Graph API to determine how many YouTube usernames are already associated in the public Web with other potentially identifying profile information; those usernames at least should not be handed over without at least some obfuscation.

If we know which YouTube videos are copyrighted (and Viacom owned). And we know how long they’ve been online, and which ones have been publicly flagged as ‘favourites’ by YouTube users, we have a massively rich dataset. I’d like to see that avenue of enquiry thoroughly exhausted before this goes any further.

Nearby in the Web: Danny Weitzner has blogged further thoughts on all this, including a pointer to a recent paper on information accountability, suggesting a possible shift of emphasis from who can access information, to the acceptable uses to which it may be put.

From Yaron Koren on the semediawiki-users list:

I’m pleased to announce the release of the site Referata, at referata.com: a hosting site for SMW-based semantic wikis. This is not the first site to offer hosting of wikis using Semantic MediaWiki (that’s Wikia, as of a few months ago), but it is the first to also offer the usage of Semantic Forms, Semantic Drilldown, Semantic Calendar, Semantic Google Maps and some of the other related extensions you’ve probably heard about; Widgets, Header Tabs, etc. As such, I consider it the first site that lets people create true collaborative databases, where many people can work together on a set of well-structured data.

See announcement and their features page for more details. Basic usage is free; $20/month premium accounts can have private data, and $250/month enterprise accounts can use their own domains. Not a bad plan I think. A showcase Referata wiki would help people understand the offering better. In the meantime there is elsewhere a list of sites using Semantic MediaWiki. That list omits Chickipedia; we can only wonder why. Also I have my suspicions that Intellipedia runs with the SMW extensions too, but that’s just guessing. Regardless, there are a lot of fun things you could do with this, take a look…

From Wired via Thomas Roessler:

Google will have to turn over every record of every video watched by YouTube users, including users’ names and IP addresses, to Viacom, which is suing Google for allowing clips of its copyright videos to appear on YouTube, a judge ruled Wednesday.

I hope nobody thought their behaviour on youtube.com was a private matter between them and Google.

The Judge’s ruling (pdf) is interesting to read (ok, to skim). As the Wired article says,

The judge also turned Google’s own defense of its data retention policies — that IP addresses of computers aren’t personally revealing in and of themselves, against it to justify the log dump.

Here’s an excerpt. Note that there is also a claim that youtube account IDs aren’t personally identifying.

Defendants argue that the data should not be disclosed because of the users’ privacy concerns, saying that “Plaintiffs would likely be able to determine the viewing and video uploading habits of YouTube’s users based on the user’s login ID and the user’s IP address” .

But defendants cite no authority barring them from disclosing such information in civil discovery proceedings, and their privacy concerns are speculative.  Defendants do not refute that the “login ID is an anonymous pseudonym that users create for themselves when they sign up with YouTube” which without more “cannot identify specific individuals”, and Google has elsewhere stated:

“We . . . are strong supporters of the idea that data protection laws should apply to any data  that could identify you.  The reality is though that in most cases, an IP address without additional information cannot.” — Google Software Engineer Alma Whitten, Are IP addresses personal?, GOOGLE PUBLIC POLICY BLOG (Feb. 22, 2008)

So forget the IP address part for now.

Since early this year, Google have been operating an experimental service called the Social Graph API. From their own introduction to the technology:

With so many websites to join, users must decide where to invest significant time in adding their same connections over and over. For developers, this means it is difficult to build successful web applications that hinge upon a critical mass of users for content and interaction. With the Social Graph API, developers can now utilize public connections their users have already created in other web services. It makes information about public connections between people easily available and useful.

Only public data. The API returns web addresses of public pages and publicly declared connections between them. The API cannot access non-public information, such as private profile pages or websites accessible to a limited group of friends.

Google’s Social Graph API makes easier something that was already possible: using XFN and FOAF markup from the public Web to associate more personal information with YouTube accounts. This makes information that was already public increasingly accessible to automated processing. If I choose to link to my YouTube profile with the XFN markup rel=’me’ from another of my profiles,  those 8 characters are sufficient to bridge my allegedly anonymous YouTube ID with arbitrary other personal information. This is done in a machine-readable manner, one that Google has already demonstrated a planet-wide index for.

Here is the data returned by Google’s Social Graph API when asking for everything about my YouTube URL:

{
 “canonical_mapping”: {
  “http://youtube.com/user/modanbri”: “http://youtube.com/user/modanbri”
 },
 “nodes”: {
  “http://youtube.com/user/modanbri”: {
   “attributes”: {
    “url”: “http://youtube.com/user/modanbri”,
    “profile”: “http://youtube.com/user/modanbri”,
    “rss”: “http://youtube.com/rss/user/modanbri/videos.rss”
   },
   “claimed_nodes”: [
   ],
   “unverified_claiming_nodes”: [
    "http://friendfeed.com/danbri",
    "http://www.mybloglog.com/buzz/members/danbri"
   ],
   “nodes_referenced”: {
   },
   “nodes_referenced_by”: {
    “http://friendfeed.com/danbri”: {
     “types”: [
      "me"
     ]
    },
    “http://guttertec.swurl.com/friends”: {
     “types”: [
      "friend"
     ]
    },
    “http://www.mybloglog.com/buzz/members/danbri”: {
     “types”: [
      "me"
     ]
    }
   }
  }
 }
}

You can see here that the SGAPI, built on top of Google’s Web crawl of public pages, has picked out the connection to my FriendFeed (see FOAF file) and MyBlogLog (see FOAF file) accounts, both of whom export XFN and FOAF descriptions of my relationship to this YouTube account, linking it up with various other sites and profiles I’m publicly associated with.

YouTube users who have linked their YouTube account URLs from other social Web sites (something sites like FriendFeed and MyBlogLog actively encourage), are no longer anonymous on YouTube. This is their choice. It can give them a mechanism for sharing ‘favourited’ videos with a wide circle of friends, without those friends needing logins on YouTube or other Google services. This clearly has business value for YouTube and similar ’social video’ services, as well as for users and Social Web aggregators.

Given such a trend towards increased cross-site profile linkage, it is unfortunate to read that YouTube identifiers are being presented as essentially anonymous IDs: this is clearly not the case. If you know my YouTube ID ‘modanbri’ you can quite easily find out a lot more about me, and certainly enough to find out with strong probability my real world identity. As I say, this is my conscious choice as a YouTube user; had I wanted to be (more) anonymous, I would have behaved differently. To understand YouTube IDs as being anonymous accounts is to radically misunderstand the nature of the modern Web.

Although it wouldn’t protect against all analysis, I hope the user IDs are at least scrambled before being handed over to Viacom. This would make it harder for them to be used to look up other data via (amongst other things) Google’s own YouTube and Social Graph APIs.

Update: I should note also that the bridging of YouTube IDs with other profiles is one that is not solely under the control of the YouTube user. Friends, contacts, followers and fans on other sites can link to YouTube profiles freely; this can be enough to compromise an otherwise anonymous account. Increasingly, these links are machine-processable; a trend I’ve previously argued is (for better or worse) inevitable.

Furthermore, the hypertext and data environment around YouTube and the Social Web is rapidly evolving; the lookups and associations we’ll be able to make in 1-2 years will outstrip what is possible today. It only takes a single hyperlink to reveal the owner of a YouTube account name; many such links will be created in the months to come.

via Sumit Kataria on the oauth list:

I am very happy to announce that Drupal’s OAuth module is now ready to use. Right now it just acts as server because we don’t need client support at this time, Client implementation will be done soon by the time release of Drupal 7 as ServicesAPI in Drupal 7 will be needing it. Endpoints for Drupal are:

REQUEST URL : http://www.example.com/?q=oauth/request
AUTH URL : http://www.example.com/?q=oauth/auth
ACCESS : http://www.example.com/?q=oauth/access

At this time Services API is necessary to use OAuth (also it is the only module which is gonna use oauth as well - so not a big deal). This module also provides a test browser to produce OAuth tokens (whole Drupal way using multi-page form). Right now only PLAIN-TEXT signature method is supported but soon support for other methods will be added as well.

I’m slowly learning my way around Drupal, so this is rather good encouragement to learn faster…

See the announcement for more details and links.

A genetic theory of homosexuality.

The article reports on recent work (pdf) addressing the ‘if homosexuality is genetic, why hasn’t it died out?’ debate, which suggests that the ‘gene for male homosexuality persists because it promotes—and is passed down through—high rates of procreation among gay men’s mothers, sisters, and aunts‘.

In other words, gayness in men can be as natural and as the male nipple, even if both are initially puzzling when thought of in evolutionary terms. OK I’m stretching things slightly, but I can’t help but wonder whether the nipple analogy might be a good basis for informal arguments for a bit more tolerance:

Let He Who is Without Nipples Cast the First Stone?

Or maybe not.

More on the Great Nipple Question from straightdope.com; a moment of science; and the evolution-101 blog.

Echoing recent discussion of Semantic Web “Killer Apps”, an “are Topic Maps dead?” thread on the topicmaps mailing list. Signs of life offered include www.fuzzzy.com (’Collaborative, semantic and democratic social bookmarking’, Topic Maps meet social networking; featured tag: ‘topic maps‘) and a longer-list from Are Gulbrandsen who suggests a predictable http://en.wikipedia.org/wiki/Hype_cycle ">hype-cycle dropoff is occuring, as well as a migration of discussions from email into the blog world. For which, see the topicmaps planet aggregator, and through which I indirectly find Steve Pepper’s blog and an interesting post on how TMs relate to RDF, OWL and the Semantic Web (though I’d have hoped for some mention of SKOS too).

Are Gulbrandsen also cites NZETC (the New Zealand Electronic Tech Centre), winner of The Topic Maps Application of the year award at the Topic Maps 2008 conference; see Conal Tuohy’s presentation on Topic Maps for Cultural Heritage Collections (slides in PDF). On NZETC’s work: “It may not look that interesting to many people used to flashy web 2.0 sites, but to anybody who have been looking at library systems it’s a paradigm shift“.

Other Topic Map work highlighted: RAMline (Royal Academy of Music rewriting musical history). “A long-term research project into the mapping of three axes of musical time: the historical, the functional, and musical time itself.”; David Weinberger blogged about this work recently. Also MIPS / Institute for Bioinformatics and Systems Biology who “attempt to explain the complexity of life with Topic Maps” (see presentation from Volker Stümpflen (PDF); also a TMRA’07 talk).

Finally, pointers to opensource developer tools: Ruby Topic Maps and Wandora (Java/GPL), an extraction/mapping and publishing system which amongst other things can import RDF.

Topic Maps are clearly not dead, and the Web’s a richer environment because of this. They may not have set the world on fire but people are finding value in the specs and tools, while also exploring interop with RDF and other related technologies. My hunch is that we’ll continue to see a slow drift towards the use of RDF/OWL plus SKOS for apps that might otherwise have been addressed using TopicMaps, and a continued pragmatism from tool and app developers who see all these things as ways to solve problems, rather than as ends in themselves.

Just as with RDFa, GRDDL and Microformats, it is good and healthy for the Web community to be exploring multiple similar strands of activity. We’re smart enough to be able to flow data across these divides when needed, and having only a single technology stack is I think both intellectually limiting, socially impractical, and technologically short-sighted.

Allegedly. Powerset are natural language processing specialists. See also last year’s ISWC talk from CTO Barney Pell, “Natural Language and the Semantic Web”, discussions with Barney from last month’s Talis Semantic Web Gang chat, and earlier commentary from Paul Miller.