Roskilde Festival 2010 Schedule as XML

@mortenjust and @claus have created the excellent Roskilde Festival Pocket Schedule Generator. They gave me access to their schedule data, and I’ve used that to scrape more tidbits from the Roskilde Festival website. Fields are:

  • Name (all caps)
  • Stage where band plays
  • Time of performance (in UNIX and regular datetime)
  • Roskilde Festival website URL
  • Countrycode
  • Myspace URL
  • Band website URL
  • Picture URL
  • Video-embed-html
  • Tag-line

Get it here: roskilde2010.xml.zip
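
If you want to poke at the data from a script, something along these lines should get you started. This is only a rough sketch: the element names (`band`, `name`, `stage`, `unixtime`, `countrycode`) and the sample record are placeholders I made up for illustration, so check them against the actual file and adjust before relying on it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical sample record. The element names are assumptions,
# not the real schema of roskilde2010.xml.
SAMPLE = """<schedule>
  <band>
    <name>EXAMPLE BAND</name>
    <stage>Example Stage</stage>
    <unixtime>1278010800</unixtime>
    <countrycode>DK</countrycode>
  </band>
</schedule>"""


def parse_schedule(xml_text):
    """Return a list of dicts, one per band element found in xml_text."""
    root = ET.fromstring(xml_text)
    acts = []
    for band in root.iter("band"):
        # The UNIX timestamp is seconds since 1970-01-01 UTC.
        timestamp = int(band.findtext("unixtime"))
        acts.append({
            "name": band.findtext("name"),
            "stage": band.findtext("stage"),
            "start": datetime.fromtimestamp(timestamp, tz=timezone.utc),
            "country": band.findtext("countrycode"),
        })
    return acts


for act in parse_schedule(SAMPLE):
    print(act["name"], act["stage"], act["start"].isoformat())
```

The times come out in UTC here; convert to Danish local time if you want them to match the printed programme.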

Google sampled my voice and all I got was this lousy T-shirt!

I’ve just submitted a voice-sample to help Google in their efforts to build Danish-language voice search. See what voice search is about in this video. In case anyone is interested, here’s how Google goes about collecting these samples.
The sampling was carried out by a Danish speaker hired by Google for the specific task. It was done in a crowded Copenhagen coffee shop (Baresso at Strøget, near Google’s Copenhagen sales office) with people talking, coffee machines hissing and music in the background. This is presumably to ensure that samples are collected in an environment similar to the one where voice search will be used.

The samples were recorded on a stock Google Nexus One using an Android application called “dataHound”. The sampling basically involved me reading 500 random terms, presumably search terms harvested from Google searches. Most were one-word phrases but there were some multi-word ones too (this likely reflects the fact that most users only search using single words). The Googler said that it was due to the sensitive nature of these terms (and presumably the risk of harvesting) that the sampling had to be carried out in person. Google apparently requires 500 of these 500-term samples to form a language corpus (I was number 50).

The dataHound app displayed the term to be spoken at the top with a bunch of buttons at the bottom. One button advanced the app to the next term, one could be pressed if the term was completely unintelligible and one could be used if the term was offensive to you and you did not want to say it out loud (I had no such qualms). The interface was pretty rough but the app was fast.

The terms were all over the place. I work for Ekstra Bladet (a Danish tabloid) and noted that our name cropped up twice. “Nationen” (our debate sub-site) showed up once. Other Danish media sites were well represented and there were many locality-relevant searches. There were also a lot of domain names; presumably Google expects people to use Google Voice Search rather than typing in a URL themselves (indeed, people already do this on google.com).

Among the terms were also “Fisse” (the Danish word for “cunt”), “tisse kone” (a more polite synonym for female genitals), “ak-47” and “aluha ackbar”. If Google prompts you to say “cunt” in a public place, how can you refuse?

The Googler told me that she’s looking for more volunteers, so drop her a line if you speak Danish and live in Copenhagen: duritah@google.com. Plus, you get a Google T-shirt for your efforts!

Book: State of the eUnion

Last fall, I wrote a chapter for a book titled “State of the eUnion”. My chapter is called “Democracy 2.0” and is about how sites like Folkets Ting, OpenCongress and TheyWorkForYou get built and what features should go into them. The other chapters are about the challenges and possibilities of governments and the Internet in general. They are written by people like Tim O’Reilly, Lawrence Lessig and David Weinberger — very humbling company. You can download a pdf or buy a copy on Amazon.

Post with videos of me saying words

The whole Folkets Ting business has turned out rather well (even though the site is not currently updated; we’re working on it!) and I’ve been invited to speak on a few occasions. Some of the talks were recorded, and in the interest of self-aggrandizement they are included below in chronological order (except for the last one).

Short interview at Halvandet, the day before Reboot11 started:

Talk at Reboot11:

Talk (in Danish) at HeadStart morning inspiration-session in Århus:

Short blurb (in Danish) on what I think about the usefulness of public data at the ODIS conference:

Speech on “Political Data API” after a project of mine won a competition promoting reuse of public data (winners were announced at the conference mentioned above):

You can watch the same video with slides here

And finally, a non-Folkets Ting video where I talk about TEDBot, recorded at the “Berlin in October” un-conference:

Famous Danish Programmers

Denmark somehow seems to have hatched more programmers and language designers of note than one would expect of a country of 6 million. Since almost none of them live in Denmark, it is kind of easy to forget. Here’s a partial list (alphabetical, inclusion determined by my completely whimsical notions of famousness, reasons for inclusion may be somewhat exaggerated):

Folkets Ting beta launched

I’ve created a new web site on Danish politics in the tradition of The Public Whip and OpenCongress (although it’s not yet nearly as good as those guys). It’s called Folkets Ting and comes with a complimentary blog (both in Danish). Go check it out.

Randoom on the move

Right – after a few years on ITU servers, I’ve moved my blog to a separate domain hosted by Netplads. This was mostly for SEO reasons, so that I could build Google Juice on my own and not have my page rank muddled with whatever ITU does. The new host also allows .htaccess modifications so that I can get nice URLs. Netplads is a cheap and cheerful Danish host – the only fault I’ve found so far is a lack of mod_gzip support.

The blog theme has been modified quite a bit, but is still based on the venerable depo-clean theme by Derek Powazek. It has been cleaned up some more and now supports tags (as opposed to just categories). The theme relies on Smart Archives Reloaded to build the archives and features a ShareThis button. If you want, you can download my version of the theme.

On my old blog, the Redirection plugin does 301 redirects to the one you’re currently reading (doing rewrites in .htaccess would have been easier but was unsupported). In fact, it’s so good at it that I can no longer access my old blog in any way. Good riddance.
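
For the curious, a permanent redirect boils down to answering every request on the old blog with status 301 and a `Location` header pointing at the same path on the new domain. Here is a minimal sketch of that mapping. It is not how the Redirection plugin is implemented, and `NEW_BASE` is a made-up placeholder domain, not this blog's address.

```python
# Hypothetical new home of the blog; a placeholder for illustration only.
NEW_BASE = "https://new-blog.example/"


def redirect_for(old_path):
    """Return (status, headers) for a permanent redirect that keeps the path."""
    # Normalise slashes so the base and the path join with exactly one "/".
    location = NEW_BASE.rstrip("/") + "/" + old_path.lstrip("/")
    # 301 means "Moved Permanently": browsers and search engines are told
    # to use the new URL from now on.
    return 301, {"Location": location}


status, headers = redirect_for("/2007/08/first-post/")
print(status, headers["Location"])
```

Keeping the path intact is the important bit: old links and search results then land on the matching post rather than on the front page.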

The other plugins enabled are:

… and with that, I’m off to Hong Kong.

First Post

Welcome to my blog!

After several false starts, I think I will now have enough material to post regularly. The posts will probably concern mainly LINQ (the subject of my master’s thesis), Dynamics CRM (which I work with daily) and C#/.NET/web tech in general, for the near future at least.

While Hemingway’s prose will consistently make the hairs on the back of my neck stand on end, that is, in fact, not the reason I chose the hemingway reloaded wp-theme. I just happen to think it’s aesthetically pleasing. I’ve made a few minor mods, including removing the credits in the lower left corner. Instead I’ll credit the creators here: Thank you startup365 and Kyle Neath for a beautiful theme. If I find the time, I may mod it some more. I’m thinking …CGA!

The blog is hosted at ITU: it’s free, has an agreeable LAMP stack and plenty of bandwidth (not that I’ll need it).
If you want to know more about me, check the about page.

UPDATE, 04-08-2007: Google Code Prettify is now syntax highlighting code in posts.