Google sampled my voice and all I got was this lousy T-shirt!

I’ve just submitted a voice-sample to help Google in their efforts to build Danish-language voice search. See what voice search is about in this video. In case anyone is interested, here’s how Google goes about collecting these samples.
The sampling was carried out by a Danish-speaker hired by Google for the specific task. The sampling was done in a crowded Copenhagen coffee-shop (Baresso at Strøget, near Googles Copenhagen sales office) with people talking, coffee-machines hissing and music in the background. This is likely to ensure that samples are collected in an environment similar to the one where voice search will be used.

The samples were recorded on a stock Google Nexus One using an Android application called “dataHound”. The sampling basically involved me reading 500 random terms, presumably search terms harvested from Google searches. Most were one-word phrases but there some multi-word ones too (this likely reflects the fact that most users only search using single words). The Googler said that it was due to the sensitive nature of these terms (and the risk of harvesting presumably) that the sampling had to be carried out in-person. Google apparently requires 500 of these 500-word samples to form a language-corpus (I was number 50).

The dataHound app displayed the term to be spoken at the top with a bunch of buttons at the bottom. One button advanced the app to the next term, one could be pressed if the term was completely unintelligible and one could be used if the term was offensive to you and you did not want to say it out loud (I had no such qualms). The interface was pretty rough but the app was fast.

The terms were all over the place. I work for Ekstra Bladet (a Danish tabloid) and noted our name cropped up twice. “Nationen” (our debate sub-site) showed up once. Other Danish media sites were well represented and there were many locality-relevant searches. There were also a lot of domain-names, presumably Google expect people to use Google Voice Search over typing in a url themselves (indeed, people already do this on google.com).

Among the terms were also “Fisse” (the Danish word for “cunt”), “tisse kone” (a more polite synonym for female genitals), “ak-47” and “aluha ackbar”. If Google prompts you to say “cunt” in a public place, how can you refuse?

The googler told me that she’s looking for more volunteers, so drop her a line of you speak Danish and live in Copenhagen: duritah@google.com. Plus, you get a Google T-shirt for your efforts!

Screen scraping flight data from Amadeus checkmytrip.com

checkmytrip.com let’s you input an airplane flight booking reference and your surname in return for a flight itinerary. This is useful for building all sorts of services to travellers. Unfortunately Amadeus doesn’t have an API, nor are their url’s restful. Using Python, mechanize, htm5lib and BeautifulSoup, you can get at the data pretty easy though.

It is somewhat unclear whether Amadeus approve of people scraping their site, related debate here (check the comments).

I’m not a very good Python programmer (yet!) and the script below could probably be improved quite a lot:

import re
import mechanize
import html5lib
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
re1 = br.open("http://www.checkmytrip.com")
br.select_form(nr=2)
br["REC_LOC"] = "BOOKREF"
br["DIRECT_RETRIEVE_LASTNAME"] = "LASTNAME"
re2 = br.submit()
html = re2.read()
doc = html5lib.parse(html)
soup =  BeautifulSoup(doc.toxml())
flightdivs = soup.findAll('div', { "class" : "divtableFlightConf" } )
for div in flightdivs:
    table = div.table
    daterow = table.findChildren("tr")[2]
    datecell = daterow.findChildren("td")[1].string.lstrip().rstrip()
    maincell = table.findChildren("tr")[3]
    timetable = maincell.table.findChildren("tr")[0].td.table
    times =  timetable.findAll("td", {"class" : "nowrap"})
    dtime = times[0].string.lstrip().rstrip()
    atime = times[1].string.lstrip().rstrip()
    airports = timetable.findAll("input", {"name" : "AIRPORT_CODE"})
    aairport = airports[0]['value'].lstrip().rstrip()
    dairport = airports[1]['value'].lstrip().rstrip()
    flight = table.findAll("td", {"id" : "segAirline_0_0"})[0].string.lstrip().rstrip()
    print '--'    
    print 'date: ' + datecell
    print 'departuretime: ' + dtime
    print 'arrivaltime: ' + atime
    print 'departureairport: ' + dairport    
    print 'arrivalairport: ' + aairport
    print 'flight: ' + flight

ASP.Net MVC Layar layer, ghetto-style

Layar is a really great meta-app for iPhone and Android that lets you see a lot of third-party geo-based augmented reality layers on your phone. A “layar” consists of a JSON webservice that provides Points of Interest to users. There is HttpHandler implementation available for .Net, but the Layar specification is so simple (in the good sense of the word) that I decided to just whip up my own in a MVC controller. Computing distances is pretty akward using LinqtoSQL and SQL Server, I use the DistanceBetween function described here. It is used by the FindNearestEvents stored procedure in the code below.

public class LayarController : Controller
{
    public ActionResult GetPOIs(string lat, string lon, 
        string requestedPoiId, string pageKey)
    {
        var db = new DatabaseDataContext();

        int? page = null;
        if (!string.IsNullOrEmpty(pageKey))
        {
            page = int.Parse(pageKey);
        }

        var eventssp = db.FindNearestEvents(
            float.Parse(lat, NumberStyles.Float, CultureInfo.InvariantCulture),
            float.Parse(lon, NumberStyles.Float, CultureInfo.InvariantCulture),
            20, page ?? 0);

        var events = eventssp.Select(e => new POI()
        {
            lat = e.Lat.Value.ToLayarCoord(),
            lon = e.Lng.Value.ToLayarCoord(),
            distance = e.Distance.Value,
            id = e.PermId,
            title = e.Title,
            line2 = e.BodyText,
            attribution = "Ekstra Bladet Krimikort"
        }).ToList();

        return this.Json(
            new Response
            {
                radius = (int)(events.Max(e => e.distance) * 1000),
                nextPageKey = page != null ? page + 1 : 1,
                morePages = events.Count() == 20,
                hotspots = events,
            }, JsonRequestBehavior.AllowGet
            );
    }
}

public class Response
{
    public string layer { get { return "krimikort"; } }
    public int errorCode { get { return 0; } }
    public string errorString { get { return "ok"; } }
    public IEnumerable hotspots { get; set; }
    public int radius { get; set; }
    public int? nextPageKey { get; set; }
    public bool morePages { get; set; }
}

public class POI
{
    public object[] actions { get { return new object[] { }; } }
    public string attribution { get; set; }
    public float distance { get; set; }
    public int id { get; set; }
    public string imageUrl { get; set; }
    public int lat { get; set; }
    public int lon { get; set; }
    public string line2 { get; set; }
    public string line3 { get; set; }
    public string line4 { get; set; }
    public string title { get; set; }
    public int type { get; set; }
}

public static class Extensions
{
    public static int ToLayarCoord(this double coord)
    {
        return (int)(coord * 1000000);
    }
}

Dynamic Sitemap with ASP.Net MVC (incl. geo)

Here is how I generate sitemaps using the XDocument API and a ContentResult. The entries are events that come out of the EventRepository, please substitute as needed. Note that it would be vastly more elegant to use ActionLinks in some way. Note also that the first entry is a link to a Google Earth KMZ file (more here).

[OutputCache(Duration = 12 * 3600, VaryByParam = "*")]
public ContentResult Sitemap()
{
    string smdatetimeformat = "yyyy-MM-dd";

    var erep = new EventRepository();
    var events = (from e in erep.GetGeocodedEvents()
                    where e.IncidentTime.HasValue
                select new {e.Title, e.PermId, e.IncidentTime}).ToList();

    XNamespace sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace geo = "http://www.google.com/geo/schemas/sitemap/1.0";
            
    XDocument doc = new XDocument(
        new XElement(sm + "urlset",
            new XAttribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9"),
            new XAttribute(XNamespace.Xmlns + "geo", 
                "http://www.google.com/geo/schemas/sitemap/1.0"),
            new XElement(sm + "url",
                new XElement(sm + "loc", "http://krimikort.ekstrabladet.dk/gearth.kmz"),
                new XElement(sm + "lastmod", DateTime.Now.ToString(smdatetimeformat)),
                new XElement(sm + "changefreq", "daily"),
                new XElement(sm + "priority", "1.0"),
                new XElement(geo + "geo",
                    new XElement(geo + "format", "kmz")
                )
            )
            ,
            events.Select(e => 
                new XElement(sm + "url",
                    new XElement(sm + "loc", EventExtensions.AbsUrl(e.Title, e.PermId)),
                    new XElement(sm + "lastmod", e.IncidentTime.Value.ToString(smdatetimeformat)),
                    new XElement(sm + "changefreq", "monthly"),
                    new XElement(sm + "priority", "0.5")
                )
            )
        )
    );

    return Content(doc.ToString(), "text/xml");
}

LinqtoCRM obsoleted

Shan McArthur put up a notice that the latest version (4.0.12) of the Microsoft CRM SDK includes Linq querying support. The CRM Team have a couple of blog posts describing the new features. I haven’t tested the new SDK, but I definitely recommend you try it out before using LinqtoCRM and I’ve put a notice to that effect on the LinqtoCRM front page.

It’s a little bit sad that LinqtoCRM probably won’t be used much anymore, but I also think it’s great that Microsoft is now providing what looks to be a solid Linq implementation for Dynamics CRM (especially considering the fact that we haven’t released new versions for more than a year).

Anyway, thanks to everyone who have contributed (esp. Mel Gerats and Petteri Räty) and to all the people who have used LinqtoCRM over the years! Now go get the new SDK and write some queries.

Linq-to-SQL, group-by, subqueries and performance

If you’re using Linq-to-SQL, doing group-by and selecting other columns than those in the grouping-key, performance might suffer. This is because there is no good translation of such queries to SQL and Linq-to-SQL has to resort to doing multiple subqueries. Matt Warren explains here. I experienced this firsthand when grouping a lot of geocoded events by latitude and longitude and selecting a few more columns (EventId and CategoryId in the example below):

from e in db.Events
group e by new { e.Lat, e.Lng } into g
select new
{
    g.Key.Lat,
    g.Key.Lng,
    es = g.Select(_ => new { _.EventId, _.CategoryId })
};

One possible solution is to fetch all events, to a ToList() and do the grouping in-memory.

var foo =
    from e in db.Events
    select new { e.Lat, e.Lng, e.EventId, e.CategoryId };

var bar = from e in foo.ToList()
            group e by new { e.Lat, e.Lng } into g
            select new
            {
                g.Key.Lat,
                g.Key.Lng,
                es = g.Select(_ => new { _.EventId, _.CategoryId })
            };

C# and Google Geocoding Web Service v3

Need to geocode addresses using the v3 Google Geocoding Web Service? There are some good reasons to choose the new v3 edition — most importantly, you don’t need an API key. You could use geocoding.net which — at the time of writing —  has some support for v3. I decided to hack up my own wrapper though, and using Windows Communication Foundation, it turned out to be really simple! Note that if you need more of the attributes returned by the Web Service, you should add them to the DataContract classes.

using System;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Net;
using System.Web;

.
.
.

private static GeoResponse CallGeoWS(string address)
{
	string url = string.Format(
		"http://maps.google.com/maps/api/geocode/json?address={0}&region=dk&sensor=false",
		HttpUtility.UrlEncode(address)
		);
	var request = (HttpWebRequest)HttpWebRequest.Create(url);
	request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
	request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
	DataContractJsonSerializer serializer = new DataContractJsonSerializer(typeof(GeoResponse));
	var res = (GeoResponse)serializer.ReadObject(request.GetResponse().GetResponseStream());
	return res;
}

[DataContract]
class GeoResponse
{
	[DataMember(Name="status")]
	public string Status { get; set; }
	[DataMember(Name="results")]
	public CResult[] Results { get; set; }

	[DataContract]
	public class CResult
	{
		[DataMember(Name="geometry")]
		public CGeometry Geometry { get; set; }

		[DataContract]
		public class CGeometry
		{
			[DataMember(Name="location")]
			public CLocation Location { get; set; }

			[DataContract]
			public class CLocation
			{
				[DataMember(Name="lat")]
				public double Lat { get; set; }
				[DataMember(Name = "lng")]
				public double Lng { get; set; }
			}
		}
	}
}

If you need to geocode a lot of addresses, you need to manage your request rate. Google will help you throttle requests by returning OVER_QUERY_LIMIT statuses if you are going too fast. I use the method below to manage this. It’s decidedly unelegant, please post a reply if you come up with something better.

private static int sleepinterval = 200;

private static GeoResponse CallWSCount(string address, int badtries)
{
	Thread.Sleep(sleepinterval);
	GeoResponse res;
	try
	{
		res = CallGeoWS(address);
	}
	catch (Exception e)
	{
		Console.WriteLine("Caught exception: " + e);
		res = null;
	}
	if (res == null || res.Status == "OVER_QUERY_LIMIT")
	{
		// we're hitting Google too fast, increase interval
		sleepinterval = Math.Min(sleepinterval + ++badtries * 1000, 60000);

		Console.WriteLine("Interval:" + sleepinterval + "                           \r");
		return CallWSCount(address, badtries);
	}
	else
	{
		// no throttling, go a little bit faster
		if (sleepinterval > 10000)
			sleepinterval = 200;
		else
			sleepinterval = Math.Max(sleepinterval / 2, 50);

		Console.WriteLine("Interval:" + sleepinterval);
		return res;
	}
}

Drawing geojson-based polygons on Google Maps

This post will demonstrate how to draw polygons on Google Maps v3 using geojson-encoded data from GeoDjango. The most common method for displaying polygons on Google Maps seems to be by using KML. Google Maps requires the KML-file to be available on a public website though and that is kind of a bore for debugging. This approach uses only json and the standard maps drawing API. To get the display the polygons, you have to loop over them and do a setMap() with your map.

I’m assuming the browser is getting geojson from from a GeoDjango model object like this: model.area.geojson, but it should work for data from other sources. Note that I’m using the very excellent Underscore.js Javascript library to do really terse functional programming. Also note that the function takes an optional bounds object which gets expanded as polygon points are added.

The code seems to perform very well in modern browsers, even for fairly large and complex polygons. Unfortunately there is no live demo, but the site I’m working on should go up soon.

function createPolygons(areajson, bounds){
  var coords = areajson.coordinates;
  var polygons = _(coords).reduce([], function(memo_n, n) {
    var polygonpaths = _(n).reduce(new google.maps.MVCArray(), function(memo_o, o) {
      var polygoncords = _(o).reduce(new google.maps.MVCArray(), function(memo_p, p) {
        var mylatlng = new google.maps.LatLng(p[1], p[0]);
        if(bounds){
          bounds.extend(mylatlng);
        }
        memo_p.push(mylatlng);
        return memo_p;
      });
      memo_o.push(polygoncords);
      return memo_o;
    });
    var polygon = new google.maps.Polygon({
      paths: polygonpaths,
      strokeColor: "#808080",
      strokeOpacity: 0.8,
      strokeWeight: 2,
      fillColor: "#C0C0C0",
      fillOpacity: 0.35
    });
    memo_n.push(polygon);
    return memo_n;
  });
  return polygons;
}

Creating Narwhal/CommonJS packages

I really like the way Javascript is moving from being this annoying thing you have to deal with when doing web-development to becoming a proper server-side programming languages with standard libraries and fast VMs. Yesterday I cloned the Narwhal git repo and tried to create my own package with a few simple collection types. Narwhal is done by the guys at 280north and conforms to the CommonJS standard, an attempt at a cross-platform standard by the various Javascript platform implementors.

I haven’t quite figured out what goes into creating a proper package, but if you check out and configure Narwhal, you can dump the collection implementation in /lib and the tests in /tests and see it all work. First the collections:

// -- friism Michael Friis

exports.Stack = function() {
  return new stack();
};

exports.Queue = function() {
  return new queue();
};

/**
 * Stack implementation, using native Javascript array
 */

function stack() {
  this.data = [];
};

stack.prototype.pop = function() {
  return this.data.pop();
};

stack.prototype.push = function(o) {
  this.data.push(o);
};

/**
 * Queue implementation. Mechanics lifted from Wikipedia
 * Could be optimised to do fewer slices, see here: 
 * http://safalra.com/web-design/javascript/queues/Queue.js
 */

function queue() {
  this.data = [];
  this.length = 0;
}

queue.prototype.isEmpty = function() {
  return (this.data.length == 0);
};

queue.prototype.enqueue = function(obj) {
  this.data.push(obj);
  this.length = this.data.length;
}

queue.prototype.dequeue = function() {
  var ret = this.data[0];
  this.data.splice(0,1);
  this.length = this.data.length;
  return ret;
}

queue.prototype.peek = function() {
  return this.data[0];
}

queue.prototype.clear = function() {
  this.length = 0;
  this.data = [];
}

… and the tests:

var assert = require("test/assert");

var js5 = require("js5");

exports.testStackSimple = function() {
  var mystack = new js5.Stack();
  var myobject = "1";
  mystack.push(myobject);
  var popped = mystack.pop();
  assert.isEqual(myobject, popped);
};

exports.testStack = function() {
  var mystack = new js5.Stack();
  for(var i = 99; i >= 0; i--) {
    mystack.push(i.toString());
  }

  for(var j = 0; j 

I asked for help on the Narwhal IRC channel and Kris Kowal pointed me to Chiron, a module library he's working on that already contains set and dictionary implementations. I recommend checking out the code, it highligts some of the interesting challenges of implementing collections in Javascript.

Also, this "Javascript, the Good Parts" talk by Doug Crockford (author of book of the same name) is really good:

Book: State of the eUnion

Last fall, I wrote a chapter for a book titled “State of the eUnion”. My chapter is called “Democracy 2.0” and is about how sites like Folkets Ting, OpenCongress and TheyWorkForYou get built and what features should go into them. The other chapters are about the challenges and possibilities of governments and the Internet in general. They are written by people like Tim O’Reilly, Lawrence Lessig and David Weinberger — very humbling company. You can download a pdf or buy a copy on Amazon.