Screen scraping flight data from Amadeus checkmytrip.com

checkmytrip.com lets you input a flight booking reference and your surname in return for a flight itinerary. This is useful for building all sorts of services for travellers. Unfortunately Amadeus doesn’t have an API, nor are their URLs RESTful. Using Python, mechanize, html5lib and BeautifulSoup, you can get at the data pretty easily, though.

It is somewhat unclear whether Amadeus approves of people scraping their site; there is a related debate here (check the comments).

I’m not a very good Python programmer (yet!) and the script below could probably be improved quite a lot:

import mechanize
from bs4 import BeautifulSoup  # BeautifulSoup 4, with html5lib as the parser below


def text_of(tag):
    # The single text node inside a tag, without surrounding whitespace
    return tag.string.strip()


br = mechanize.Browser()
br.open("http://www.checkmytrip.com")
br.select_form(nr=2)  # zero-based: the third form on the page is the retrieval form
br["REC_LOC"] = "BOOKREF"                    # booking reference
br["DIRECT_RETRIEVE_LASTNAME"] = "LASTNAME"  # surname
response = br.submit()

# html5lib parses the result page the way a browser would
soup = BeautifulSoup(response.read(), "html5lib")

for div in soup.find_all("div", class_="divtableFlightConf"):
    table = div.table
    rows = table.find_all("tr")
    date = text_of(rows[2].find_all("td")[1])
    timetable = rows[3].table.find_all("tr")[0].td.table
    times = timetable.find_all("td", class_="nowrap")
    dtime = text_of(times[0])
    atime = text_of(times[1])
    # "name" clashes with find_all's first parameter, so pass it via attrs
    airports = timetable.find_all("input", attrs={"name": "AIRPORT_CODE"})
    aairport = airports[0]["value"].strip()
    dairport = airports[1]["value"].strip()
    flight = text_of(table.find("td", id="segAirline_0_0"))
    print("--")
    print("date: " + date)
    print("departuretime: " + dtime)
    print("arrivaltime: " + atime)
    print("departureairport: " + dairport)
    print("arrivalairport: " + aairport)
    print("flight: " + flight)
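If you only need a single value from the result page, say the airport codes, the standard library’s html.parser is enough for a crude version of the same extraction. A sketch follows; the AIRPORT_CODE input name is taken from the script above, while the sample markup and the airport_codes/AirportCodeParser names are made up for illustration and do not reflect the real checkmytrip.com page.

```python
from html.parser import HTMLParser


class AirportCodeParser(HTMLParser):
    """Collects the value of every <input name="AIRPORT_CODE"> in document order."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "AIRPORT_CODE":
            self.codes.append((attributes.get("value") or "").strip())


def airport_codes(html):
    parser = AirportCodeParser()
    parser.feed(html)
    parser.close()
    return parser.codes


# Hypothetical markup, invented for this example
sample = """
<table>
  <tr><td><input type="hidden" name="AIRPORT_CODE" value=" CPH "></td></tr>
  <tr><td><input type="hidden" name="AIRPORT_CODE" value="LHR"></td></tr>
</table>
"""

if __name__ == "__main__":
    print(airport_codes(sample))
```

This is far more brittle than the html5lib/BeautifulSoup route (it ignores nesting entirely), but it has no third-party dependencies.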

ASP.Net MVC Layar layer, ghetto-style

Layar is a really great meta-app for iPhone and Android that lets you see a lot of third-party geo-based augmented reality layers on your phone. A “layar” consists of a JSON web service that provides Points of Interest to users. There is an HttpHandler implementation available for .Net, but the Layar specification is so simple (in the good sense of the word) that I decided to just whip up my own in an MVC controller. Computing distances is pretty awkward using Linq-to-SQL and SQL Server; I use the DistanceBetween function described here. It is used by the FindNearestEvents stored procedure in the code below.
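The DistanceBetween function itself isn’t reproduced in this post. As an illustration only, a common formula for the great-circle distance between two latitude/longitude pairs is the haversine formula, sketched here in Python together with a hypothetical nearest() helper that does in memory roughly what a “find nearest events” query does: sort by distance and return one page of 20. The Earth radius, the dict-shaped points and the zero-based paging are my assumptions and may not match DistanceBetween or FindNearestEvents.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest(points, lat, lon, limit=20, page=0):
    """Hypothetical helper: one page of (distance_km, point) pairs, closest first.

    `points` are dicts with "lat" and "lng" keys (an assumption for this sketch).
    """
    scored = sorted(
        ((distance_km(lat, lon, p["lat"], p["lng"]), p) for p in points),
        key=lambda pair: pair[0],
    )
    start = page * limit
    return scored[start:start + limit]
```

Doing this in the database, as the stored procedure does, avoids pulling every event into memory just to discard most of them.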

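using System.Collections;
using System.Globalization;
using System.Linq;
using System.Web.Mvc;

public class LayarController : Controller
{
    private const int PageSize = 20;

    public ActionResult GetPOIs(string lat, string lon,
        string requestedPoiId, string pageKey)
    {
        using (var db = new DatabaseDataContext())
        {
            int? page = null;
            if (!string.IsNullOrEmpty(pageKey))
            {
                page = int.Parse(pageKey, CultureInfo.InvariantCulture);
            }

            // Stored procedure returning the nearest events, PageSize per page
            var eventssp = db.FindNearestEvents(
                float.Parse(lat, NumberStyles.Float, CultureInfo.InvariantCulture),
                float.Parse(lon, NumberStyles.Float, CultureInfo.InvariantCulture),
                PageSize, page ?? 0);

            // ToList() runs the query before the DataContext is disposed
            var events = eventssp.Select(e => new POI
            {
                lat = e.Lat.Value.ToLayarCoord(),
                lon = e.Lng.Value.ToLayarCoord(),
                distance = e.Distance.Value,
                id = e.PermId,
                title = e.Title,
                line2 = e.BodyText,
                attribution = "Ekstra Bladet Krimikort"
            }).ToList();

            return Json(
                new Response
                {
                    // Distance is assumed to be in km (hence * 1000); radius is in metres.
                    // Max() throws on an empty sequence, so guard against no results.
                    radius = events.Count > 0
                        ? (int)(events.Max(e => e.distance) * 1000)
                        : 0,
                    nextPageKey = (page ?? 0) + 1,
                    // A full page means there may be more results
                    morePages = events.Count == PageSize,
                    hotspots = events
                }, JsonRequestBehavior.AllowGet);
        }
    }
}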

public class Response
{
    public string layer { get { return "krimikort"; } }
    public int errorCode { get { return 0; } }
    public string errorString { get { return "ok"; } }
    public IEnumerable hotspots { get; set; }
    public int radius { get; set; }
    public int? nextPageKey { get; set; }
    public bool morePages { get; set; }
}

public class POI
{
    public object[] actions { get { return new object[] { }; } }
    public string attribution { get; set; }
    public float distance { get; set; }
    public int id { get; set; }
    public string imageUrl { get; set; }
    public int lat { get; set; }
    public int lon { get; set; }
    public string line2 { get; set; }
    public string line3 { get; set; }
    public string line4 { get; set; }
    public string title { get; set; }
    public int type { get; set; }
}

public static class Extensions
{
    // The POI lat/lon fields are integers: decimal degrees multiplied by 1,000,000
    public static int ToLayarCoord(this double coord)
    {
        return (int)(coord * 1000000);
    }
}
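To make the shape of the response and the paging arithmetic explicit, here is the same logic as a small Python sketch. The field names come from the Response class above; layar_response and to_layar_coord are hypothetical names, the empty-page guard is my addition (Enumerable.Max throws on an empty sequence), and this is not a full implementation of the Layar API.

```python
import json

PAGE_SIZE = 20  # same page size the controller passes to FindNearestEvents


def to_layar_coord(degrees):
    # Integer microdegrees, truncated toward zero like the C# (int) cast
    return int(degrees * 1_000_000)


def layar_response(pois, page_key=None, layer="krimikort"):
    """Build a Layar-style response dict from POIs whose "distance" is in km."""
    page = int(page_key) if page_key else None
    return {
        "layer": layer,
        "errorCode": 0,
        "errorString": "ok",
        "hotspots": pois,
        # Radius in metres; an empty page gets 0 instead of an error
        "radius": int(max(p["distance"] for p in pois) * 1000) if pois else 0,
        "nextPageKey": (page or 0) + 1,
        # A full page suggests there may be more results
        "morePages": len(pois) == PAGE_SIZE,
    }


if __name__ == "__main__":
    poi = {"id": 1, "title": "Example", "distance": 0.8,
           "lat": to_layar_coord(55.5), "lon": to_layar_coord(12.5)}
    print(json.dumps(layar_response([poi]), indent=2))
```

Returning a plain dict keeps the JSON serialisation separate from the paging logic, much like Json() does for the Response object in the controller.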

Dynamic Sitemap with ASP.Net MVC (incl. geo)

Here is how I generate sitemaps using the XDocument API and a ContentResult. The entries are events that come out of the EventRepository; please substitute as needed. Note that it would be vastly more elegant to use ActionLinks in some way. Note also that the first entry is a link to a Google Earth KMZ file (more here).

using System;
using System.Globalization;
using System.Linq;
using System.Web.Mvc;
using System.Xml.Linq;

// Action method on a Controller; the output is cached for 12 hours
[OutputCache(Duration = 12 * 3600, VaryByParam = "*")]
public ContentResult Sitemap()
{
    const string smdatetimeformat = "yyyy-MM-dd";

    var erep = new EventRepository();
    var events = (from e in erep.GetGeocodedEvents()
                  where e.IncidentTime.HasValue
                  select new { e.Title, e.PermId, e.IncidentTime }).ToList();

    XNamespace sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace geo = "http://www.google.com/geo/schemas/sitemap/1.0";

    XDocument doc = new XDocument(
        new XElement(sm + "urlset",
            new XAttribute("xmlns", sm.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "geo", geo.NamespaceName),
            // First entry: the Google Earth KMZ file, tagged with the geo extension
            new XElement(sm + "url",
                new XElement(sm + "loc", "http://krimikort.ekstrabladet.dk/gearth.kmz"),
                new XElement(sm + "lastmod",
                    DateTime.Now.ToString(smdatetimeformat, CultureInfo.InvariantCulture)),
                new XElement(sm + "changefreq", "daily"),
                new XElement(sm + "priority", "1.0"),
                new XElement(geo + "geo",
                    new XElement(geo + "format", "kmz"))),
            // One entry per geocoded event
            events.Select(e =>
                new XElement(sm + "url",
                    new XElement(sm + "loc", EventExtensions.AbsUrl(e.Title, e.PermId)),
                    new XElement(sm + "lastmod",
                        e.IncidentTime.Value.ToString(smdatetimeformat, CultureInfo.InvariantCulture)),
                    new XElement(sm + "changefreq", "monthly"),
                    new XElement(sm + "priority", "0.5")))));

    return Content(doc.ToString(), "text/xml");
}
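For comparison, here is a sketch of the same sitemap built with Python’s xml.etree.ElementTree (Python 3.8+ for the xml_declaration argument). The example.com URLs are placeholders, events are assumed to arrive as (absolute URL, date) pairs rather than from an EventRepository, and build_sitemap and add_url are hypothetical names. Registering the sitemap namespace with an empty prefix makes it the default namespace in the output, as in the C# version.

```python
import xml.etree.ElementTree as ET
from datetime import date

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
GEO = "http://www.google.com/geo/schemas/sitemap/1.0"

# Note: register_namespace is global state for the whole process
ET.register_namespace("", SM)     # sitemap namespace as the default xmlns
ET.register_namespace("geo", GEO)


def add_url(urlset, loc, lastmod, changefreq, priority):
    url = ET.SubElement(urlset, f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = loc
    ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod.strftime("%Y-%m-%d")
    ET.SubElement(url, f"{{{SM}}}changefreq").text = changefreq
    ET.SubElement(url, f"{{{SM}}}priority").text = priority
    return url


def build_sitemap(kmz_url, events):
    """events: iterable of (absolute_url, incident_date) pairs. Returns UTF-8 bytes."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    kmz = add_url(urlset, kmz_url, date.today(), "daily", "1.0")
    geo = ET.SubElement(kmz, f"{{{GEO}}}geo")
    ET.SubElement(geo, f"{{{GEO}}}format").text = "kmz"
    for loc, when in events:
        add_url(urlset, loc, when, "monthly", "0.5")
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml = build_sitemap("http://example.com/map.kmz",
                        [("http://example.com/events/1", date(2010, 1, 2))])
    print(xml.decode("utf-8"))
```

Unlike XDocument.ToString(), this output starts with an XML declaration.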

LinqtoCRM obsoleted

Shan McArthur put up a notice that the latest version (4.0.12) of the Microsoft CRM SDK includes Linq querying support. The CRM Team have a couple of blog posts describing the new features. I haven’t tested the new SDK, but I definitely recommend you try it out before using LinqtoCRM, and I’ve put a notice to that effect on the LinqtoCRM front page.

It’s a little bit sad that LinqtoCRM probably won’t be used much anymore, but I also think it’s great that Microsoft is now providing what looks to be a solid Linq implementation for Dynamics CRM (especially considering that we haven’t released a new version in more than a year).

Anyway, thanks to everyone who has contributed (esp. Mel Gerats and Petteri Räty) and to all the people who have used LinqtoCRM over the years! Now go get the new SDK and write some queries.

Linq-to-SQL, group-by, subqueries and performance

If you’re using Linq-to-SQL and do a group-by that selects columns other than those in the grouping key, performance might suffer. This is because there is no good translation of such queries to SQL, so Linq-to-SQL has to resort to doing multiple subqueries. Matt Warren explains here. I experienced this firsthand when grouping a lot of geocoded events by latitude and longitude and selecting a few more columns (EventId and CategoryId in the example below):

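var query =
    from e in db.Events
    group e by new { e.Lat, e.Lng } into g
    select new
    {
        g.Key.Lat,
        g.Key.Lng,
        // Projecting the group's elements, not just the key, is what
        // forces Linq-to-SQL into extra queries per group
        es = g.Select(x => new { x.EventId, x.CategoryId })
    };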

One possible solution is to fetch all events, call ToList() and do the grouping in memory:

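// Fetch only the needed columns in a single SQL query...
var rows =
    (from e in db.Events
     select new { e.Lat, e.Lng, e.EventId, e.CategoryId }).ToList();

// ...then group in memory with LINQ to Objects, so no per-group queries are issued
var grouped =
    from e in rows
    group e by new { e.Lat, e.Lng } into g
    select new
    {
        g.Key.Lat,
        g.Key.Lng,
        es = g.Select(x => new { x.EventId, x.CategoryId })
    };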
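The same fetch-once-then-group pattern, sketched in Python for readers outside .NET: the hard-coded, made-up rows stand in for the result of the ToList() call, and a dictionary keyed on (lat, lng) does the grouping in a single pass. group_by_location is a hypothetical name. Note that grouping on exact floating-point coordinates only merges events whose stored values are identical.

```python
from collections import defaultdict


def group_by_location(rows):
    """Group (lat, lng, event_id, category_id) rows by coordinate pair.

    One pass over rows that are already in memory, no extra query per group.
    """
    groups = defaultdict(list)
    for lat, lng, event_id, category_id in rows:
        groups[(lat, lng)].append((event_id, category_id))
    return dict(groups)


# Made-up rows standing in for the fetched events
rows = [
    (55.67, 12.56, 1, 10),
    (55.67, 12.56, 2, 11),
    (56.16, 10.20, 3, 10),
]

if __name__ == "__main__":
    for (lat, lng), events in group_by_location(rows).items():
        print(lat, lng, events)
```

The trade-off is the same as in the C# version: every matching row crosses the wire once, which is usually cheaper than one round trip per group.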