Techniques for unique, correct and fast geo-queries II

You may remember my last post on this topic ended with a question. Just to recap, we have a table with a lot of rows that have geographical coordinates and we want to find a random subset that lies in a given map window. The problem with the query demonstrated last was that there are rows with the exact same coordinates and for those coordinates the query would always return the same row (namely the one with the highest intid).

I posted a question on Stackoverflow and Tom H. came up with a solution that was 90% there. The full query looks like this:

with cte as
        row_number() over
                    partition by geoLat, geoLng
                    order by newid()
            ) as row_num,
        count(intid) over (partition by geoLat, geoLng) as TotalCount
        documents d
		d.geoLat < @maxLat
			and d.geoLat > @minLat
				((@maxLng > @minLng) and
					(d.geoLng < @maxLng and d.geoLng > @minLng))
				((@maxLng < @minLng) and
					((d.geoLng > @minLng and d.geoLng < 180)
						or (d.geoLng > -180 and d.geoLng < @maxLng))
select top (@maxcount)
    dd.*, cte.intid, rand(cte.intid)
    cte,documents dd
    row_num = 1 + floor(rand() * TotalCount) and cte.intid = dd.intid
order by newid()

The query uses Common Table Expressions, which I’ve dabbled in before. Looking at the execution plan makes my head hurt, but it’s at least as fast as the original version. See the new version in action at the TEDBot site.


Leave a Reply

Your email address will not be published. Required fields are marked *