Techniques for unique, correct and fast geo-queries II

Posted on |

You may remember my last post on this topic ended with a question. Just to recap, we have a table with a lot of rows that have geographical coordinates and we want to find a random subset that lies in a given map window. The problem with the query demonstrated last was that there are rows with the exact same coordinates and for those coordinates the query would always return the same row (namely the one with the highest intid).

I posted a question on Stackoverflow and Tom H. came up with a solution that was 90% there. The full query looks like this:

with cte as
(
    select
        intid,
        row_number() over
            (
                    partition by geoLat, geoLng
                    order by newid()
            ) as row_num,
        count(intid) over (partition by geoLat, geoLng) as TotalCount
    from
        documents d
	where
		d.geoLat < @maxLat
			and d.geoLat > @minLat
			and
			(
				((@maxLng > @minLng) and
					(d.geoLng < @maxLng and d.geoLng > @minLng))
				or
				((@maxLng < @minLng) and
					((d.geoLng > @minLng and d.geoLng < 180)
						or (d.geoLng > -180 and d.geoLng < @maxLng))
					)
			)
)
select top (@maxcount)
    dd.*, cte.intid, rand(cte.intid)
from
    cte,documents dd
where
    row_num = 1 + floor(rand() * TotalCount) and cte.intid = dd.intid
order by newid()

The query uses Common Table Expressions, which I’ve dabbled in before. Looking at the execution plan makes my head hurt, but it’s at least as fast as the original version. See the new version in action at the TEDBot site.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *