Techniques for unique, correct and fast geo-queries II
You may remember my last post on this topic ended with a question. Just to recap, we have a table with a lot of rows that have geographical coordinates and we want to find a random subset that lies in a given map window. The problem with the query demonstrated last was that there are rows with the exact same coordinates and for those coordinates the query would always return the same row (namely the one with the highest intid).
I posted a question on Stackoverflow and Tom H. came up with a solution that was 90% there. The full query looks like this:
with cte as
(
select
intid,
row_number() over
(
partition by geoLat, geoLng
order by newid()
) as row_num,
count(intid) over (partition by geoLat, geoLng) as TotalCount
from
documents d
where
d.geoLat < @maxLat
and d.geoLat > @minLat
and
(
((@maxLng > @minLng) and
(d.geoLng < @maxLng and d.geoLng > @minLng))
or
((@maxLng < @minLng) and
((d.geoLng > @minLng and d.geoLng < 180)
or (d.geoLng > -180 and d.geoLng < @maxLng))
)
)
)
select top (@maxcount)
dd.*, cte.intid, rand(cte.intid)
from
cte,documents dd
where
row_num = 1 + floor(rand() * TotalCount) and cte.intid = dd.intid
order by newid()
The query uses Common Table Expressions, which I’ve dabbled in before. Looking at the execution plan makes my head hurt, but it’s at least as fast as the original version. See the new version in action at the TEDBot site.
Leave a Reply