Proximity search
  • Time: 2008-11-03 23:33:14

How does an application perform a proximity search? For example, a user types in a postal code, then the application lists all the businesses within 20 miles ordered by proximity.

I want to build something like that in PHP and MySQL. Is this approach correct?

  1. Get the addresses for the locations I'm interested in and store them in my database
  2. Geocode all the addresses with Google's geocoding service
  3. Write a database query that includes the Haversine formula to do the proximity search and ordering

Is this OK? In step 3, I'm going to calculate the proximity for every query. Would it be better to have a PROXIMITY table that lists the distance between every business and a few reference locations?

Best answer

If there are enough records for speed to matter, here is a way to index them ahead of time.

Define a grid of bins about 20 miles on a side. Store the bin number with each store's record. At search time, compute the numbers of all bins that intersect a 20-mile radius from your search point. Then retrieve all stores in any of those bins, and proceed as before.
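The binning idea above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the grid is laid out in degrees, the figure of 69.09 miles per degree is an approximation, and the names `bin_of` and `bins_near` are made up for this example. It covers the bounding box of the search circle (a slight superset of the bins the circle actually touches) and ignores wrap-around at the poles and at longitude ±180.

```python
import math

MILES_PER_DEGREE = 69.09   # approximate miles per degree of latitude (assumed)
BIN_MILES = 20.0           # bin edge length suggested in the answer
BIN_DEG = BIN_MILES / MILES_PER_DEGREE

def bin_of(lat, lon):
    """Return the (row, col) grid cell that contains a point."""
    return (math.floor(lat / BIN_DEG), math.floor(lon / BIN_DEG))

def bins_near(lat, lon, radius_miles=20.0):
    """Return every cell overlapping the bounding box of the search circle."""
    dlat = radius_miles / MILES_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude), so widen the span,
    # using the latitude edge farthest from the equator to stay conservative.
    edge_lat = min(abs(lat) + dlat, 89.9)
    dlon = dlat / math.cos(math.radians(edge_lat))
    row_lo, col_lo = bin_of(lat - dlat, lon - dlon)
    row_hi, col_hi = bin_of(lat + dlat, lon + dlon)
    return {(r, c)
            for r in range(row_lo, row_hi + 1)
            for c in range(col_lo, col_hi + 1)}
```

Each store row would keep the result of `bin_of` in an indexed column; a search then fetches only the rows whose bin is in `bins_near(...)` and runs the exact distance check on those candidates.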

Answers

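We use this to handle several thousand points. If you do this in SQL, it is important to index the latitude and longitude columns. We tried doing this with SQL 2008 and spatial indexes, but we did not really see the performance gain we expected. If you want to find points within a certain distance of a ZIP code, though, you need to decide whether to use the ZIP code centroid or a polygon representation of the ZIP code area.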

The Haversine formula is a good place to start.
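For reference, a minimal Python sketch of the Haversine formula follows. The function name `haversine_miles` and the mean Earth radius of 3958.8 miles are assumptions chosen for this example; the formula treats the Earth as a sphere, so results are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp so rounding error cannot push asin outside its domain.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

With this radius, one degree of latitude comes out at roughly 69.09 miles, which matches the constant used in the SQL query in this thread.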

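We have had no performance problems computing distances, but for some applications where we know the points in advance and there will be many thousands of records, we precompute the distances.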

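SELECT
    *
FROM
    (
        SELECT
            -- Spherical law of cosines; 69.09 is roughly the number
            -- of miles per degree of arc.
            [DistanceRadius] =
                69.09 *
                DEGREES(
                  ACOS(
                    SIN( RADIANS(latitude) ) * SIN( RADIANS(@ziplat) )
                   +
                    COS( RADIANS(latitude) ) * COS( RADIANS(@ziplat) )
                   *
                    COS( RADIANS(longitude - (@ziplon)) )
                  )
                ),
            *
        FROM
            table   -- placeholder for your locations table
    ) sub
WHERE
    sub.DistanceRadius < @radius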

We do this for about 1200 locations. I would just use the Haversine formula on the fly, although depending on your application, it might be better to implement it in PHP instead of SQL. (Our implementation is in .NET, so your mileage may vary.)

Really, our biggest drawback with the way we implemented it is that every calculation (up until recently) had to be done on the data tier, which was painfully slow. (When I say slow, I really mean non-instantaneous: it took a second or so.) That was because it had to calculate the distance for all 1200 locations based on the supplied zip code.

Depending on the route you choose, there are ways of reducing the number of distance calculations: look at the longitude and latitude and remove the points outside a predefined range. (For example, if you are looking for all addresses within 20 miles, you can calculate a longitude range that every address must fall in to be within 20 miles.) That can speed up your query if need be.
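One way to compute such a range is sketched below in Python. The helper name `bounding_box` is hypothetical, 69.09 miles per degree is the approximation used in the query earlier in this thread, and the sketch does not handle the poles or longitude ±180.

```python
import math

MILES_PER_DEGREE = 69.09  # approximate miles per degree of latitude

def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    dlat = radius_miles / MILES_PER_DEGREE
    # Longitude degrees get shorter away from the equator, so divide by the
    # cosine of the box edge farthest from the equator (a conservative choice).
    edge_lat = min(abs(lat) + dlat, 89.9)
    dlon = dlat / math.cos(math.radians(edge_lat))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)
```

The four values can be used in a `WHERE latitude BETWEEN ... AND longitude BETWEEN ...` filter ahead of the distance expression, so that indexes on the two columns discard most rows before any trigonometry runs. Points in the corners of the box are farther away than the radius, so the exact distance check is still needed afterwards.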

We actually looked at storing all possible combinations in our database. It sounds like it could be a large data store, but in the grand scheme of things it really isn't. With indexes it can be quite fast, and you don't have to worry about algorithm optimization and so on. We decided against it because we had the equation in C#, and that allowed us to cache the information needed to do all the calculations in the business tier. Either will work just fine; it's just a matter of preference.
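As a rough illustration of the business-tier option (in Python rather than C#, and not the code described above), the coordinates can be loaded once, kept in memory, and searched per request. The names `distance_miles`, `nearby`, and `LOCATIONS`, and the sample rows, are invented for this sketch; the distance expression mirrors the SQL query earlier in this thread.

```python
import math

def distance_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, mirroring the SQL expression in this thread."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2)
                 * math.cos(math.radians(lon1 - lon2)))
    # Clamp so rounding error cannot push acos outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return 69.09 * math.degrees(math.acos(cos_angle))

def nearby(cached_locations, lat, lon, radius_miles):
    """Return (distance, name) pairs within the radius, nearest first."""
    hits = []
    for name, loc_lat, loc_lon in cached_locations:
        d = distance_miles(lat, lon, loc_lat, loc_lon)
        if d < radius_miles:
            hits.append((d, name))
    return sorted(hits)

# Hypothetical cached rows: (name, latitude, longitude)
LOCATIONS = [("A", 40.00, -75.00), ("B", 40.10, -75.05), ("C", 42.00, -71.00)]
```

Calling `nearby(LOCATIONS, 40.0, -75.0, 20)` scans the cached list and returns the matches ordered by distance. For around a thousand locations a linear scan like this is cheap; a precomputed table or one of the index-based filters becomes more attractive as the data grows.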




