Question

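I have a dataset of points of different types. For each point in the dataset, I want to find the closest point in each category. I can do this, but the computation time is very long, and I'm struggling to make the query use a spatial index while also filtering by type.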

Sample data generation:

CREATE TYPE point_type AS ENUM ('1', '2', '3', '4', '5');

CREATE TABLE points AS
  SELECT ST_MakePoint(
    1000*random(),
    1000*random()
    )::geometry(Point) AS geom,
     ((random()*3)::int+1)::text::point_type  point_type,
         pk
  FROM generate_series(1,6000) pk;
update points
set point_type = '5' where pk = 999;

Indexes:

create index points_geom_idx
    on points using gist (geom);

CREATE INDEX points_dual ON points (point_type, geom);

Query that works, but is very slow:

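Is this because of the ordering by distance, so that the KNN search is done first and the type filter is only applied afterwards?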

explain analyse
with types as (
select column1::point_type point_type from (
values
('1'), ('2'), ('3'), ('4'), ('5')
       ) v
)
SELECT c1.point_type,
       c1.pk AS main_id,
       b.pk  AS secondary_id,
       c1.secondary_point_type,
       b.secondary_point_type,
       b.distance
FROM (SELECT c.point_type,
             c.pk,
             c.geom,
             types.point_type secondary_point_type
      FROM  points c
          join types on true
          ) c1

         LEFT JOIN LATERAL ( SELECT c2.point_type,
                                    c2.geom,
                                    c2.pk,
                                    c2.point_type secondary_point_type,
                                    c1.geom <->c2.geom AS distance
                             FROM points c2

         where c1.pk <> c2.pk          and c1.secondary_point_type=c2.point_type

                             ORDER BY distance
                             LIMIT 1)  b on true;

Query that is very fast but doesn't provide correct results. I believe this is because it's just getting the closest point, and if that point isn't of the correct type, the join condition fails, so no data is joined, leaving nulls for most results:

explain analyse
with types as (
select column1::point_type point_type from (
values
('1'), ('2'), ('3'), ('4'), ('5')
       ) v
)
SELECT c1.point_type,
       c1.pk AS main_id,
       b.pk  AS secondary_id,
       c1.secondary_point_type,
       b.secondary_point_type,
       b.distance
FROM (SELECT c.point_type,
             c.pk,
             c.geom,
             types.point_type secondary_point_type
      FROM  points c
          join types on true
          ) c1

         LEFT JOIN LATERAL ( SELECT c2.point_type,
                                    c2.geom,
                                    c2.pk,
                                    c2.point_type secondary_point_type,
                                    c1.geom <->c2.geom AS distance
                             FROM points c2

         where c1.pk <> c2.pk
                             ORDER BY distance
                             LIMIT 1)  b on c1.secondary_point_type=b.secondary_point_type ;

I'm trying to get the correct result quickly, with the spatial index used for the KNN search across all types. Thanks!

EXPLAIN ANALYZE output for the first query:

Sort  (cost=29155.39..29230.39 rows=30000 width=28) (actual time=24533.167..24543.539 rows=30000 loops=1)
  Output: c.point_type, c.pk, c2.pk, (("*VALUES*".column1)::point_type), c2.point_type, ((c.geom <-> c2.geom))
  Sort Key: c2.point_type DESC
  Sort Method: quicksort  Memory: 2409kB
  Buffers: shared hit=180999
  ->  Nested Loop Left Join  (cost=0.15..26924.49 rows=30000 width=28) (actual time=5.024..24430.122 rows=30000 loops=1)
        Output: c.point_type, c.pk, c2.pk, ("*VALUES*".column1)::point_type, c2.point_type, ((c.geom <-> c2.geom))
        Buffers: shared hit=180999
        ->  Nested Loop  (cost=0.00..499.07 rows=30000 width=72) (actual time=0.546..105.076 rows=30000 loops=1)
              Output: c.point_type, c.pk, c.geom, "*VALUES*".column1
              Buffers: shared hit=64
              ->  Seq Scan on public.points c  (cost=0.00..124.00 rows=6000 width=40) (actual time=0.341..12.850 rows=6000 loops=1)
                    Output: c.geom, c.point_type, c.pk
                    Buffers: shared hit=64
              ->  Materialize  (cost=0.00..0.09 rows=5 width=32) (actual time=0.001..0.006 rows=5 loops=6000)
                    Output: "*VALUES*".column1
                    ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=32) (actual time=0.034..0.141 rows=5 loops=1)
                          Output: "*VALUES*".column1
        ->  Limit  (cost=0.15..0.86 rows=1 width=52) (actual time=0.802..0.803 rows=1 loops=30000)
              Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
              Buffers: shared hit=180935
              ->  Result  (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.800..0.800 rows=1 loops=30000)
                    Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
                    One-Time Filter: (("*VALUES*".column1)::point_type = ("*VALUES*".column1)::point_type)
                    Buffers: shared hit=180935
                    ->  Index Scan using points_geom_idx on public.points c2  (cost=0.15..500.15 rows=5999 width=40) (actual time=0.787..0.787 rows=1 loops=30000)
                          Output: c2.geom, c2.point_type, c2.pk
                          Order By: (c2.geom <-> c.geom)
                          Filter: (c.pk <> c2.pk)
                          Rows Removed by Filter: 1
                          Buffers: shared hit=180935
Settings: search_path =  public, topology, tiger 
Planning Time: 4.964 ms
Execution Time: 24553.107 ms

Second query:

QUERY PLAN
Nested Loop  (cost=0.88..1197.38 rows=30000 width=28) (actual time=3.535..4538.832 rows=30000 loops=1)
  Output: c.point_type, c.pk, b.pk, ("*VALUES*".column1)::point_type, b.secondary_point_type, b.distance
  Buffers: shared hit=36251
  ->  Seq Scan on public.points c  (cost=0.00..124.00 rows=6000 width=40) (actual time=0.095..4.897 rows=6000 loops=1)
        Output: c.geom, c.point_type, c.pk
        Buffers: shared hit=64
  ->  Hash Left Join  (cost=0.88..0.98 rows=5 width=48) (actual time=0.726..0.743 rows=5 loops=6000)
        Output: "*VALUES*".column1, b.pk, b.secondary_point_type, b.distance
        Hash Cond: (("*VALUES*".column1)::point_type = b.secondary_point_type)
        Buffers: shared hit=36187
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=32) (actual time=0.001..0.008 rows=5 loops=6000)
              Output: "*VALUES*".column1
        ->  Hash  (cost=0.87..0.87 rows=1 width=16) (actual time=0.707..0.707 rows=1 loops=6000)
              Output: b.pk, b.secondary_point_type, b.distance
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              Buffers: shared hit=36187
              ->  Subquery Scan on b  (cost=0.15..0.87 rows=1 width=16) (actual time=0.701..0.703 rows=1 loops=6000)
                    Output: b.pk, b.secondary_point_type, b.distance
                    Buffers: shared hit=36187
                    ->  Limit  (cost=0.15..0.86 rows=1 width=52) (actual time=0.700..0.700 rows=1 loops=6000)
                          Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
                          Buffers: shared hit=36187
                          ->  Index Scan using points_geom_idx on public.points c2  (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.695..0.695 rows=1 loops=6000)
                                Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
                                Order By: (c2.geom <-> c.geom)
                                Filter: (c.pk <> c2.pk)
                                Rows Removed by Filter: 1
                                Buffers: shared hit=36187
Settings: search_path =  public, topology, tiger 
Planning Time: 3.206 ms
Execution Time: 4549.481 ms

Answer 1

Your second index needs to be a GiST index, just like your first one. For that, you need the btree_gist extension.

CREATE EXTENSION btree_gist;
CREATE INDEX points_dual ON points using gist (point_type, geom);
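
As a rough sketch of how this might be applied to the sample table from the question: the B-tree index named points_dual from the question is assumed to exist already, so it is dropped first. The lookup at the end (pk = 1, type '3') uses arbitrary illustrative values and is only meant as a quick way to inspect the plan for a single point, not as a replacement for the full query.

```sql
-- The question already created a B-tree index called points_dual,
-- so drop it before reusing the name (assumption: it is not needed elsewhere).
DROP INDEX IF EXISTS points_dual;

-- btree_gist provides GiST operator classes for scalar types such as enums.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Multicolumn GiST index: equality on point_type plus KNN ordering on geom.
CREATE INDEX points_dual ON points USING gist (point_type, geom);

-- Refresh planner statistics after building the index.
ANALYZE points;

-- Inspect the plan for one lookup: the nearest point of type '3'
-- to the point with pk = 1 (both values are illustrative).
EXPLAIN ANALYZE
SELECT c1.pk AS main_id,
       c2.pk AS secondary_id,
       c2.distance
FROM points c1
CROSS JOIN LATERAL (
    SELECT p.pk,
           p.geom <-> c1.geom AS distance
    FROM points p
    WHERE p.point_type = '3'
      AND p.pk <> c1.pk
    ORDER BY p.geom <-> c1.geom
    LIMIT 1
) c2
WHERE c1.pk = 1;
```

If the resulting plan shows an index scan on points_dual with point_type in the index condition and the distance in the Order By line, the type filter is being applied inside the index rather than after the KNN search. The planner may still pick a different plan depending on statistics and data distribution.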
