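I have a dataset of points of different types. For every point in the dataset, I want to find the nearest point in each category. I can do this, but the computation time is very long, and I am struggling to make use of a spatial index while also filtering by type.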
Sample data generation:
CREATE TYPE point_type AS ENUM ('1', '2', '3', '4', '5');
CREATE TABLE points AS
SELECT ST_MakePoint(
1000*random(),
1000*random()
)::geometry(Point) AS geom,
((random()*3)::int+1)::text::point_type point_type,
pk
FROM generate_series(1,6000) pk;
update points
set point_type = '5' where pk = 999;
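As a quick sanity check on the sample data (a sketch, not something from my original runs): because `(random()*3)::int` rounds to the nearest integer, I would expect types 2 and 3 to be roughly twice as common as types 1 and 4, and type 5 to appear only on the single row updated above.

```sql
-- Count how many points fall into each type.
-- Enum values sort in declaration order, so the rows come back as 1..5.
SELECT point_type,
       count(*) AS n
FROM points
GROUP BY point_type
ORDER BY point_type;
```

The uneven distribution matters for the nearest-neighbour queries below, since a rare type means the nearest match is usually far away.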
Indexes:
create index points_geom_idx
on points using gist (geom);
CREATE INDEX points_dual ON points (point_type, geom);
Query that is very slow but provides correct results.
Is it slow because the KNN results are ordered by distance first and only filtered by type afterwards?
explain analyse
with types as (
select column1::point_type point_type from (
values
('1'), ('2'), ('3'), ('4'), ('5')
) v
)
SELECT c1.point_type,
c1.pk AS main_id,
b.pk AS secondary_id,
c1.secondary_point_type,
b.secondary_point_type,
b.distance
FROM (SELECT c.point_type,
c.pk,
c.geom,
types.point_type secondary_point_type
FROM points c
join types on true
) c1
LEFT JOIN LATERAL ( SELECT c2.point_type,
c2.geom,
c2.pk,
c2.point_type secondary_point_type,
c1.geom <->c2.geom AS distance
FROM points c2
where c1.pk <> c2.pk and c1.secondary_point_type=c2.point_type
ORDER BY distance
LIMIT 1) b on true;
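To check whether the type filter really is applied only after the distance ordering, one thing that could be tried is timing a single lookup in isolation. The sketch below is an assumption about how to test this, not a measured result: `pk = 1` is an arbitrary example row, and `'5'` is chosen because only `pk = 999` has that type.

```sql
-- Sketch: one nearest-neighbour lookup for one point and one (rare) type.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c2.pk,
       c2.point_type,
       c2.geom <-> c1.geom AS distance
FROM points c1
CROSS JOIN LATERAL (
    SELECT p.pk, p.point_type, p.geom
    FROM points p
    WHERE p.pk <> c1.pk
      AND p.point_type = '5'        -- type filter inside the KNN subquery
    ORDER BY p.geom <-> c1.geom     -- KNN ordering on the GiST index
    LIMIT 1
) c2
WHERE c1.pk = 1;
```

If the plan shows an Index Scan on `points_geom_idx` with a `Filter` on `point_type` and a large "Rows Removed by Filter", that would suggest the index walks neighbours in distance order and discards wrong-type rows one by one. The planner may of course pick a different plan, for example one based on `points_dual`.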
Query that is very fast but doesn't provide correct results. I believe this is because it's just getting the closest point, and if that point isn't of the correct type, the join condition fails, so no data is joined, leaving nulls for most results:
explain analyse
with types as (
select column1::point_type point_type from (
values
('1'), ('2'), ('3'), ('4'), ('5')
) v
)
SELECT c1.point_type,
c1.pk AS main_id,
b.pk AS secondary_id,
c1.secondary_point_type,
b.secondary_point_type,
b.distance
FROM (SELECT c.point_type,
c.pk,
c.geom,
types.point_type secondary_point_type
FROM points c
join types on true
) c1
LEFT JOIN LATERAL ( SELECT c2.point_type,
c2.geom,
c2.pk,
c2.point_type secondary_point_type,
c1.geom <->c2.geom AS distance
FROM points c2
where c1.pk <> c2.pk
ORDER BY distance
LIMIT 1) b on c1.secondary_point_type=b.secondary_point_type ;
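The behaviour I suspect in the fast query can be illustrated with a smaller sketch (the `pk <= 5` restriction is only there to keep the output short): the lateral subquery returns exactly one neighbour per point, so at most one of the five types can ever match the join condition.

```sql
-- Sketch: for a handful of points, show the single nearest neighbour and its type.
SELECT c1.pk         AS main_id,
       nn.pk         AS nearest_id,
       nn.point_type AS nearest_type
FROM points c1
CROSS JOIN LATERAL (
    SELECT c2.pk, c2.point_type
    FROM points c2
    WHERE c2.pk <> c1.pk
    ORDER BY c2.geom <-> c1.geom
    LIMIT 1
) nn
WHERE c1.pk <= 5
ORDER BY c1.pk;
```

Each `main_id` appears once with one `nearest_type`. Joining that single row against all five types leaves the other four as nulls.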
I'm trying to make this query fast, using the spatial index for the KNN search across all types. Thanks!
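One idea I have not benchmarked is a partial GiST index per type, combined with a constant type in the subquery so that the planner can match the index predicate. The index name `points_geom_type5_idx` is hypothetical, and whether the planner actually chooses this index is an assumption that would need to be confirmed with `EXPLAIN`.

```sql
-- Hypothetical partial index: only rows of type '5' are indexed.
CREATE INDEX points_geom_type5_idx
    ON points USING gist (geom)
    WHERE point_type = '5';

-- Nearest type-'5' neighbour for every point. The constant in the WHERE clause
-- matches the index predicate, which lets the planner consider the partial index.
SELECT c1.pk AS main_id,
       b.pk  AS secondary_id,
       b.distance
FROM points c1
LEFT JOIN LATERAL (
    SELECT c2.pk,
           c2.geom <-> c1.geom AS distance
    FROM points c2
    WHERE c2.point_type = '5'
      AND c2.pk <> c1.pk
    ORDER BY c2.geom <-> c1.geom
    LIMIT 1
) b ON true;
```

This would have to be repeated for each type and combined, for example with `UNION ALL`, which is less general than the single query above.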
EXPLAIN ANALYZE output for the first query:
Sort (cost=29155.39..29230.39 rows=30000 width=28) (actual time=24533.167..24543.539 rows=30000 loops=1)
Output: c.point_type, c.pk, c2.pk, (("*VALUES*".column1)::point_type), c2.point_type, ((c.geom <-> c2.geom))
Sort Key: c2.point_type DESC
Sort Method: quicksort Memory: 2409kB
Buffers: shared hit=180999
-> Nested Loop Left Join (cost=0.15..26924.49 rows=30000 width=28) (actual time=5.024..24430.122 rows=30000 loops=1)
Output: c.point_type, c.pk, c2.pk, ("*VALUES*".column1)::point_type, c2.point_type, ((c.geom <-> c2.geom))
Buffers: shared hit=180999
-> Nested Loop (cost=0.00..499.07 rows=30000 width=72) (actual time=0.546..105.076 rows=30000 loops=1)
Output: c.point_type, c.pk, c.geom, "*VALUES*".column1
Buffers: shared hit=64
-> Seq Scan on public.points c (cost=0.00..124.00 rows=6000 width=40) (actual time=0.341..12.850 rows=6000 loops=1)
Output: c.geom, c.point_type, c.pk
Buffers: shared hit=64
-> Materialize (cost=0.00..0.09 rows=5 width=32) (actual time=0.001..0.006 rows=5 loops=6000)
Output: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=32) (actual time=0.034..0.141 rows=5 loops=1)
Output: "*VALUES*".column1
-> Limit (cost=0.15..0.86 rows=1 width=52) (actual time=0.802..0.803 rows=1 loops=30000)
Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
Buffers: shared hit=180935
-> Result (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.800..0.800 rows=1 loops=30000)
Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
One-Time Filter: (("*VALUES*".column1)::point_type = ("*VALUES*".column1)::point_type)
Buffers: shared hit=180935
-> Index Scan using points_geom_idx on public.points c2 (cost=0.15..500.15 rows=5999 width=40) (actual time=0.787..0.787 rows=1 loops=30000)
Output: c2.geom, c2.point_type, c2.pk
Order By: (c2.geom <-> c.geom)
Filter: (c.pk <> c2.pk)
Rows Removed by Filter: 1
Buffers: shared hit=180935
Settings: search_path = public, topology, tiger
Planning Time: 4.964 ms
Execution Time: 24553.107 ms
And for the second query:
QUERY PLAN
Nested Loop (cost=0.88..1197.38 rows=30000 width=28) (actual time=3.535..4538.832 rows=30000 loops=1)
Output: c.point_type, c.pk, b.pk, ("*VALUES*".column1)::point_type, b.secondary_point_type, b.distance
Buffers: shared hit=36251
-> Seq Scan on public.points c (cost=0.00..124.00 rows=6000 width=40) (actual time=0.095..4.897 rows=6000 loops=1)
Output: c.geom, c.point_type, c.pk
Buffers: shared hit=64
-> Hash Left Join (cost=0.88..0.98 rows=5 width=48) (actual time=0.726..0.743 rows=5 loops=6000)
Output: "*VALUES*".column1, b.pk, b.secondary_point_type, b.distance
Hash Cond: (("*VALUES*".column1)::point_type = b.secondary_point_type)
Buffers: shared hit=36187
-> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=32) (actual time=0.001..0.008 rows=5 loops=6000)
Output: "*VALUES*".column1
-> Hash (cost=0.87..0.87 rows=1 width=16) (actual time=0.707..0.707 rows=1 loops=6000)
Output: b.pk, b.secondary_point_type, b.distance
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=36187
-> Subquery Scan on b (cost=0.15..0.87 rows=1 width=16) (actual time=0.701..0.703 rows=1 loops=6000)
Output: b.pk, b.secondary_point_type, b.distance
Buffers: shared hit=36187
-> Limit (cost=0.15..0.86 rows=1 width=52) (actual time=0.700..0.700 rows=1 loops=6000)
Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
Buffers: shared hit=36187
-> Index Scan using points_geom_idx on public.points c2 (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.695..0.695 rows=1 loops=6000)
Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
Order By: (c2.geom <-> c.geom)
Filter: (c.pk <> c2.pk)
Rows Removed by Filter: 1
Buffers: shared hit=36187
Settings: search_path = public, topology, tiger
Planning Time: 3.206 ms
Execution Time: 4549.481 ms