Speed up osmosis_highway_deadend sql20 #2097

Merged (1 commit) on Dec 22, 2023

Famlam (Collaborator) commented Dec 9, 2023

The slowest part of this analyser is sql20, specifically its two LEFT JOINs (combining them into a single LEFT JOIN didn't improve much). For the extracts I tested, sql20 accounts for about 30-50% of the runtime of the osmosis_highway_deadend analyser.

Current situation

(sql20 only, 2 runs)

  • japan_chubu: 2 min, 4 sec | 1 min, 53 sec (EXPLAIN ANALYZE -> see below)
  • netherlands_gelderland: 1 min, 30 sec | 1 min, 33 sec
  • usa_iowa: 1 min, 15 sec | 1 min, 36 sec
  • slovenia: 48 sec | 55 sec
Nested Loop Left Join  (cost=39497943.04..39498164.55 rows=1 width=80) (actual time=36332.465..110754.140 rows=367 loops=1)
  Filter: (bicycle_parking.id IS NULL)
  Rows Removed by Filter: 4
  ->  Nested Loop Left Join  (cost=39497013.35..39497225.94 rows=1 width=80) (actual time=35952.566..36226.878 rows=371 loops=1)
        Join Filter: ((ferry.linestring && nodes.geom) AND (nodes.id = ANY (ferry.nodes)))
        Rows Removed by Join Filter: 19663
        Filter: (ferry.id IS NULL)
        ->  GroupAggregate  (cost=39497012.94..39497068.31 rows=11 width=80) (actual time=31975.785..32213.364 rows=371 loops=1)
              Group Key: nodes.id
              Filter: (count(*) = 1)
              Rows Removed by Filter: 105044
              ->  Sort  (cost=39497012.94..39497018.48 rows=2215 width=58) (actual time=31975.722..32048.690 rows=522766 loops=1)
                    Sort Key: nodes.id
                    Sort Method: external merge  Disk: 32664kB
                    ->  Nested Loop  (cost=0.98..39496889.86 rows=2215 width=58) (actual time=1.281..31368.780 rows=522766 loops=1)
                          ->  Nested Loop  (cost=0.41..39490419.21 rows=2216 width=26) (actual time=0.960..25175.954 rows=522824 loops=1)
                                ->  Seq Scan on highway_ends way_ends  (cost=0.00..222209.65 rows=210945 width=183) (actual time=0.210..2802.315 rows=206932 loops=1)
                                      Filter: ((level < 3) OR (highway = 'cycleway'::text))
                                      Rows Removed by Filter: 3681914
                                ->  Index Scan using idx_highways_linestring on highways  (cost=0.41..186.05 rows=10 width=227) (actual time=0.056..0.107 rows=3 loops=206932)
                                      Index Cond: (linestring && way_ends.linestring)
                                      Filter: ((NOT is_construction) AND (way_ends.nid = ANY (nodes)))
                                      Rows Removed by Filter: 21
                          ->  Index Scan using pk_nodes on nodes  (cost=0.56..2.92 rows=1 width=40) (actual time=0.011..0.011 rows=1 loops=522824)
                                Index Cond: (id = way_ends.nid)
                                Filter: (((NOT (tags ? 'amenity'::text)) OR ((tags -> 'amenity'::text) <> 'bicycle_parking'::text)) AND ((NOT (tags ? 'entrance'::text)) OR ((tags -> 'entrance'::text) = 'no'::text)) AND ((NOT (tags ? 'noexit'::text)) OR ((tags -> 'noexit'::text) = 'no'::text)))
                                Rows Removed by Filter: 0
        ->  Materialize  (cost=0.41..157.21 rows=1 width=250) (actual time=0.185..10.726 rows=53 loops=371)
              ->  Index Scan using idx_ways_tags on ways ferry  (cost=0.41..157.21 rows=1 width=250) (actual time=68.337..3976.104 rows=53 loops=1)
                    Index Cond: (tags ? 'route'::text)
                    Rows Removed by Index Recheck: 416646
                    Filter: ((tags -> 'route'::text) = 'ferry'::text)
                    Rows Removed by Filter: 668
  ->  Bitmap Heap Scan on ways bicycle_parking  (cost=929.68..937.35 rows=1 width=250) (actual time=200.862..200.863 rows=0 loops=371)
        Recheck Cond: ((linestring && nodes.geom) AND (tags ? 'amenity'::text) AND (tags <> ''::hstore))
        Rows Removed by Index Recheck: 1
        Filter: (((tags -> 'amenity'::text) = 'bicycle_parking'::text) AND (nodes.id = ANY (nodes)))
        Rows Removed by Filter: 0
        Heap Blocks: exact=362
        ->  BitmapAnd  (cost=929.68..929.68 rows=5 width=0) (actual time=200.791..200.791 rows=0 loops=371)
              ->  Bitmap Index Scan on idx_ways_linestring  (cost=0.00..12.86 rows=660 width=0) (actual time=2.118..2.118 rows=12 loops=371)
                    Index Cond: (linestring && nodes.geom)
              ->  Bitmap Index Scan on idx_ways_tags  (cost=0.00..916.50 rows=50945 width=0) (actual time=198.663..198.663 rows=106194 loops=371)
                    Index Cond: (tags ? 'amenity'::text)
Planning Time: 18.163 ms
Execution Time: 110755.747 ms
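The plan shows where the time goes. PostgreSQL reports each node's "actual time" as an average per loop, so you multiply it by "loops" to get the node's total. The bicycle_parking lookup runs once per dead-end candidate (371 loops). On every loop it re-reads the same 106194 entries of idx_ways_tags for `tags ? 'amenity'`. The short Python sketch below does only that arithmetic on the figures copied from the plan above. The helper name `node_total_ms` is hypothetical.

```python
# EXPLAIN ANALYZE "actual time" is an average per loop;
# multiplying by "loops" approximates the total time spent in the node.
def node_total_ms(per_loop_ms: float, loops: int) -> float:
    return per_loop_ms * loops

# Figures copied from the japan_chubu plan above.
execution_ms = 110755.747
bicycle_parking_heap_ms = node_total_ms(200.863, 371)  # Bitmap Heap Scan on ways bicycle_parking
amenity_index_ms = node_total_ms(198.663, 371)         # Bitmap Index Scan on idx_ways_tags (tags ? 'amenity')

print(f"bicycle_parking lookups: {bicycle_parking_heap_ms / 1000:.1f} s "
      f"({bicycle_parking_heap_ms / execution_ms:.0%} of execution time)")
print(f"  of which the 'amenity' tag index scan: {amenity_index_ms / 1000:.1f} s")
```

This gives about 74.5 s (67%) for the bicycle_parking lookups, nearly all of it in the repeated tag index scan.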

Proposed solution and new situation

Treating bicycle parkings as valid end points doesn't make much sense for car roads. Running that check only for cycleways gives the following results:

(sql20+21+22, 2 runs)

  • japan_chubu: 1 min, 1 sec | 1 min, 3 sec (EXPLAIN ANALYZE -> see below)
  • netherlands_gelderland: 1 min, 14 sec | 1 min, 19 sec
  • usa_iowa: 1 min, 7 sec | 1 min, 17 sec
  • slovenia: 32 sec | 35 sec
Dead-end candidates, ferry check only:
Nested Loop Left Join  (cost=39497013.35..39497225.94 rows=1 width=80) (actual time=33804.278..34118.326 rows=371 loops=1)
  Join Filter: ((ferry.linestring && nodes.geom) AND (nodes.id = ANY (ferry.nodes)))
  Rows Removed by Join Filter: 19663
  Filter: (ferry.id IS NULL)
  ->  GroupAggregate  (cost=39497012.94..39497068.31 rows=11 width=80) (actual time=30556.127..30833.714 rows=371 loops=1)
        Group Key: nodes.id
        Filter: (count(*) = 1)
        Rows Removed by Filter: 105044
        ->  Sort  (cost=39497012.94..39497018.48 rows=2215 width=58) (actual time=30556.087..30645.711 rows=522766 loops=1)
              Sort Key: nodes.id
              Sort Method: external merge  Disk: 32664kB
              ->  Nested Loop  (cost=0.98..39496889.86 rows=2215 width=58) (actual time=0.501..29862.040 rows=522766 loops=1)
                    ->  Nested Loop  (cost=0.41..39490419.21 rows=2216 width=26) (actual time=0.483..25665.136 rows=522824 loops=1)
                          ->  Seq Scan on highway_ends way_ends  (cost=0.00..222209.65 rows=210945 width=183) (actual time=0.235..3344.797 rows=206932 loops=1)
                                Filter: ((level < 3) OR (highway = 'cycleway'::text))
                                Rows Removed by Filter: 3681914
                          ->  Index Scan using idx_highways_linestring on highways  (cost=0.41..186.05 rows=10 width=227) (actual time=0.058..0.106 rows=3 loops=206932)
                                Index Cond: (linestring && way_ends.linestring)
                                Filter: ((NOT is_construction) AND (way_ends.nid = ANY (nodes)))
                                Rows Removed by Filter: 21
                    ->  Index Scan using pk_nodes on nodes  (cost=0.56..2.92 rows=1 width=40) (actual time=0.007..0.007 rows=1 loops=522824)
                          Index Cond: (id = way_ends.nid)
                          Filter: (((NOT (tags ? 'amenity'::text)) OR ((tags -> 'amenity'::text) <> 'bicycle_parking'::text)) AND ((NOT (tags ? 'entrance'::text)) OR ((tags -> 'entrance'::text) = 'no'::text)) AND ((NOT (tags ? 'noexit'::text)) OR ((tags -> 'noexit'::text) = 'no'::text)))
                          Rows Removed by Filter: 0
  ->  Materialize  (cost=0.41..157.21 rows=1 width=250) (actual time=0.119..8.760 rows=53 loops=371)
        ->  Index Scan using idx_ways_tags on ways ferry  (cost=0.41..157.21 rows=1 width=250) (actual time=43.971..3247.569 rows=53 loops=1)
              Index Cond: (tags ? 'route'::text)
              Rows Removed by Index Recheck: 416646
              Filter: ((tags -> 'route'::text) = 'ferry'::text)
              Rows Removed by Filter: 668
Planning Time: 21.394 ms
Execution Time: 34119.448 ms



Cycleways, with the bicycle parking check:
Nested Loop Left Join  (cost=929.68..3770.05 rows=1 width=48) (actual time=238.006..31064.207 rows=136 loops=1)
  Filter: (bicycle_parking.id IS NULL)
  Rows Removed by Filter: 4
  ->  Seq Scan on unconnected_highways  (cost=0.00..19.38 rows=4 width=48) (actual time=0.012..0.529 rows=140 loops=1)
        Filter: (highway = 'cycleway'::text)
        Rows Removed by Filter: 231
  ->  Bitmap Heap Scan on ways bicycle_parking  (cost=929.68..937.35 rows=1 width=250) (actual time=221.865..221.865 rows=0 loops=140)
        Recheck Cond: ((linestring && unconnected_highways.geom) AND (tags ? 'amenity'::text) AND (tags <> ''::hstore))
        Rows Removed by Index Recheck: 1
        Filter: (((tags -> 'amenity'::text) = 'bicycle_parking'::text) AND (unconnected_highways.nid = ANY (nodes)))
        Rows Removed by Filter: 0
        Heap Blocks: exact=150
        ->  BitmapAnd  (cost=929.68..929.68 rows=5 width=0) (actual time=221.786..221.786 rows=0 loops=140)
              ->  Bitmap Index Scan on idx_ways_linestring  (cost=0.00..12.86 rows=660 width=0) (actual time=2.860..2.860 rows=13 loops=140)
                    Index Cond: (linestring && unconnected_highways.geom)
              ->  Bitmap Index Scan on idx_ways_tags  (cost=0.00..916.50 rows=50945 width=0) (actual time=218.914..218.914 rows=106194 loops=140)
                    Index Cond: (tags ? 'amenity'::text)
Planning Time: 0.481 ms
Execution Time: 31064.427 ms



Other highways, without the bicycle parking check:
Seq Scan on unconnected_highways  (cost=0.00..951.88 rows=746 width=48) (actual time=0.020..0.264 rows=231 loops=1)
  Filter: (highway <> 'cycleway'::text)
  Rows Removed by Filter: 140
Planning Time: 0.067 ms
Execution Time: 0.307 ms
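The rule behind the split can be sketched in a few lines of Python. The sketch assumes the node ids of ferry ways and of ways tagged amenity=bicycle_parking have already been collected into sets. `is_reported` and its parameters are hypothetical names. The PR itself does this in SQL, with one query for cycleways and one for all other highways.

```python
def is_reported(highway: str, node: int,
                ferry_nodes: set[int], bicycle_parking_nodes: set[int]) -> bool:
    """Return True if a dead-end candidate should still be reported (simplified sketch)."""
    if node in ferry_nodes:           # ferry check: applies to every highway type
        return False
    if highway == "cycleway":         # bicycle parking check: cycleways only
        return node not in bicycle_parking_nodes
    return True                       # car roads: no bicycle parking lookup at all

print(is_reported("tertiary", 5, set(), {5}))  # True: car roads may not end at a bicycle parking
print(is_reported("cycleway", 5, set(), {5}))  # False: a cycleway ending at a bicycle parking is fine
```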

Some extracts are barely affected; others improve by nearly a factor of 2 (for this part of the analyser only). I suspect it depends on how many bicycle parkings, cycleways and/or tertiary-and-above roads there are in the country (and how much they are split into short or long way fragments).
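The speed-up factors can be checked from the run times listed above. This minimal Python sketch averages the two runs of each extract. It then divides the old sql20 time by the new sql20+21+22 time. `to_seconds` and the two dictionaries are hypothetical helpers, not part of the analyser.

```python
import re

def to_seconds(text: str) -> int:
    """Parse durations such as '1 min, 4 sec', '1 min 53 sec' or '48 sec'."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    return (int(minutes.group(1)) * 60 if minutes else 0) + (int(seconds.group(1)) if seconds else 0)

BEFORE = {  # sql20 only, 2 runs
    "japan_chubu": ("2 min, 4 sec", "1 min, 53 sec"),
    "netherlands_gelderland": ("1 min, 30 sec", "1 min, 33 sec"),
    "usa_iowa": ("1 min, 15 sec", "1 min, 36 sec"),
    "slovenia": ("48 sec", "55 sec"),
}
AFTER = {  # sql20+21+22, 2 runs
    "japan_chubu": ("1 min, 1 sec", "1 min, 3 sec"),
    "netherlands_gelderland": ("1 min, 14 sec", "1 min, 19 sec"),
    "usa_iowa": ("1 min, 7 sec", "1 min, 17 sec"),
    "slovenia": ("32 sec", "35 sec"),
}

def mean_seconds(runs: tuple[str, ...]) -> float:
    return sum(to_seconds(run) for run in runs) / len(runs)

speedup = {name: mean_seconds(BEFORE[name]) / mean_seconds(AFTER[name]) for name in BEFORE}
for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

This gives about 1.9x for japan_chubu and 1.5x for slovenia, but only about 1.2x for netherlands_gelderland and usa_iowa.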

frodrigo (Member) commented Dec 9, 2023

The inner JOIN on nodes could be moved outside the LEFT JOIN. Probably only a small improvement.

Since the plan uses a Materialize, using temp tables with an index would make sense for ferry and parking.

Famlam (Collaborator, Author) commented Dec 9, 2023

> The inner JOIN on nodes could be moved outside the LEFT JOIN. Probably only a small improvement.

I'm not sure I understand what you mean, but this would probably break the COUNT(*)=1 check when the end node of one highway connects to a node in the middle of another highway (as in a "T").
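A minimal sketch of the counting rule behind `count(*) = 1` shows why this matters. It assumes each way is just a list of node ids. The function `dead_end_nodes` and the data are hypothetical; the real check is the SQL above, which matches `nid = ANY (nodes)` against each highway. A way end counts as a dead end only if no other way contains that node at any position. So in a "T", the side road's end node is counted twice and is not flagged.

```python
from collections import Counter

def dead_end_nodes(ways: dict[int, list[int]]) -> set[int]:
    """Way end nodes contained in exactly one way (simplified; closed ways etc. are not special-cased)."""
    # Number of distinct ways containing each node, at any position (not only at the ends).
    containing = Counter(node for nodes in ways.values() for node in set(nodes))
    ends = {nodes[0] for nodes in ways.values()} | {nodes[-1] for nodes in ways.values()}
    return {node for node in ends if containing[node] == 1}

ways = {
    1: [10, 11, 12],  # main road
    2: [11, 20],      # side road whose end node 11 lies in the middle of way 1 (a "T")
}
print(sorted(dead_end_nodes(ways)))  # [10, 12, 20]
```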

> Since the plan uses a Materialize, using temp tables with an index would make sense for ferry and parking.

I'll give it a try in a couple of days.

Checking for bicycle parking ways is a very slow part of the sql, but this is only relevant for cycleways. Skip it for car roads.

Use indices for ferry and bicycle parkings
Famlam (Collaborator, Author) commented Dec 9, 2023

> I'll give it a try in a couple of days.

OK, a couple of hours actually :)

With ferry and bicycle_parking each in their own TEMP TABLE, with an index on their linestring (2 runs):

  • japan_chubu: 58 sec | 31 sec
  • netherlands_gelderland: 47 sec | 12 sec
  • usa_iowa: 25 sec | 13 sec
  • slovenia: 22 sec | 6 sec

I have no clue why the second (and every subsequent) run is so much faster this time; possibly osm110 was busy during the first run. Still, all of the first runs already look like an improvement.
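Why a temp table helps can be sketched in plain Python. The filtered ways are built once and then probed cheaply for every candidate. Before, the tag filter ran again inside every loop of the join. This is an analogy only. It uses a hash set of node ids, whereas the PR uses a PostgreSQL temp table with a spatial index on linestring, followed by a `nid = ANY (nodes)` filter. `build_lookup`, `without_match` and the data are hypothetical.

```python
# Hypothetical, simplified data: (tags, node ids) per way.
Way = tuple[dict[str, str], list[int]]

def build_lookup(ways: list[Way], key: str, value: str) -> set[int]:
    """Filter the ways once and keep their node ids in a hash set
    (playing the role of the TEMP TABLE plus its index)."""
    return {node for tags, nodes in ways if tags.get(key) == value for node in nodes}

def without_match(candidates: list[int], lookup: set[int]) -> list[int]:
    """Anti-join: keep candidates not found in the lookup (like LEFT JOIN ... WHERE x IS NULL)."""
    return [node for node in candidates if node not in lookup]

ways = [
    ({"route": "ferry"}, [1, 2]),
    ({"amenity": "bicycle_parking"}, [3, 4, 5, 3]),  # closed way
    ({"amenity": "bench"}, [6]),
]
ferry = build_lookup(ways, "route", "ferry")
bicycle_parking = build_lookup(ways, "amenity", "bicycle_parking")
print(without_match([2, 3, 6, 7], ferry))            # [3, 6, 7]
print(without_match([2, 3, 6, 7], bicycle_parking))  # [2, 6, 7]
```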


japan_chubu

Selecting the ferry ways (for the ferry temp table):
Index Scan using idx_ways_tags on ways  (cost=0.41..157.21 rows=1 width=242) (actual time=48.135..2894.786 rows=53 loops=1)
  Index Cond: (tags ? 'route'::text)
  Rows Removed by Index Recheck: 416646
  Filter: ((tags -> 'route'::text) = 'ferry'::text)
  Rows Removed by Filter: 668
Planning Time: 2.387 ms
Execution Time: 2894.888 ms

Selecting the bicycle parking ways (for the bicycle_parking temp table):
Bitmap Heap Scan on ways  (cost=916.57..63023.39 rows=255 width=242) (actual time=283.111..1594.082 rows=1640 loops=1)
  Recheck Cond: ((tags ? 'amenity'::text) AND (tags <> ''::hstore))
  Rows Removed by Index Recheck: 19762
  Filter: ((tags -> 'amenity'::text) = 'bicycle_parking'::text)
  Rows Removed by Filter: 84792
  Heap Blocks: exact=46209
  ->  Bitmap Index Scan on idx_ways_tags  (cost=0.00..916.50 rows=50945 width=0) (actual time=273.198..273.198 rows=106194 loops=1)
        Index Cond: (tags ? 'amenity'::text)
Planning Time: 0.212 ms
Execution Time: 1594.798 ms

Dead-end candidates, joined against the ferry temp table:
Nested Loop Left Join  (cost=39497013.07..39497082.35 rows=1 width=80) (actual time=39986.298..40222.355 rows=371 loops=1)
  Filter: (ferry.* IS NULL)
  ->  GroupAggregate  (cost=39497012.94..39497068.31 rows=11 width=80) (actual time=39986.236..40206.376 rows=371 loops=1)
        Group Key: nodes.id
        Filter: (count(*) = 1)
        Rows Removed by Filter: 105044
        ->  Sort  (cost=39497012.94..39497018.48 rows=2215 width=58) (actual time=39986.197..40052.745 rows=522766 loops=1)
              Sort Key: nodes.id
              Sort Method: external merge  Disk: 32664kB
              ->  Nested Loop  (cost=0.98..39496889.86 rows=2215 width=58) (actual time=1.237..39281.533 rows=522766 loops=1)
                    ->  Nested Loop  (cost=0.41..39490419.21 rows=2216 width=26) (actual time=0.927..26424.495 rows=522824 loops=1)
                          ->  Seq Scan on highway_ends way_ends  (cost=0.00..222209.65 rows=210945 width=183) (actual time=0.208..2820.594 rows=206932 loops=1)
                                Filter: ((level < 3) OR (highway = 'cycleway'::text))
                                Rows Removed by Filter: 3681914
                          ->  Index Scan using idx_highways_linestring on highways  (cost=0.41..186.05 rows=10 width=227) (actual time=0.058..0.113 rows=3 loops=206932)
                                Index Cond: (linestring && way_ends.linestring)
                                Filter: ((NOT is_construction) AND (way_ends.nid = ANY (nodes)))
                                Rows Removed by Filter: 21
                    ->  Index Scan using pk_nodes on nodes  (cost=0.56..2.92 rows=1 width=40) (actual time=0.024..0.024 rows=1 loops=522824)
                          Index Cond: (id = way_ends.nid)
                          Filter: (((NOT (tags ? 'amenity'::text)) OR ((tags -> 'amenity'::text) <> 'bicycle_parking'::text)) AND ((NOT (tags ? 'entrance'::text)) OR ((tags -> 'entrance'::text) = 'no'::text)) AND ((NOT (tags ? 'noexit'::text)) OR ((tags -> 'noexit'::text) = 'no'::text)))
                          Rows Removed by Filter: 0
  ->  Index Scan using idx_ferry_linestring on ferry  (cost=0.14..1.26 rows=1 width=152) (actual time=0.042..0.042 rows=0 loops=371)
        Index Cond: (linestring && nodes.geom)
        Filter: (nodes.id = ANY (nodes))
        Rows Removed by Filter: 3
Planning Time: 12.983 ms
Execution Time: 40224.434 ms
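The same per-loop arithmetic can be applied to the ferry lookups. The sketch compares the Materialize node of the earlier sql20 plan with the index scan on the ferry temp table above. It assumes the 2894.888 ms query selecting route=ferry ways is the one that fills the temp table. The helper name is hypothetical. Most of the ferry cost is that one-off scan of ways tagged route. It now runs when the temp table is filled instead of inside sql20.

```python
def node_total_ms(per_loop_ms: float, loops: int) -> float:
    # EXPLAIN ANALYZE times are per loop; multiply by loops for the node total.
    return per_loop_ms * loops

loops = 371                                # dead-end candidates probed for a ferry
before_ms = node_total_ms(8.760, loops)    # Materialize over the route=ferry index scan (earlier sql20 plan)
after_ms = node_total_ms(0.042, loops)     # Index Scan using idx_ferry_linestring on the ferry temp table
build_ms = 2894.888                        # assumed: query selecting the ferry ways for the temp table

print(f"before: {before_ms:.1f} ms, after: {after_ms:.1f} ms + {build_ms:.1f} ms to fill the temp table")
```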

Cycleways, joined against the bicycle_parking temp table:
Nested Loop Left Join  (cost=0.14..33.37 rows=1 width=48) (actual time=0.496..2.694 rows=136 loops=1)
  Filter: (bicycle_parking.* IS NULL)
  Rows Removed by Filter: 4
  ->  Seq Scan on unconnected_highways  (cost=0.00..19.38 rows=4 width=48) (actual time=0.013..0.073 rows=140 loops=1)
        Filter: (highway = 'cycleway'::text)
        Rows Removed by Filter: 231
  ->  Index Scan using idx_bicycle_parking_linestring on bicycle_parking  (cost=0.14..3.18 rows=1 width=152) (actual time=0.014..0.014 rows=0 loops=140)
        Index Cond: (linestring && unconnected_highways.geom)
        Filter: (unconnected_highways.nid = ANY (nodes))
Planning Time: 0.290 ms
Execution Time: 2.941 ms
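For the bicycle parking check on cycleways, the same arithmetic shows a much larger gain. It compares the Bitmap Heap Scan on ways from the earlier cycleway plan with the index scan on the bicycle_parking temp table above. The sketch assumes the 1594.798 ms query selecting amenity=bicycle_parking ways is the one that fills the temp table. The helper name is hypothetical.

```python
def node_total_ms(per_loop_ms: float, loops: int) -> float:
    # EXPLAIN ANALYZE times are per loop; multiply by loops for the node total.
    return per_loop_ms * loops

loops = 140                                # cycleway candidates probed for a bicycle parking way
before_ms = node_total_ms(221.865, loops)  # Bitmap Heap Scan on ways bicycle_parking (no temp table)
after_ms = node_total_ms(0.014, loops)     # Index Scan using idx_bicycle_parking_linestring
build_ms = 1594.798                        # assumed: query selecting the bicycle parking ways

print(f"before: {before_ms:.1f} ms, after: {after_ms:.2f} ms + {build_ms:.1f} ms to fill the temp table")
```

Even including the cost of filling the temp table, the check drops from about 31 s to about 1.6 s in this run.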

Other highways:
Seq Scan on unconnected_highways  (cost=0.00..951.88 rows=746 width=48) (actual time=0.008..0.145 rows=231 loops=1)
  Filter: (highway <> 'cycleway'::text)
  Rows Removed by Filter: 140
Planning Time: 0.018 ms
Execution Time: 0.164 ms

frodrigo merged commit a633f7a into osm-fr:dev on Dec 22, 2023 (3 checks passed).
frodrigo (Member) commented:

Thank you.

Famlam deleted the speed-up-deadend branch on December 22, 2023, 16:57.