Replies: 6 comments 18 replies
-
Note that I've run the first script on two-thirds of the data and am trying to upload to source.coop, but my internet is slow. I'm hoping to get all the data into one DuckDB file and partition it to put on source.coop. But I'm on vacation for the next couple of weeks, much of it with slow internet, so I doubt I'll get everything up. If anyone else has the ability to run those scripts and upload to the cloud, go for it.
-
Here's how I prepare a CSV file of places for input into tippecanoe from an on-disk copy of the dataset:
COPY (
    SELECT
        json_extract(bbox, '$.minx') AS x,
        json_extract(bbox, '$.miny') AS y,
        json_extract_string(names, '$.common[0].value') AS name,
        json_extract_string(categories, '$.main') AS category_main
    FROM read_parquet('overture/theme=places/type=place/*')
) TO 'pois.csv' (HEADER, DELIMITER ',');
Update: Demo tiled output at https://github.com/bdon/overture-tiles
-
Super cool to see the interest around DuckDB spatial that this has created! I'm well aware that some things are still a little rough around the edges, e.g. the lack of spatial indexes, slow spatial joins, and the lack of predicate pushdown into structs. While I really wish I could bring these features to you sooner, I'll just do a shameless plug and mention that we've not been able to secure any funding for development of the spatial extension in particular (either through client projects or support contracts), so I have to spend more of my energy on other projects. That said, we're really grateful just to receive feedback and bug reports, and I'll make a note of any limitations people end up running into, or use cases to support in the future.
-
I used the example from the DuckDB instructions and tried to adapt it to pull building attributes, but the code doesn't work. What am I doing wrong?
db = duckdb.connect()
db.execute("INSTALL spatial")
db.execute("INSTALL httpfs")
db.execute("""
    LOAD spatial;
    LOAD httpfs;
    SET s3_region='us-west-2';
""")
db.execute("""
    COPY (
        SELECT
            type,
            height,
            numFloors,
            class,
            JSON(names) AS names,
            JSON(sources) AS sources,
            ST_GeomFromWKB(geometry) AS geometry
        FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=buildings/type=*/*', filename=true, hive_partitioning=1)
        WHERE ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON', 'MULTIPOLYGON')
        LIMIT 100
    ) TO 'buildings.geojson'
    WITH (FORMAT GDAL, DRIVER 'GeoJSON');
""")
Can you show an example of DuckDB code for exporting buildings as GeoJSON for a specific city?
I'm new to this, so please forgive me if the question seems stupid.
-
I did a blog post for the Esri community board I run. I really should have read the data into a dataframe and let that handle the field schema (next time, maybe), but it was a good exercise anyway.
-
Here is a notebook that extracts transportation/segments features from an arbitrary division_area in the divisions theme, which I think will be a common pattern. Next I'll get to unnesting the road column, which is where a lot of the value is.
-
Just wanted to start a thread for people to share any SQL or scripts they're using with DuckDB on the Overture dataset. I picked the buildings category, since that's where I'm working, but consider this open to all themes. (It would be nice to have more discussion categories, like one for 'tools' or something.)
I mostly wanted to share my scripts and running notes on queries.
Notes: (the repo name is aspirational - I hope to make tutorials, but mostly I used this as scratch space to remember things when my DuckDB history gets lost.)
Scripts:
My goal with this is to get to valid GeoParquet, partitioned, on source.coop, building on experiments at https://beta.source.coop/cholmes/google-open-buildings. My hope is that this makes it much faster to access the data by geography and to get everything, leveraging DuckDB and Parquet magic. The next step would be to wrap that in a script that lets people input their geometry and desired output format and fetches it all for them. Ideally the core data would be distributed like this, but I'll aim to prove it out first.
Please share what you're doing with DuckDB - big and small! Even a nice SQL query that others might be interested in counts.