ENH: output format geoparquet #95

Open
jsignell opened this issue Jul 17, 2023 · 9 comments

@jsignell

This came up in the STAC meeting today. Currently it looks like the supported output formats (available with the f query param) are: 'geojson', 'html', 'json', 'csv', 'geojsonseq', 'ndjson'. I got that list by naively trying https://firenrt.delta-backend.com/collections/public.eis_fire_lf_perimeter_nrt/items?f=geoparquet

It would be neat to add 'geoparquet' as an option.
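For anyone who wants to repeat the probe above, here is a minimal sketch of how the request URL is put together. The `items_url` helper is hypothetical (it is not part of tipg); only the host, collection id, and `f` parameter come from the URL in this issue.

```python
from urllib.parse import urlencode

# Host and collection id are taken from the URL quoted in this issue.
BASE_URL = "https://firenrt.delta-backend.com"
COLLECTION_ID = "public.eis_fire_lf_perimeter_nrt"


def items_url(output_format: str,
              base_url: str = BASE_URL,
              collection_id: str = COLLECTION_ID) -> str:
    """Build an /items URL that asks for a given output format via `f`.

    Hypothetical helper for illustration only.
    """
    query = urlencode({"f": output_format})
    return f"{base_url}/collections/{collection_id}/items?{query}"


if __name__ == "__main__":
    # 'geoparquet' is the requested format, not one the server supports today.
    for fmt in ("geojson", "csv", "geoparquet"):
        print(items_url(fmt))
```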

Not sure if this is the right place to capture the request so feel free to close, just wanted to increase the visibility of that conversation.

@vincentsarago
Member

thanks for starting the discussion @jsignell 🙏

This is definitely something we could/should support.

@kylebarron
Member

Which endpoints should have GeoParquet? items here?

tipg/tipg/factory.py

Lines 1065 to 1067 in be04e60

output_type: Annotated[
Optional[MediaType], Depends(ItemsOutputType)
] = None,

It might be easiest to use the existing https://github.com/stac-utils/stac-geoparquet library for this

@jsignell
Author

Yeah, exactly! I think /items, just with a different value for the f query param. stac-geoparquet will definitely be helpful, but it might need some alterations since these aren't exactly STAC objects.

@vincentsarago
Member

@kylebarron yes, I think it makes sense to enable GeoParquet output for Items first. (We might want collections later, but that will be less useful.)

It might be easiest to use the existing https://github.com/stac-utils/stac-geoparquet library for this

stac-geoparquet depends on pandas and geopandas (and thus shapely). These would be quite heavy dependencies just to add an output format. I was hoping for a more lightweight solution 🙏

@kylebarron
Member

Unfortunately GeoParquet currently requires rather heavy dependencies to read and write from Python.

For one, the primary way to read and write Parquet in Python is via pyarrow, and that's a wheel of roughly 85MB on top of NumPy:

pip install pyarrow -t pyarrow_tmp
du -csh pyarrow_tmp/*
 12K	pyarrow_tmp/bin
 60M	pyarrow_tmp/numpy
220K	pyarrow_tmp/numpy-1.25.2.dist-info
 85M	pyarrow_tmp/pyarrow
204K	pyarrow_tmp/pyarrow-12.0.1.dist-info
146M	total

Additionally, the GeoParquet spec says to store geometries in WKB, so you need some way to convert your existing geometries into WKB, and Shapely seems like the easiest to reach for.
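To make the WKB requirement concrete, here is a minimal sketch of what a 2D point looks like in WKB, written with only the standard library. It is an illustration of the byte layout, not a substitute for Shapely: `point_to_wkb` is a hypothetical helper and handles nothing beyond a single 2D point.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper).

    Layout: 1 byte byte-order flag (1 = little-endian),
            4 byte unsigned geometry type (1 = Point),
            two 8 byte doubles for x and y.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> tuple:
    """Decode the output of point_to_wkb back to (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled here")
    return (x, y)


if __name__ == "__main__":
    wkb = point_to_wkb(-105.0, 40.0)
    print(len(wkb), wkb.hex())
    print(wkb_to_point(wkb))
```

Polygons, multi-geometries, and Z/M coordinates need considerably more code, which is why reaching for an existing library (or having the database emit WKB) is attractive.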

People have been discussing making pyarrow more modular so that the bundle size is smaller, but nothing has happened yet. When my Rust geoarrow library and its Python bindings are more stable (not imminently), it might be a good choice for projects like this one that need to be deployable on Lambda.

@vincentsarago
Member

🤯 I don't think this feature is urgently needed right now, so we can wait, especially if that gives your library time to be ready :-)

FYI: we already have pyproj dependency (via morecantile)

Note: we could still add a heavy dependency and make the whole thing optional if this is really something users/customers want

@kylebarron
Member

Definitely agree with making it an optional dependency if we add it.
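One way the optional-dependency idea could look, as a rough sketch rather than tipg's actual design: advertise 'geoparquet' only when the Parquet library can be imported. The names `module_available` and `supported_formats` are hypothetical, and the base list is simply the set of formats quoted at the top of this issue.

```python
import importlib.util

# Formats listed at the top of this issue as currently supported.
BASE_FORMATS = ["geojson", "html", "json", "csv", "geojsonseq", "ndjson"]


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def supported_formats(parquet_module: str = "pyarrow") -> list:
    """List output formats, adding 'geoparquet' only if the extra is installed.

    Hypothetical helper: `parquet_module` is the optional dependency to probe.
    """
    formats = list(BASE_FORMATS)
    if module_available(parquet_module):
        formats.append("geoparquet")
    return formats


if __name__ == "__main__":
    print(supported_formats())
```

With this shape, a default install stays lightweight and the heavier dependency is only pulled in by people who ask for it, for example through a packaging extra.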

@bitner
Contributor

bitner commented Oct 30, 2023

Yes, we would definitely want to add this as an optional dependency. We would probably want to implement it starting from a query like SELECT column_a, column_b, ST_ASWKB(geometry_column) FROM mytable and build the GeoParquet from those results, rather than going through the GeoJSON that we create. That would eliminate the need for shapely or the like.
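A minimal sketch of building such a query, assuming the goal is to have the database return WKB directly. `build_wkb_query` and `quote_ident` are hypothetical helpers, not tipg internals. The sketch uses `ST_AsBinary`, which is the name PostGIS documents for WKB output as far as I know; swap in whatever function the target database provides.

```python
def quote_ident(name: str) -> str:
    """Double-quote a single SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_wkb_query(table: str, columns: list, geometry_column: str) -> str:
    """Build a SELECT that returns the geometry column as WKB.

    Hypothetical helper. `table` is treated as one identifier, so a
    schema-qualified name would need to be split and quoted per part.
    """
    selected = [quote_ident(c) for c in columns]
    geom = quote_ident(geometry_column)
    # Assumption: ST_AsBinary is the WKB output function on the server.
    selected.append(f"ST_AsBinary({geom}) AS {geom}")
    return f"SELECT {', '.join(selected)} FROM {quote_ident(table)}"


if __name__ == "__main__":
    print(build_wkb_query("mytable", ["column_a", "column_b"], "geometry_column"))
```

The resulting rows already carry WKB bytes, so they could be handed to a Parquet writer without a GeoJSON round trip or a geometry library in Python.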

@kylebarron
Member

I'm not too familiar with the tipg internals but happy to help implement this

4 participants