geopandas: Mapping int/int64 to int32 for OGR_GMT format #2592

seisman · 2023-07-04T02:16:39Z

Description of proposed changes

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.
Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version

weiji14

Need to add a unit test to pygmt/tests/test_geopandas.py that uses int/int64 columns.

weiji14 · 2023-07-04T02:39:29Z

pygmt/helpers/tempfile.py

+            # Workaround to map int/int64 to int32
+            # https://github.com/geopandas/geopandas/issues/967#issuecomment-842877704
+            # https://github.com/GenericMappingTools/pygmt/issues/2497
+            schema = gpd.io.file.infer_schema(geojson)
+            for col, dtype in schema["properties"].items():
+                if dtype in ("int", "int64"):
+                    schema["properties"][col] = "int32"
+            ogrgmt_kwargs["schema"] = schema


If we're going to need to read the schema anyway, should we just use the pure fiona and no geopandas solution I mentioned at #1000 (comment)? Then we can remove the try-except block and the geopandas import?

Sounds a good idea, although I have no experience with __geo_interface__.

If we're going to need to read the schema anyway, should we just use the pure fiona and no geopandas solution I mentioned at #1000 (comment)? Then we can remove the try-except block and the geopandas import?

I think we can't. Currently, we use the gpd.io.file.infer_schema function to infer the data schema, so we still have to import the geopandas package. More importantly, the gpd.io.file.infer_schema function only works for GeoPandas object.

Yeah, let's just stick to using gpd.io.file.infer_schema, I think we'll have to because the patch made at geopandas/geopandas#2950 is in geopandas only and not fiona.

weiji14 · 2023-07-04T02:43:33Z

pygmt/helpers/tempfile.py

        os.remove(tmpfile.name)  # ensure file is deleted first
        ogrgmt_kwargs = {"filename": tmpfile.name, "driver": "OGR_GMT", "mode": "w"}
        try:
+            # Workaround to map int/int64 to int32


Need to add a unit test to pygmt/tests/test_geopandas.py that uses int/int64 columns. Probably a good idea to check uint/uint64 too.

Probably a good idea to check uint/uint64 too.

See line https://github.com/geopandas/geopandas/blob/35463adda663dfa57c9809865ebbb15448823278/geopandas/io/file.py#L640

out_type = type(np.zeros(1, in_type).item()).__name__

For any integer-like data types (like int, int32, int64, unit, uint64 and more), the out_type is always int.

Ah ok, then a unit test with just int64 columns should be fine, no need for parametrized tests.

@weiji14 I have never used geopandas and I find it challenging for me to add a geopandas test for this PR. Could you please help with it?

Getting a little late for me, but let me post the test example I'm working on:

import geopandas as gpd import pandas as pd from pygmt import Figure, info, which, makecpt shapefile = which( fname="@RidgeTest.shp @RidgeTest.shx @RidgeTest.dbf @RidgeTest.prj", download="c", ) gdf = gpd.read_file(shapefile[0]) gdf["geometry"] = gdf.to_crs(crs="EPSG:3857").buffer(distance=100000) gdf["NPOINTS"] = gdf.NPOINTS.astype(dtype="int32") gdf.plot() fig = Figure() makecpt(cmap="lajolla", series=[10, 60, 10], continuous=True) fig.plot( data=gdf[["NPOINTS", "geometry"]], projection="X5c", frame=True, pen="1p,black", close=True, fill="+z", cmap=True, aspatial="Z=NPOINTS", ) fig.colorbar() fig.show()

Couldn't get it to plot though, getting some weird errors like:

plot [ERROR]: Bad OGR/GMT: @n not allowed before FEATURE_DATA plot [ERROR]: Aspatial associations set with -a but input file is not in OGR/GMT format!

Might be another bug? I'll try to come up with an alternative example tomorrow.

pygmt-2m6nvbqu.gmt.txt

This is the temporary OGR/GMT file generated by tempfile_from_geojson (with an extra .txt suffix so that I can upload it to GitHub).

The first few lines are:

# @VGMT1.0 @GPOLYGON # @R-4506555.399/-3052902.89568/3804018.88695/5612191.52675 # @Je3857 # @Jp"+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs" # @Jw"PROJCS[\"WGS 84 / Pseudo-Mercator\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Mercator_1SP\"],PARAMETER[\"central_meridian\",0],PARAMETER[\"scale_factor\",1],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],EXTENSION[\"PROJ4\",\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs\"],AUTHORITY[\"EPSG\",\"3857\"]]" # @NNPOINTS # @Tstring # FEATURE_DATA >

Apparently, GMT incorrectly parse the @null string in the @Jp and @Jw lines as an invalid OGR/GMT code. So it looks like an upstream bug.

Edit: opened a bug report in GenericMappingTools/gmt#7712.

Oo, didn't notice that @null. I see at https://proj.org/en/9.2/usage/transformation.html#the-null-grid that you can specify null, by why would there be an @ symbol?!! A little bit of searching in the https://github.com/OSGeo/PROJ repo shows that this might be valid though?

Do we need to report this to GMT, geopandas or PROJ? In the meantime, I can try a projection that doesn't have @null in the PROJ string.

I see at https://proj.org/en/9.2/usage/transformation.html#the-null-grid that you can specify null, by why would there be an @ symbol?!!

I think @ means the grid is optional (see https://github.com/OSGeo/PROJ/blob/0372c81511a11afa76e01ebb30adb00bcb950772/NEWS#L2936). Also see the search results of =@: https://github.com/search?q=repo%3AOSGeo%2FPROJ+%3D%40&type=code

Ok, I've added the unit test at commit c1b3b43, and used a projection that doesn't have @null, feel free to modify if needed.

pygmt-2m6nvbqu.gmt.txt

This is the temporary OGR/GMT file generated by tempfile_from_geojson (with an extra .txt suffix so that I can upload it to GitHub).

The first few lines are:

# @VGMT1.0 @GPOLYGON # @R-4506555.399/-3052902.89568/3804018.88695/5612191.52675 # @Je3857 # @Jp"+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs" # @Jw"PROJCS[\"WGS 84 / Pseudo-Mercator\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Mercator_1SP\"],PARAMETER[\"central_meridian\",0],PARAMETER[\"scale_factor\",1],PARAMETER[\"false_easting\",0],PARAMETER[\"false_northing\",0],UNIT[\"metre\",1],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH],EXTENSION[\"PROJ4\",\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs\"],AUTHORITY[\"EPSG\",\"3857\"]]" # @NNPOINTS # @Tstring # FEATURE_DATA >

Apparently, GMT incorrectly parse the @null string in the @Jp and @Jw lines as an invalid OGR/GMT code. So it looks like an upstream bug.

Edit: opened a bug report in GenericMappingTools/gmt#7712.

FYI, the upstream bug has been fixed.

weiji14 · 2023-07-04T02:52:15Z

pygmt/helpers/tempfile.py

+            # Workaround to map int/int64 to int32
+            # https://github.com/geopandas/geopandas/issues/967#issuecomment-842877704
+            # https://github.com/GenericMappingTools/pygmt/issues/2497
+            schema = gpd.io.file.infer_schema(geojson)


Hmm, looking into this infer_schema function and geopandas/geopandas#967 (comment) a bit more, I almost think part of the fix should go upstream into modifying infer_schema to output int32 values when the column is of int32 dtype, instead of forcing things to int/int64.

Of course, we'll still need to force the OGR_GMT output to int32 in PyGMT, but the code would be much simplified with some upstream fixes.

Hmm, looking into this infer_schema function and geopandas/geopandas#967 (comment) a bit more, I almost think part of the fix should go upstream into modifying infer_schema to output int32 values when the column is of int32 dtype, instead of forcing things to int/int64.

There is already an open PR geopandas/geopandas#2265, but it has been inactive for one year.

There is already an open PR geopandas/geopandas#2265, but it has been inactive for one year.

Ok, and I saw geopandas/geopandas#2464 as well, also inactive 😆

Started a small PR at geopandas/geopandas#2950 to just handle int32, let's see if their test suite passes, I took a look at https://fiona.readthedocs.io/en/latest/manual.html#keeping-schemas-simple and fiona.FIELD_TYPES_MAP seem to have the following mapping:

{'int32': int, 'float': float, 'str': str, 'date': fiona.rfc3339.FionaDateType, 'time': fiona.rfc3339.FionaTimeType, 'datetime': fiona.rfc3339.FionaDateTimeType, 'bytes': bytes, 'int64': int, 'int': int, 'List[str]': typing.List[str]}

So not sure if the geopandas default driver (ESRI Shapefile?) supports int32 schema. It it doesn't and the test suite breaks, then we might need to handle everything fully in PyGMT.

It's still unclear if there will be any changes in GDAL, GeoPandas, Fiona or GMT, but as we are still using old versions of these packages, I believe we still need these workarounds in the next few years, right?

So the patch in geopandas has been merged at geopandas/geopandas#2950, but will need to wait for geopandas v0.13.3 or newer to be released. Even so, int64 will still need to be converted to int32, so yes, we'll need to keep this workaround for a while.

Using the @RidgeTest.shp dataset that has been buffered, and colouring the polygons with a different colormap.

github-actions · 2023-08-11T04:52:07Z

Summary of changed images

This is an auto-generated report of images that have changed on the DVC remote

Status	Path
added	pygmt/tests/baseline/test_geopandas_plot_int_dtypes.png

Image diff(s)

Added images

test_geopandas_plot_int_dtypes.png

Modified images

Path	Old	New

Report last updated at commit a4ddcfe

weiji14 · 2023-08-11T05:04:24Z

pygmt/tests/test_geopandas.py

+    gdf["NPOINTS"] = gdf.NPOINTS.astype(dtype=dtype)
+
+    # Plot figure with three polygons colored based on NPOINTS value
+    fig = Figure()
+    makecpt(cmap="lajolla", series=[10, 60, 10], continuous=True)


We might need to check what happens when a big int64 number $2^{32} > x > 2^{64}$ is passed in. Don't want to accidentally convert and plot data incorrectly!

@seisman, could you add a test for a large int64 number too? Can use a simple info just to check the cast to int32 behaviour, want to make sure it doesn't truncate int64 numbers to int32.

Modified the existing test with large numbers:

import geopandas as gpd from pygmt import which, info shapefile = which( fname="@RidgeTest.shp @RidgeTest.shx @RidgeTest.dbf @RidgeTest.prj", download="c", ) gdf = gpd.read_file(shapefile[0]) gdf["geometry"] = ( gdf.to_crs(crs="EPSG:3857") .buffer(distance=100000) .to_crs(crs="OGC:CRS84") # convert to lon/lat to prevent @null in PROJ CRS ) gdf["NPOINTS"] *= 1.0e10 gdf["NPOINTS"] = gdf.NPOINTS.astype(dtype="int64") info(gdf)

It raises an OverflowError error:

--------------------------------------------------------------------------- OverflowError Traceback (most recent call last) Cell In[12], line 1 ----> 1 info(gdf) File ~/OSS/gmt/pygmt/pygmt/helpers/decorators.py:598, in use_alias.<locals>.alias_decorator.<locals>.new_module(*args, **kwargs) 591 msg = ( 592 "Parameters 'Y' and 'yshift' are deprecated since v0.8.0. " 593 "and will be removed in v0.12.0. " 594 "Use Figure.shift_origin(yshift=...) instead." 595 ) 596 warnings.warn(msg, category=SyntaxWarning, stacklevel=2) --> 598 return module_func(*args, **kwargs) File ~/OSS/gmt/pygmt/pygmt/helpers/decorators.py:738, in kwargs_to_strings.<locals>.converter.<locals>.new_module(*args, **kwargs) 736 kwargs[arg] = separators[fmt].join(f"{item}" for item in value) 737 # Execute the original function and return its output --> 738 return module_func(*args, **kwargs) File ~/OSS/gmt/pygmt/pygmt/src/info.py:85, in info(data, **kwargs) 83 file_context = lib.virtualfile_from_data(check_kind="vector", data=data) 84 with GMTTempFile() as tmpfile: ---> 85 with file_context as fname: 86 lib.call_module( 87 module="info", 88 args=build_arg_string(kwargs, infile=fname, outfile=tmpfile.name), 89 ) 90 result = tmpfile.read() File ~/opt/miniconda/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self) 133 del self.args, self.kwds, self.func 134 try: --> 135 return next(self.gen) 136 except StopIteration: 137 raise RuntimeError("generator didn't yield") from None File ~/OSS/gmt/pygmt/pygmt/helpers/tempfile.py:147, in tempfile_from_geojson(geojson) 145 ogrgmt_kwargs["schema"] = schema 146 # Using geopandas.to_file to directly export to OGR_GMT format --> 147 geojson.to_file(**ogrgmt_kwargs) 148 except AttributeError: 149 # Other 'geo' formats which implement __geo_interface__ 150 import json File ~/opt/miniconda/lib/python3.10/site-packages/geopandas/geodataframe.py:1263, in GeoDataFrame.to_file(self, filename, driver, schema, index, **kwargs) 1172 """Write the ``GeoDataFrame`` to a file. 1173 1174 By default, an ESRI shapefile is written, but any OGR data source (...) 1259 1260 """ 1261 from geopandas.io.file import _to_file -> 1263 _to_file(self, filename, driver, schema, index, **kwargs) File ~/opt/miniconda/lib/python3.10/site-packages/geopandas/io/file.py:572, in _to_file(df, filename, driver, schema, index, mode, crs, engine, **kwargs) 569 raise ValueError(f"'mode' should be one of 'w' or 'a', got '{mode}' instead") 571 if engine == "fiona": --> 572 _to_file_fiona(df, filename, driver, schema, crs, mode, **kwargs) 573 elif engine == "pyogrio": 574 _to_file_pyogrio(df, filename, driver, schema, crs, mode, **kwargs) File ~/opt/miniconda/lib/python3.10/site-packages/geopandas/io/file.py:601, in _to_file_fiona(df, filename, driver, schema, crs, mode, **kwargs) 597 crs_wkt = crs.to_wkt("WKT1_GDAL") 598 with fiona.open( 599 filename, mode=mode, driver=driver, crs_wkt=crs_wkt, schema=schema, **kwargs 600 ) as colxn: --> 601 colxn.writerecords(df.iterfeatures()) File ~/opt/miniconda/lib/python3.10/site-packages/fiona/collection.py:558, in Collection.writerecords(self, records) 556 if self.mode not in ("a", "w"): 557 raise OSError("collection not open for writing") --> 558 self.session.writerecs(records, self) 559 self._len = self.session.get_length() 560 self._bounds = None File fiona/ogrext.pyx:1409, in fiona.ogrext.WritingSession.writerecs() File fiona/ogrext.pyx:441, in fiona.ogrext.OGRFeatureBuilder.build() OverflowError: value too large to convert to int

Ok, good to know that it errors. We should probably cast int64 numbers to float64 or string dtype, xref geopandas/geopandas#2950 (comment) and https://github.com/OSGeo/gdal/blob/6e9103bd5acb1ff8da305c4e77fa30335ffa3b70/ogr/ogrsf_frmts/gmt/ogrgmtlayer.cpp#L754-L773. We can do this in a follow-up PR since this one is getting a bit long.

weiji14 · 2023-08-11T05:10:46Z

Hmm, there are a few failing tests on Python 3.11 but not Python 3.9. See https://github.com/GenericMappingTools/pygmt/actions/runs/5829032071/job/15807827814?pr=2592#step:8:738

_______________________ test_geopandas_info_geodataframe _______________________

gdf =                                                        geometry
multipolygon  MULTIPOLYGON (((0.00000 0.00000, 20.0000...      POLYGON ((20.00000 10.00000, 23.00000 10.00000...
linestring    LINESTRING (20.00000 15.00000, 30.00000 15.00000)

    def test_geopandas_info_geodataframe(gdf):
        """
        Check that info can return the bounding box region from a
        geopandas.GeoDataFrame.
        """
>       output = info(data=gdf, per_column=True)

../pygmt/tests/test_geopandas.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../pygmt/helpers/decorators.py:598: in new_module
    return module_func(*args, **kwargs)
../pygmt/helpers/decorators.py:738: in new_module
    return module_func(*args, **kwargs)
../pygmt/src/info.py:85: in info
    with file_context as fname:
../../../../micromamba/envs/pygmt/lib/python3.11/contextlib.py:137: in __enter__
    return next(self.gen)
../pygmt/helpers/tempfile.py:144: in tempfile_from_geojson
    geojson.to_file(**ogrgmt_kwargs)
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/geodataframe.py:1263: in to_file
    _to_file(self, filename, driver, schema, index, **kwargs)
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/io/file.py:572: in _to_file
    _to_file_fiona(df, filename, driver, schema, crs, mode, **kwargs)
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/io/file.py:601: in _to_file_fiona
    colxn.writerecords(df.iterfeatures())
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/fiona/collection.py:558: in writerecords
    self.session.writerecs(records, self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: Record does not match collection schema: KeysView(<fiona.model.Properties object at 0x7f3265f03a10>) != []

fiona/ogrext.pyx:1397: ValueError
____________________ test_geopandas_plot_int_dtypes[int32] _____________________
Error: Image files did not match.
  RMS Value: 62.49494379761804
  Expected:  
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/baseline.png
  Actual:    
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/result.png
  Difference:
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/result-failed-diff.png
  Tolerance: 
    2
____________________ test_geopandas_plot_int_dtypes[int64] _____________________
Error: Image files did not match.
  RMS Value: 62.49494379761804
  Expected:  
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/baseline.png
  Actual:    
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/result.png
  Difference:
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/result-failed-diff.png
  Tolerance: 
    2
____________________ test_geopandas_plot_int_dtypes[dtype2] ____________________

args = (), kwargs = {'dtype': Int32Dtype()}

    def wrapper(*args, **kwargs):
>       store.return_value[test_name] = obj(*args, **kwargs)

../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/pytest_mpl/plugin.py:107: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../pygmt/tests/test_geopandas.py:165: in test_geopandas_plot_int_dtypes
    fig.plot(
../pygmt/helpers/decorators.py:818: in new_module
    return module_func(*args, **kwargs)
../pygmt/helpers/decorators.py:598: in new_module
    return module_func(*args, **kwargs)
../pygmt/helpers/decorators.py:738: in new_module
    return module_func(*args, **kwargs)
../pygmt/src/plot.py:266: in plot
    with file_context as fname:
../../../../micromamba/envs/pygmt/lib/python3.11/contextlib.py:137: in __enter__
    return next(self.gen)
../pygmt/helpers/tempfile.py:138: in tempfile_from_geojson
    schema = gpd.io.file.infer_schema(geojson)
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/io/file.py:646: in infer_schema
    [
../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/io/file.py:647: in <listcomp>
    (col, convert_type(col, _type))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

column = 'NPOINTS', in_type = Int32Dtype()

    def convert_type(column, in_type):
        if in_type == object:
            return "str"
        if in_type.name.startswith("datetime64"):
            # numpy datetime type regardless of frequency
            return "datetime"
        if str(in_type) in types:
            out_type = types[str(in_type)]
        else:
>           out_type = type(np.zeros(1, in_type).item()).__name__
E           TypeError: Cannot interpret 'Int32Dtype()' as a data type

../../../../micromamba/envs/pygmt/lib/python3.11/site-packages/geopandas/io/file.py:640: TypeError
____________________ test_geopandas_plot_int_dtypes[dtype3] ____________________
Error: Image files did not match.
  RMS Value: 62.49494379761804
  Expected:  
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_dtype3/baseline.png
  Actual:    
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_dtype3/result.png
  Difference:
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_dtype3/result-failed-diff.png
  Tolerance: 
    2

Looks like some version related incompatibilities...

seisman · 2023-08-11T05:39:20Z

Hmm, there are a few failing tests on Python 3.11 but not Python 3.9.

GeoPandas is an optional dependency and it's not installed in the Python 3.9 job so geopandas tests are skipped.

seisman · 2023-08-19T02:54:04Z

____________________ test_geopandas_plot_int_dtypes[int32] _____________________
Error: Image files did not match.
  RMS Value: 62.49494379761804
  Expected:  
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/baseline.png
  Actual:    
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/result.png
  Difference:
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int32/result-failed-diff.png
  Tolerance: 
    2
____________________ test_geopandas_plot_int_dtypes[int64] _____________________
Error: Image files did not match.
  RMS Value: 62.49494379761804
  Expected:  
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/baseline.png
  Actual:    
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/result.png
  Difference:
    /home/runner/work/pygmt/pygmt/tmp-test-dir-with-unique-name/results/pygmt.tests.test_geopandas.test_geopandas_plot_int_dtypes_int64/result-failed-diff.png
  Tolerance: 
    2

These two tests fail because the baseline image is incorrect.

Baseline	Result

The left-side plot is the current baseline image. As you can see, the colorbar is a reversed version of the standard lajolla colorbar (https://docs.generic-mapping-tools.org/latest/cookbook/cpts.html). After fixing the baseline image, the two tests should pass, but I'm having issues uploading files to DVC, so I can't help here.

weiji14 · 2023-08-19T03:41:14Z

The left-side plot is the current baseline image. As you can see, the colorbar is a reversed version of the standard lajolla colorbar (https://docs.generic-mapping-tools.org/latest/cookbook/cpts.html). After fixing the baseline image, the two tests should pass, but I'm having issues uploading files to DVC, so I can't help here.

Hmm, I was using Scientific Colour Map 8 from GMT 6.5 dev, and looking at GenericMappingTools/gmt#7575, it seems like the lajolla.cpt colours have flipped around?

GMT 6.4 SCM v7 https://docs.generic-mapping-tools.org/6.4/cookbook/cpts.html#id3

GMT 6.5 dev SCM v8 https://docs.generic-mapping-tools.org/dev/cookbook/cpts.html#id3

Let me pick another colourmap 😅

The lajolla colormap is flipped from Scientific Colour Maps v7 to v8, causing differences in the baseline image.

Need patch from geopandas/geopandas#2950 before we can use nullable dtypes pd.Int32Dtype() and pd.Int64Dtype().

weiji14 · 2023-08-19T03:56:15Z

pygmt/tests/test_geopandas.py

+        # Enable Int32/Int64 dtypes when geopandas>=0.13.3 is released with
+        # patch https://github.com/geopandas/geopandas/pull/2950
+        # pd.Int32Dtype(),
+        # pd.Int64Dtype(),


E TypeError: Cannot interpret 'Int32Dtype()' as a data type

Disabling testing of these nullable dtypes for now, until geopandas>=0.13.3 is out with the patch from geopandas/geopandas#2950.

weiji14 · 2023-08-19T04:14:06Z

pygmt/helpers/tempfile.py

+            for col, dtype in schema["properties"].items():
+                if dtype in ("int", "int64"):
+                    schema["properties"][col] = "int32"
+            ogrgmt_kwargs["schema"] = schema


Only write schema if schema["properties"] is not None. Should fix that error about ValueError: Record does not match collection schema: KeysView(<fiona.model.Properties object at 0x7f3265f03a10>) != []

Suggested change

ogrgmt_kwargs["schema"] = schema

if schema["properties"]:

ogrgmt_kwargs["schema"] = schema

I think it's not the root cause of the error.

Here is a modified version of the test_geopandas_info_geodataframe test to understand why it fails:

import geopandas as gpd import shapely linestring = shapely.geometry.LineString([(20, 15), (30, 15)]) polygon = shapely.geometry.Polygon([(20, 10), (23, 10), (23, 14), (20, 14)]) multipolygon = shapely.geometry.shape( { "type": "MultiPolygon", "coordinates": [ [ [[0, 0], [20, 0], [10, 20], [0, 0]], # Counter-clockwise [[3, 2], [10, 16], [17, 2], [3, 2]], # Clockwise ], [[[6, 4], [14, 4], [10, 12], [6, 4]]], # Counter-clockwise [[[25, 5], [30, 10], [35, 5], [25, 5]]], ], } ) # Multipolygon first so the OGR_GMT file has @GMULTIPOLYGON in the header gdf = gpd.GeoDataFrame( index=["multipolygon", "polygon", "linestring"], geometry=[multipolygon, polygon, linestring], ) schema = gpd.io.file.infer_schema(gdf) print(gdf) print(schema) gdf.to_file("test.gmt", driver="OGR_GMT", mode="w")

The output of the two print statements is:

geometry multipolygon MULTIPOLYGON (((0.00000 0.00000, 20.00000 0.00... polygon POLYGON ((20.00000 10.00000, 23.00000 10.00000... linestring LINESTRING (20.00000 15.00000, 30.00000 15.00000) {'geometry': ['MultiPolygon', 'Polygon', 'LineString'], 'properties': OrderedDict()}

To understand what geopandas/fiona does, I also added the same two print statements to source code geopandas/io/file.py at line https://github.com/geopandas/geopandas/blob/b4b10313ab57bf2c55592a28fb99687c9a538fc2/geopandas/io/file.py#L591.

The two print statements in the geopandas source code give:

index geometry 0 multipolygon MULTIPOLYGON (((0.00000 0.00000, 20.00000 0.00... 1 polygon POLYGON ((20.00000 10.00000, 23.00000 10.00000... 2 linestring LINESTRING (20.00000 15.00000, 30.00000 15.00000) {'geometry': ['MultiPolygon', 'Polygon', 'LineString'], 'properties': OrderedDict([('index', 'str')])}

As you can see, both the gdf object and the schema are different. This is because geopandas by defaults modifies the index column (see lines https://github.com/geopandas/geopandas/blob/b4b10313ab57bf2c55592a28fb99687c9a538fc2/geopandas/io/file.py#L551-L556).

In our case, our inferred schema has a empty property dict OrderedDict(), but geopandas/fiona expects a property dict like OrderedDict([('index', 'str')]).

https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html#geopandas.GeoDataFrame.to_file

index: bool, default None

If True, write index into one or more columns (for MultiIndex). Default None writes the index into one or more columns only if the index is named, is a MultiIndex, or has a non-integer data type. If False, no index is written.

I think writing index into one column or not has no effects to the output of OGR_GMT driver, so maybe we should just set index=False.

I think writing index into one column or not has no effects to the output of OGR_GMT driver, so maybe we should just set index=False.

Setting index=True (the default) will produce this in the output *.gmt file:

# @VGMT1.0 # @R0/35/0/20 # @GMULTIPOLYGON # @Nindex # @Tstring # FEATURE_DATA > # @Dmultipolygon # @P 0 0 20 0 10 20 0 0 > # @H 3 2 10 16 17 2 3 2 > # @P 6 4 14 4 10 12 6 4 > # @P 25 5 30 10 35 5 25 5 > # @Dpolygon # @P 20 10 23 10 23 14 20 14 20 10 > # @Dlinestring 20 15 30 15

Notice the lines with the comments # @Nindex and # @Dmultipolygon which are from the index. So the 'index' is output as just a regular column in the OGR_GMT file, and could potentially be used by someone doing aspatial="Z=index". We probably do need to keep it just in case.

Is there another way to handle this? E.g. cast the 'index' column explicitly to a string type?

Is there another way to handle this? E.g. cast the 'index' column explicitly to a string type?

This won't always work, because the index property in a GeoDataFrame object can be integers. For example, in the above example, the object can be defined like

gdf = gpd.GeoDataFrame( #index=["multipolygon", "polygon", "linestring"], index=[0, 1, 2], geometry=[multipolygon, polygon, linestring], )

if index: df = df.reset_index(drop=False)

Probably just need these two lines, call reset_index if an index exists?

https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html#geopandas.GeoDataFrame.to_file

index has a default value of None. So df.reset_index is only called if the GeoDataFrame object's index is named or is non-integer.

Ah sorry, I thought index was referring to the df.index attribute. We could do something like this to force an index name:

if gdf.index.name is None: gdf.index.name = "index" # print(gdf.index.names) # FrozenList(['index'])

It looks good.

Ah sorry, I thought index was referring to the df.index attribute. We could do something like this to force an index name:

if gdf.index.name is None: gdf.index.name = "index" # print(gdf.index.names) # FrozenList(['index'])

Patch applied in 2097add.

weiji14

Approving because the patch here should resolve the bug described in #2497 already. We can open up a follow-up PR to handle large int64 numbers from raising an OverflowError as mentioned at #2592 (comment) later.

Also, once geopandas>=0.13.3 is out, we should enable the nullable int dtype tests as mentioned at #2592 (comment).

geopandas: Mapping int/int64 to int32 for OGR_GMT format

8cec0c4

seisman mentioned this pull request Jul 4, 2023

Figure.plot: parameter aspatial does not accept integers for filling polygons #2497

Closed

weiji14 reviewed Jul 4, 2023

View reviewed changes

weiji14 mentioned this pull request Jul 4, 2023

BUG: Infer schema for np.int32 and pd.Int32Dtype columns properly geopandas/geopandas#2950

Merged

seisman marked this pull request as draft July 4, 2023 03:23

weiji14 added the bug Something isn't working label Jul 5, 2023

Merge branch 'main' into ogrgmt-int32

eacf45b

seisman added this to the 0.10.0 milestone Aug 3, 2023

seisman added 4 commits August 3, 2023 09:57

Update the comment

f2f10ca

Merge branch 'main' into ogrgmt-int32

47bc26f

Merge branch 'main' into ogrgmt-int32

f9b7903

Fix a linting issue

84f87a5

seisman mentioned this pull request Aug 11, 2023

OGR/GMT: Error when parsing strings that contain the special @ characters GenericMappingTools/gmt#7712

Closed

weiji14 added 2 commits August 11, 2023 16:49

Add unit test for plotting GeoDataFrame with int32 and int64 columns

c1b3b43

Using the @RidgeTest.shp dataset that has been buffered, and colouring the polygons with a different colormap.

Isort imports

20891c1

weiji14 reviewed Aug 11, 2023

View reviewed changes

weiji14 added 2 commits August 19, 2023 15:49

Change cmap from lajolla to lisbon

60ba52a

The lajolla colormap is flipped from Scientific Colour Maps v7 to v8, causing differences in the baseline image.

Disable Int32 and Int64 dtype tests until geopandas 0.13.3 or greater

a6d4315

Need patch from geopandas/geopandas#2950 before we can use nullable dtypes pd.Int32Dtype() and pd.Int64Dtype().

weiji14 reviewed Aug 19, 2023

View reviewed changes

Remove unused pandas import

1e05a29

weiji14 reviewed Aug 19, 2023

View reviewed changes

seisman added 2 commits August 20, 2023 10:55

Force an index name and reset index

2097add

Merge branch 'main' into ogrgmt-int32

a4ddcfe

seisman marked this pull request as ready for review August 20, 2023 03:29

seisman added the needs review This PR has higher priority and needs review. label Aug 20, 2023

weiji14 approved these changes Aug 21, 2023

View reviewed changes

weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Aug 21, 2023

seisman merged commit 5b05b6a into main Aug 21, 2023
18 of 22 checks passed

seisman deleted the ogrgmt-int32 branch August 21, 2023 09:49

seisman removed the final review call This PR requires final review and approval from a second reviewer label Aug 21, 2023

weiji14 mentioned this pull request Nov 30, 2023

Enable testing pd.Int32Dtype/pd.Int64Dtype for geopandas integration #2839

Merged

seisman mentioned this pull request Dec 1, 2023

geopandas: Correctly handle columns with integer values bigger than the largest 32-bit integer #2841

Merged

	ogrgmt_kwargs["schema"] = schema
	if schema["properties"]:
	ogrgmt_kwargs["schema"] = schema

geopandas: Mapping int/int64 to int32 for OGR_GMT format #2592

geopandas: Mapping int/int64 to int32 for OGR_GMT format #2592

Conversation

seisman commented Jul 4, 2023

weiji14 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

weiji14 Aug 9, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seisman Aug 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

weiji14 Aug 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seisman Jul 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Aug 11, 2023 • edited Loading

Summary of changed images

Image diff(s)

Added images

Modified images

Choose a reason for hiding this comment

weiji14 Aug 20, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

weiji14 commented Aug 11, 2023

seisman commented Aug 11, 2023

seisman commented Aug 19, 2023

weiji14 commented Aug 19, 2023

weiji14 Aug 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

weiji14 Aug 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seisman Aug 20, 2023 • edited Loading

Choose a reason for hiding this comment

weiji14 left a comment • edited Loading

Choose a reason for hiding this comment

weiji14 Aug 9, 2023 •

edited

Loading

seisman Aug 11, 2023 •

edited

Loading

weiji14 Aug 11, 2023 •

edited

Loading

seisman Jul 4, 2023 •

edited

Loading

github-actions bot commented Aug 11, 2023 •

edited

Loading

weiji14 Aug 20, 2023 •

edited

Loading

weiji14 Aug 19, 2023 •

edited

Loading

weiji14 Aug 19, 2023 •

edited

Loading

seisman Aug 20, 2023 •

edited

Loading

weiji14 left a comment •

edited

Loading