Change dot notation in add column documentation to tuple #1433
Thanks @jeppe-dos for fixing this 🙌
Looks like there might be a bug with this change. I tried to follow the docs:
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, DoubleType, LongType

warehouse_path = "/tmp/warehouse"
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    },
)
schema = Schema(
    NestedField(1, "city", StringType(), required=False),
    NestedField(2, "lat", DoubleType(), required=False),
    NestedField(3, "long", DoubleType(), required=False),
)
catalog.create_namespace_if_not_exists("default")
try:
    catalog.drop_table("default.locations")
except:
    pass
table = catalog.create_table("default.locations", schema)

# with table.update_schema() as update:
#     # In a struct
#     update.add_column("details.confirmed_by", StringType(), "Name of the exchange")

with table.update_schema() as update:
    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
This errors with:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/kevinliu/repos/iceberg-python/pyiceberg/table/update/schema.py", line 192, in add_column
    parent_field = self._schema.find_field(parent_full_path, self._case_sensitive)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/repos/iceberg-python/pyiceberg/schema.py", line 215, in find_field
    raise ValueError(f"Could not find field with name {name_or_id}, case_sensitive={case_sensitive}")
ValueError: Could not find field with name details, case_sensitive=True
Here's where the error happens: iceberg-python/pyiceberg/table/update/schema.py, lines 184 to 192 in b0ea716
And some debugging statements:
The find_field call on line 192 (see the traceback above) is where it errors. It seems we're missing the case where no "parent" is present.
Yes, the struct has to exist before you can insert anything into it. The code could be adjusted to create the parent automatically. For now, this is described in the documentation changes. Should I make it more explicit?
Ah, I see, that makes sense. In that case, can we edit the example so that it works out of the box? Also, I think it's valuable to move the comment to the top-level docs of "Add Column". We can include both the details about dot notation and the struct parent.
I found another dot notation in
A tuple must be used to add columns inside structs, as described in add_column:
"Because "." may be interpreted as a column path separator or may be used in field names, it is not allowed to add nested column by passing in a string. To add to nested structures or to add fields with names that contain "." use a tuple instead to indicate the path."
This PR corrects the documentation to use tuples instead of dot notation.
From issue 1407
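To illustrate why the quoted rule asks for a tuple, here is a toy model in plain Python. It is not pyiceberg's implementation: the schema is a nested dict, and this add_column function and its error messages are hypothetical stand-ins. It shows that a tuple can express both a nested path and a top-level name containing ".", and that the parent struct has to exist first.

```python
def add_column(schema: dict, path, field_type):
    """Toy stand-in for a schema update: dict values that are dicts act as structs."""
    if isinstance(path, str):
        if "." in path:
            # A dotted string could mean a nested path or a literal field name.
            raise ValueError(f"Ambiguous name {path!r}, provide a tuple instead")
        path = (path,)
    *parents, name = path
    node = schema
    for parent in parents:
        # Every parent on the path must already exist and be a struct.
        if not isinstance(node.get(parent), dict):
            raise ValueError(f"Could not find struct field with name {parent}")
        node = node[parent]
    if name in node:
        raise ValueError(f"Field {name!r} already exists")
    node[name] = field_type
    return schema


schema = {"city": "string"}
add_column(schema, "details", {})                          # create the parent struct
add_column(schema, ("details", "confirmed_by"), "string")  # nested field
add_column(schema, ("details.confirmed_by",), "string")    # top-level name containing "."
print(schema)
```

The last two calls would be indistinguishable if both were written as the string "details.confirmed_by", which is the ambiguity the tuple form avoids.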