SQLAlchemy 2: Stop skipping some non-type tests #247

Merged · 15 commits into main · Oct 13, 2023

Conversation

@susodapop (Contributor) commented Oct 12, 2023

Note to reviewers: The important thing here is that all these tests pass. The nitty-gritty implementation of SQLAlchemy is both esoteric and unchanging. The key things to review are the implementations of get_pk_constraint, get_primary_keys, and the modified infer_types logic.

Background

On the path to full SQLAlchemy 2 compliance, I'm working through the entire dialect compliance test suite. Once all the compliance tests have been addressed as either passing or skipped with justification, the new dialect will be ready for release.

As of this PR, we pass 198 reviewed tests, skip 58 reviewed tests, and have not reviewed 1215 tests.

Description

This pull request covers several blocks of tests that I've reviewed in detail. I'm opening it as a partial effort since there are still more tests to consider, but I'd rather my PRs be smaller and more frequent than enormous and hard to review. Here is a list of the tests I enabled and anything relevant for this review.

  • TableDDLTest: No changes to the dialect were needed to make these pass. The table comment DDL tests are marked skipped, as comment reflection is not an acceptance criterion for the next release.
  • HasIndexTest: Databricks doesn't support indexes. These tests are permanently skipped.
  • QuotedNameArgumentTest: These tests are wacky: they intermingle several requirements, some of which Databricks supports and some it doesn't. There's no way for us to run them as-is, so we can tag this as a follow-up after the initial release.
  • RowFetchTest: No changes to the dialect were needed to make this pass.
  • FetchLimitOffsetTest: These failed because the default compiler wrote invalid Databricks SQL (it renders -1 as the wildcard limit, whereas Databricks requires ALL). I had to implement our own subclass of SQLCompiler, which SQLAlchemy picks up through its visitor API; see the sketch after this list.
  • FutureTableDDLTest: No changes to the dialect were required to make these tests pass, but I had to update requirements.py so that SQLAlchemy's test runner would know they are runnable.
  • IdentityColumnTest: I implemented IDENTITY writing semantics in the dialect to make these pass.
  • HasTableTest: I had to make a few changes to bring these into a passing state. I also added the reflection.cache decorator to our implementation of has_table() so that the connector benefits from SQLAlchemy's inspector cache. This is the customary approach for first-party and third-party dialects alike; SQLAlchemy takes care of cache invalidation for us.
  • LongNameBlowoutTest: Databricks SQL doesn't support identifiers longer than 255 characters so these tests are permanently skipped.
  • ExceptionTest: There is only one test in this block; it checks that an integrity error is raised if a primary key constraint is violated. Databricks SQL doesn't enforce informational constraints (apart from CHECK constraints, which are not handled in this PR), so this is permanently skipped.
  • LastrowidTest: This test depends on us implementing implicit autoincrement behaviour (where we write out an IDENTITY constraint when the user didn't specify one explicitly). This behaviour is not an acceptance criterion for the first release, so the test is disabled. I did add a warning log message in case users attempt to rely on it. The workaround is to explicitly declare Identity() for a column, e.g. Column("id", BigInteger, Identity()); other dialects have made this similarly explicit.
  • CompositeKeyReflectionTest: This is the big one: foreign and primary key reflection is a significant gap in the existing Databricks SQLAlchemy dialect. To make these tests pass, I implemented the methods SQLAlchemy needs to reflect the PRIMARY KEY and FOREIGN KEY constraints defined on a given table (a sketch of the reflection contract follows this list).
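
To make the review concrete, here is a minimal sketch of the LIMIT override described above. It follows the stock SQLAlchemy pattern of subclassing SQLCompiler and overriding limit_clause (the same shape the PostgreSQL dialect uses); the exact body in this PR may differ.

```python
from sqlalchemy.sql import compiler


class DatabricksStatementCompiler(compiler.SQLCompiler):
    """Sketch: render unbounded limits as LIMIT ALL rather than LIMIT -1."""

    def limit_clause(self, select, **kw):
        text = ""
        if select._limit_clause is not None:
            text += "\n LIMIT " + self.process(select._limit_clause, **kw)
        if select._offset_clause is not None:
            if select._limit_clause is None:
                # Databricks SQL rejects LIMIT -1 as a wildcard;
                # LIMIT ALL is the supported spelling for "no limit".
                text += "\n LIMIT ALL"
            text += " OFFSET " + self.process(select._offset_clause, **kw)
        return text
```

The dialect opts in by assigning `statement_compiler = DatabricksStatementCompiler` on its Dialect class.

Similarly, the primary key reflection work only has to satisfy a small contract: SQLAlchemy expects get_pk_constraint to return a dict with `constrained_columns` and `name` keys. A sketch, with the DESCRIBE TABLE parsing elided behind a hypothetical `_describe_pk` helper:

```python
from sqlalchemy.engine import default


class DatabricksDialect(default.DefaultDialect):
    name = "databricks"

    def get_pk_constraint(self, connection, table_name, schema=None, **kw):
        # _describe_pk is a hypothetical stand-in for the real parsing of
        # DESCRIBE TABLE EXTENDED output implemented in this PR.
        columns, constraint_name = self._describe_pk(connection, table_name, schema)
        return {"constrained_columns": columns, "name": constraint_name}
```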

Stop inferring TINYINT

After discussion with @benc-db, I've also modified the pysql infer_types method so that it never automatically infers a TINYINT type. We did this because non-SELECT clauses that accept bound integer inputs (like LIMIT and OFFSET) will never accept a value that was cast to TINYINT. If users of pysql require the efficiency of a TINYINT, they can explicitly declare this type when binding their parameters. A sketch of the revised inference rule follows.
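
A minimal sketch of the revised rule; the names here are illustrative, not the connector's actual internals:

```python
def infer_sql_type(value) -> str:
    """Map a Python value to a Databricks SQL type name for binding."""
    if isinstance(value, bool):  # bool is a subclass of int; check it first
        return "BOOLEAN"
    if isinstance(value, int):
        # Never infer TINYINT, even for small values: clauses such as
        # LIMIT and OFFSET reject TINYINT-typed bound parameters.
        return "INT" if -2**31 <= value < 2**31 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"Cannot infer a Databricks SQL type for {type(value)}")
```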

Approach

This PR takes the same approach as #242. As I review each block of tests, I mark it with @pytest.mark.reviewed. If I find that a test depends on a method existing in requirements.py, I update that file with its specific requirements. If the test cannot reasonably pass for Databricks, I add it directly to test_suite.py with a skip marker that records the rationale. This means that when running pytest -m reviewed the output will look like this (a sketch of the marking pattern follows the output):

test_suite.py::IdentityColumnTest_databricks+databricks::test_select_all SKIPPED (Identity works. Test needs rewrite for Databricks. See comments in test_suite.py)
test_suite.py::IdentityColumnTest_databricks+databricks::test_select_columns SKIPPED (Identity works. Test needs rewrite for Databricks. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_autoincrement_on_insert SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_last_inserted_id SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_native_lastrowid_autoinc SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LongNameBlowoutTest_databricks+databricks::test_long_convention_name SKIPPED (Databricks constraint names are limited to 255 characters)
test_suite.py::NumericTest_databricks+databricks::test_decimal_coerce_round_trip PASSED
test_suite.py::NumericTest_databricks+databricks::test_decimal_coerce_round_trip_w_cast PASSED
test_suite.py::NumericTest_databricks+databricks::test_enotation_decimal SKIPPED (Databricks doesn't support E notation for DECIMAL types)
test_suite.py::NumericTest_databricks+databricks::test_enotation_decimal_large SKIPPED (Databricks doesn't support E notation for DECIMAL types)
test_suite.py::NumericTest_databricks+databricks::test_float_as_decimal PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_as_float PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_coerce_round_trip SKIPPED (Without a specific CAST, Databricks doesn't return floats with same precision that was selected.)
test_suite.py::NumericTest_databricks+databricks::test_float_custom_scale SKIPPED (Databricks sometimes only returns six digits of precision for the generic Float type)
test_suite.py::NumericTest_databricks+databricks::test_float_is_not_numeric[Double] PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_is_not_numeric[Float] PASSED

If it is possible to make a failing test pass, I implement that feature in the dialect.
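
For illustration, the marking pattern looks roughly like this sketch (not the verbatim contents of test_suite.py or requirements.py). Reviewed suites are re-exported with the reviewed marker; un-runnable ones also carry a skip marker with the justification, and requirements.py answers the test runner's capability checks:

```python
# test_suite.py (sketch)
import pytest
from sqlalchemy.testing.suite import LongNameBlowoutTest as _LongNameBlowoutTest


@pytest.mark.reviewed
@pytest.mark.skip(reason="Databricks constraint names are limited to 255 characters")
class LongNameBlowoutTest(_LongNameBlowoutTest):
    pass


# requirements.py (sketch)
from sqlalchemy.testing import exclusions
from sqlalchemy.testing.requirements import SuiteRequirements


class Requirements(SuiteRequirements):
    @property
    def identity_columns(self):
        # Tells SQLAlchemy's test runner that the IDENTITY tests are runnable.
        return exclusions.open()
```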

Jesse Whitehouse added 15 commits October 11, 2023 13:48
We're now in the territory of features that aren't required for SQLAlchemy 2
compat as of pysql==3.0.0, but that we may consider adding in the future.

In this case, table comment reflection needs to be manually implemented.
Index reflection would require hooking into the compiler to reflect
the partition strategy.

test_suite.py::HasIndexTest_databricks+databricks::test_has_index[dialect] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index[inspector] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[dialect] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[inspector] SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_add_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
test_suite.py::TableDDLTest_databricks+databricks::test_create_index_if_not_exists SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_create_table PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_create_table_if_not_exists PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_create_table_schema PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_drop_index_if_exists SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_if_exists PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_underscore_names PASSED

Signed-off-by: Jesse Whitehouse <[email protected]>
The fixes to DESCRIBE TABLE and visit_xxx were necessary to get to the
point where I could even determine that these tests wouldn't pass.

But those changes are not currently tested in the dialect. If, in the
course of reviewing the remaining tests in the compliance suite, I find
that these visit_xxx methods are not tested anywhere else, then we should
extend test_suite.py with our own tests to confirm the behaviour for
ourselves.

Signed-off-by: Jesse Whitehouse <[email protected]>
The presence of this pytest.ini file is _required_ to establish pytest's
rootdir:

https://docs.pytest.org/en/7.1.x/reference/customize.html#finding-the-rootdir

Without it, the custom pytest plugin from SQLAlchemy can't read the contents
of setup.cfg, which makes none of the tests runnable.

Signed-off-by: Jesse Whitehouse <[email protected]>
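
(For reference, the file can be minimal; pytest only needs to find it to fix the rootdir. A bare section header suffices:)

```ini
[pytest]
```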
Signed-off-by: Jesse Whitehouse <[email protected]>
Date type work fixed this test failure

Signed-off-by: Jesse Whitehouse <[email protected]>
This allows these SQLAlchemy tests to pass:

test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_expr_limit_simple_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_expr_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases0] PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases1] PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases2] PASSED

This partially reverts the change introduced in #246

Signed-off-by: Jesse Whitehouse <[email protected]>
I implemented our custom DatabricksStatementCompiler so we can override
the default rendering of unbounded LIMIT clauses from `LIMIT -1` to
`LIMIT ALL`.

We also explicitly skip the FETCH clause tests since Databricks doesn't
support this syntax.

Blacked all source code here too.

Signed-off-by: Jesse Whitehouse <[email protected]>
Add meaningful skip markers for table comment reflection and indexes

Signed-off-by: Jesse Whitehouse <[email protected]>
This closes #175

Signed-off-by: Jesse Whitehouse <[email protected]>
Adding the @reflection.cache decorator to has_table is necessary to pass
test_has_table_cache

Caching calls to has_table improves the efficiency of the connector

Signed-off-by: Jesse Whitehouse <[email protected]>
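
A sketch of the shape of that change (the lookup body is illustrative, not the dialect's actual implementation):

```python
from sqlalchemy.engine import default, reflection


class DatabricksDialect(default.DefaultDialect):
    name = "databricks"

    @reflection.cache
    def has_table(self, connection, table_name, schema=None, **kw):
        # The decorator lets SQLAlchemy's Inspector memoize this call;
        # SQLAlchemy handles cache invalidation on its own.
        try:
            self.get_columns(connection, table_name, schema=schema)
            return True
        except Exception:
            return False
```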
Databricks constraint names are limited to 255 characters

Signed-off-by: Jesse Whitehouse <[email protected]>
Black test_suite.py

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Turns out that none of these can pass for the same reason that the
first two seemed un-runnable in db6f52b

Signed-off-by: Jesse Whitehouse <[email protected]>
The review thread below concerns this diff excerpt:

dte_dict = {row["col_name"]: row["data_type"] for row in result}
target = [(k, v) for k, v in dte_dict.items() if "FOREIGN KEY" in v]

def extract_constraint_dict_from_target(target):
    ...
Collaborator commented:

Style nit: If you feel like there is a concept worth extracting to a function, my preference is not to have it encapsulated inside another function, because then you can't unit test it. I would generally only use this approach for simple functions that are slightly larger than a lambda, and otherwise have the function in a scope that I can test separately. The more you can isolate units of work, the simpler your tests can be to validate their behavior.

Contributor (author) replied:
Great feedback. I'll push this to a follow-up PR shortly.

@susodapop (Contributor, Author) commented:
Intentionally skipping failing type checks as these will be addressed in follow-ups.

@susodapop merged commit ef72d74 into main on Oct 13, 2023. 7 of 15 checks passed.