SQLAlchemy 2: Stop skipping some non-type tests #247

Merged · 15 commits into main · Oct 13, 2023

Conversation

@susodapop (Contributor) commented Oct 12, 2023

Note to reviewers: The important thing here is that all these tests pass. The nitty-gritty implementation of SQLAlchemy is both esoteric and unchanging. The key things to review are the implementations of get_pk_constraint, get_primary_keys, and the modified infer_types logic.

Background

On the path to full SQLAlchemy 2 compliance, I'm working through the entire dialect compliance test suite. Once all the compliance tests have been addressed as either passing or skipped with justification, the new dialect will be ready for release.

As of this PR, we pass 198 reviewed tests, skip 58 reviewed tests, and have not reviewed 1215 tests.

Description

This pull request covers several blocks of tests that I've reviewed in detail. I'm opening it as a partial effort since there are still more tests to consider, but I'd rather my PRs be smaller and more frequent than enormous and hard to review. Here is a list of the tests I enabled and anything relevant for this review.

  • TableDDLTest: No changes to the dialect were needed to make these pass. The table comment DDL tests are marked skipped, as comment reflection is not an acceptance criterion for the next release.
  • HasIndexTest: Databricks doesn't support indexes. These tests are permanently skipped.
  • QuotedNameArgumentTest: These tests are wacky: they intermingle several requirements, some of which Databricks supports and some it doesn't. There's no way for us to run them as-is, so we can tag this as a follow-up after the initial release.
  • RowFetchTest: No changes to the dialect were needed to make this pass.
  • FetchLimitOffsetTest: These failed because the default compiler wrote invalid Databricks SQL (it renders -1 as the wildcard limit, whereas Databricks requires ALL). I had to implement our own subclass of SQLCompiler, which SQLAlchemy picks up through its visitor API; see the sketch after this list.
  • FutureTableDDLTest: No changes to the dialect were required to make these tests pass, but I had to update requirements.py so that SQLAlchemy's test runner would know they are runnable.
  • IdentityColumnTest: I implemented IDENTITY writing semantics in the dialect to make these pass.
  • HasTableTest: I had to make a few changes to bring these into a passing state. I also added the reflection.cache decorator to our implementation of has_table() so that the connector benefits from SQLAlchemy's inspector cache. This is the customary approach for first-party and third-party dialects alike; SQLAlchemy takes care of cache invalidation for us.
  • LongNameBlowoutTest: Databricks SQL doesn't support identifiers longer than 255 characters so these tests are permanently skipped.
  • ExceptionTest: There is only one test in this block; it checks that an integrity error is raised if a primary key constraint is violated. Databricks SQL doesn't enforce informational constraints (apart from CHECK constraints, which are not handled in this PR), so this is permanently skipped.
  • LastrowidTest: This test depends on us implementing implicit autoincrement behaviour (where we write out an IDENTITY constraint when the user didn't specify one explicitly). This behaviour is not an acceptance criterion for the first release, so the test is disabled. I did add a warning log message in case users attempt to rely on it. The workaround is to explicitly declare Identity() for a column, e.g. Column("id", BigInteger, Identity()); other dialects have made this similarly explicit.
  • CompositeKeyReflectionTest: This is the big one: foreign and primary key reflection is a significant gap in the existing Databricks SQLAlchemy dialect. To make these tests pass, I implemented the methods SQLAlchemy needs to reflect the PRIMARY KEY and FOREIGN KEY constraints defined on a given table (a sketch of the reflection contract follows this list).
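
To make the review concrete, here is a minimal sketch of the LIMIT override described above. It follows the stock SQLAlchemy pattern of subclassing SQLCompiler and overriding limit_clause (the same shape the PostgreSQL dialect uses); the exact body in this PR may differ.

```python
from sqlalchemy.sql import compiler


class DatabricksStatementCompiler(compiler.SQLCompiler):
    """Sketch: render unbounded limits as LIMIT ALL rather than LIMIT -1."""

    def limit_clause(self, select, **kw):
        text = ""
        if select._limit_clause is not None:
            text += "\n LIMIT " + self.process(select._limit_clause, **kw)
        if select._offset_clause is not None:
            if select._limit_clause is None:
                # Databricks SQL rejects LIMIT -1 as a wildcard;
                # LIMIT ALL is the supported spelling for "no limit".
                text += "\n LIMIT ALL"
            text += " OFFSET " + self.process(select._offset_clause, **kw)
        return text
```

The dialect opts in by assigning `statement_compiler = DatabricksStatementCompiler` on its Dialect class.

Similarly, the primary key reflection work only has to satisfy a small contract: SQLAlchemy expects get_pk_constraint to return a dict with `constrained_columns` and `name` keys. A sketch, with the DESCRIBE TABLE parsing elided behind a hypothetical `_describe_pk` helper:

```python
from sqlalchemy.engine import default


class DatabricksDialect(default.DefaultDialect):
    name = "databricks"

    def get_pk_constraint(self, connection, table_name, schema=None, **kw):
        # _describe_pk is a hypothetical stand-in for the real parsing of
        # DESCRIBE TABLE EXTENDED output implemented in this PR.
        columns, constraint_name = self._describe_pk(connection, table_name, schema)
        return {"constrained_columns": columns, "name": constraint_name}
```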

Stop inferring TINYINT

After discussion with @benc-db, I've also modified the pysql infer_types method so that it never automatically infers a TINYINT type. We did this because non-SELECT clauses that accept bound integer inputs (like LIMIT and OFFSET) will never accept a value that was cast to TINYINT. If users of pysql require the efficiency of a TINYINT, they can explicitly declare this type when binding their parameters. A sketch of the revised inference rule follows.
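
A minimal sketch of the revised rule; the names here are illustrative, not the connector's actual internals:

```python
def infer_sql_type(value) -> str:
    """Map a Python value to a Databricks SQL type name for binding."""
    if isinstance(value, bool):  # bool is a subclass of int; check it first
        return "BOOLEAN"
    if isinstance(value, int):
        # Never infer TINYINT, even for small values: clauses such as
        # LIMIT and OFFSET reject TINYINT-typed bound parameters.
        return "INT" if -2**31 <= value < 2**31 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"Cannot infer a Databricks SQL type for {type(value)}")
```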

Approach

This PR takes the same approach as #242. As I review each block of tests, I mark it with @pytest.mark.reviewed. If I find that a test depends on a method existing in requirements.py, I update that file with its specific requirements. If the test cannot reasonably pass for Databricks, I add it directly to test_suite.py with a skip marker that records the rationale. This means that when running pytest -m reviewed the output will look like this (a sketch of the marking pattern follows the output):

test_suite.py::IdentityColumnTest_databricks+databricks::test_select_all SKIPPED (Identity works. Test needs rewrite for Databricks. See comments in test_suite.py)
test_suite.py::IdentityColumnTest_databricks+databricks::test_select_columns SKIPPED (Identity works. Test needs rewrite for Databricks. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_autoincrement_on_insert SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_last_inserted_id SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LastrowidTest_databricks+databricks::test_native_lastrowid_autoinc SKIPPED (This dialect does not support implicit autoincrement. See comments in test_suite.py)
test_suite.py::LongNameBlowoutTest_databricks+databricks::test_long_convention_name SKIPPED (Databricks constraint names are limited to 255 characters)
test_suite.py::NumericTest_databricks+databricks::test_decimal_coerce_round_trip PASSED
test_suite.py::NumericTest_databricks+databricks::test_decimal_coerce_round_trip_w_cast PASSED
test_suite.py::NumericTest_databricks+databricks::test_enotation_decimal SKIPPED (Databricks doesn't support E notation for DECIMAL types)
test_suite.py::NumericTest_databricks+databricks::test_enotation_decimal_large SKIPPED (Databricks doesn't support E notation for DECIMAL types)
test_suite.py::NumericTest_databricks+databricks::test_float_as_decimal PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_as_float PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_coerce_round_trip SKIPPED (Without a specific CAST, Databricks doesn't return floats with same precision that was selected.)
test_suite.py::NumericTest_databricks+databricks::test_float_custom_scale SKIPPED (Databricks sometimes only returns six digits of precision for the generic Float type)
test_suite.py::NumericTest_databricks+databricks::test_float_is_not_numeric[Double] PASSED
test_suite.py::NumericTest_databricks+databricks::test_float_is_not_numeric[Float] PASSED

If it is possible to make a failing test pass, I implement that feature in the dialect.
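
For illustration, the marking pattern looks roughly like this sketch (not the verbatim contents of test_suite.py or requirements.py). Reviewed suites are re-exported with the reviewed marker; un-runnable ones also carry a skip marker with the justification, and requirements.py answers the test runner's capability checks:

```python
# test_suite.py (sketch)
import pytest
from sqlalchemy.testing.suite import LongNameBlowoutTest as _LongNameBlowoutTest


@pytest.mark.reviewed
@pytest.mark.skip(reason="Databricks constraint names are limited to 255 characters")
class LongNameBlowoutTest(_LongNameBlowoutTest):
    pass


# requirements.py (sketch)
from sqlalchemy.testing import exclusions
from sqlalchemy.testing.requirements import SuiteRequirements


class Requirements(SuiteRequirements):
    @property
    def identity_columns(self):
        # Tells SQLAlchemy's test runner that the IDENTITY tests are runnable.
        return exclusions.open()
```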

Jesse Whitehouse added 15 commits October 11, 2023 13:48
We're now in the territory of features that aren't required for SQLAlchemy 2
compat as of pysql==3.0.0, but that we may consider adding in the future.

In this case, table comment reflection needs to be manually implemented.
Index reflection would require hooking into the compiler to reflect
the partition strategy.

test_suite.py::HasIndexTest_databricks+databricks::test_has_index[dialect] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index[inspector] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[dialect] SKIPPED (Databricks does not support indexes.)
test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[inspector] SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_add_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
test_suite.py::TableDDLTest_databricks+databricks::test_create_index_if_not_exists SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_create_table PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_create_table_if_not_exists PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_create_table_schema PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_drop_index_if_exists SKIPPED (Databricks does not support indexes.)
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_if_exists PASSED
test_suite.py::TableDDLTest_databricks+databricks::test_underscore_names PASSED

Signed-off-by: Jesse Whitehouse <[email protected]>
The fixes to DESCRIBE TABLE and visit_xxx were necessary to get to the
point where I could even determine that these tests wouldn't pass.

But those changes are not currently tested in the dialect. If, in the
course of reviewing the remaining tests in the compliance suite, I find
that these visit_xxx methods are not tested anywhere else, then we should
extend test_suite.py with our own tests to confirm the behaviour for
ourselves.

Signed-off-by: Jesse Whitehouse <[email protected]>
The presence of this pytest.ini file is _required_ to establish pytest's
rootdir:

https://docs.pytest.org/en/7.1.x/reference/customize.html#finding-the-rootdir

Without it, the custom pytest plugin from SQLAlchemy can't read the contents
of setup.cfg, which makes none of the tests runnable.

Signed-off-by: Jesse Whitehouse <[email protected]>
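
(For reference, the file can be minimal; pytest only needs to find it to fix the rootdir. A bare section header suffices:)

```ini
[pytest]
```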
Signed-off-by: Jesse Whitehouse <[email protected]>
Date type work fixed this test failure

Signed-off-by: Jesse Whitehouse <[email protected]>
This allows these SQLAlchemy tests to pass:

test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_expr_limit_simple_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_expr_offset PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases0] PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases1] PASSED
test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases2] PASSED

This partially reverts the change introduced in #246

Signed-off-by: Jesse Whitehouse <[email protected]>
I implemented our custom DatabricksStatementCompiler so we can override
the default rendering of unbounded LIMIT clauses from `LIMIT -1` to
`LIMIT ALL`.

We also explicitly skip the FETCH clause tests since Databricks doesn't
support this syntax.

Blacked all source code here too.

Signed-off-by: Jesse Whitehouse <[email protected]>
Add meaningful skip markers for table comment reflection and indexes

Signed-off-by: Jesse Whitehouse <[email protected]>
This closes #175

Signed-off-by: Jesse Whitehouse <[email protected]>
Adding the @reflection.cache decorator to has_table is necessary to pass
test_has_table_cache

Caching calls to has_table improves the efficiency of the connector

Signed-off-by: Jesse Whitehouse <[email protected]>
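
A sketch of the shape of that change (the lookup body is illustrative, not the dialect's actual implementation):

```python
from sqlalchemy.engine import default, reflection


class DatabricksDialect(default.DefaultDialect):
    name = "databricks"

    @reflection.cache
    def has_table(self, connection, table_name, schema=None, **kw):
        # The decorator lets SQLAlchemy's Inspector memoize this call;
        # SQLAlchemy handles cache invalidation on its own.
        try:
            self.get_columns(connection, table_name, schema=schema)
            return True
        except Exception:
            return False
```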
Databricks constraint names are limited to 255 characters

Signed-off-by: Jesse Whitehouse <[email protected]>
Black test_suite.py

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Turns out that none of these can pass for the same reason that the
first two seemed un-runnable in db6f52b

Signed-off-by: Jesse Whitehouse <[email protected]>
The review thread below concerns this diff excerpt:

dte_dict = {row["col_name"]: row["data_type"] for row in result}
target = [(k, v) for k, v in dte_dict.items() if "FOREIGN KEY" in v]

def extract_constraint_dict_from_target(target):
    ...
Collaborator commented:

Style nit: If you feel like there is a concept worth extracting to a function, my preference is not to have it encapsulated inside another function, because then you can't unit test it. I would generally only use this approach for simple functions that are slightly larger than a lambda, and otherwise have the function in a scope that I can test separately. The more you can isolate units of work, the simpler your tests can be to validate their behavior.

Contributor (author) replied:
Great feedback. I'll push this to a follow-up PR shortly.

@susodapop (Contributor, Author) commented:
Intentionally skipping failing type checks as these will be addressed in follow-ups.

@susodapop merged commit ef72d74 into main on Oct 13, 2023. 7 of 15 checks passed.