-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PECO-1803] Splitting the PySql connector into the core and the non core part #417
[PECO-1803] Splitting the PySql connector into the core and the non core part #417
Conversation
Removed token removed token
Fixed __version__ Fix
4351410
to
c91d43d
Compare
databricks_sql_connector_core/src/databricks_sql_connector_core/sql/auth/auth.py
Outdated
Show resolved
Hide resolved
databricks_sql_connector_core/src/databricks_sql_connector_core/sqlalchemy/__init__.py
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,23 @@ | |||
[tool.poetry] | |||
name = "databricks-sql-connector" | |||
version = "3.3.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we change the package structure in this release. I would suggest to bump the major version to 4.0.0
or at least it should be greater than current release 3.4.0
, so it has to start with 3.5.0
. @kravets-levko wdyt?
README.md
Outdated
@@ -60,7 +60,7 @@ In the above example: | |||
- `server-hostname` is the Databricks instance host name. | |||
- `http-path` is the HTTP Path either to a Databricks SQL endpoint (e.g. /sql/1.0/endpoints/1234567890abcdef), | |||
or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/1234567890123456/1234-123456-slid123) | |||
- `personal-access-token` is the Databricks Personal Access Token for the account that will execute commands and queries | |||
`personal-access-token` is the Databricks Personal Access Token for the account that will execute commands and queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is some accidental change - please revert it
@@ -40,6 +44,7 @@ def fetch_rows(self, cursor, row_count, fetchmany_size): | |||
+ "assuming 10K fetch size." | |||
) | |||
|
|||
@skipUnless(pysql_supports_arrow(), "Without pyarrow lz4 compression is not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with @pytest.mark.skipif
everywhere (please check other test files as well)
poetry.lock
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, when we have two projects - each of them needs its own lockfile. Keeping the old one at old path is useless, because Poetry won't use it.
Basically, there should be a new lockfile created for databricks_sql_connector
. For databricks_sql_connector
the old one should be reused, with unnecessary dependencies removed
4099939
into
databricks:PECO-1803/connector-split
* Modified the gitignore file to not have .idea file * [PECO-1803] Splitting the PySql connector into the core and the non core part (#417) * Implemented ColumnQueue to test the fetchall without pyarrow Removed token removed token * order of fields in row corrected * Changed the folder structure and tested the basic setup to work * Refractored the code to make connector to work * Basic Setup of connector, core and sqlalchemy is working * Basic integration of core, connect and sqlalchemy is working * Setup working dynamic change from ColumnQueue to ArrowQueue * Refractored the test code and moved to respective folders * Added the unit test for column_queue Fixed __version__ Fix * venv_main added to git ignore * Added code for merging columnar table * Merging code for columnar * Fixed the retry_close sesssion test issue with logging * Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing * Added pyarrow_test mark on pytest * Fixed databricks.sqlalchemy to databricks_sqlalchemy imports * Added poetry.lock * Added dist folder * Changed the pyproject.toml * Minor Fix * Added the pyarrow skip tag on unit tests and tested their working * Fixed the Decimal and timestamp conversion issue in non arrow pipeline * Removed not required files and reformatted * Fixed test_retry error * Changed the folder structure to src / databricks * Removed the columnar non arrow flow to another PR * Moved the README to the root * removed columnQueue instance * Revmoved databricks_sqlalchemy dependency in core * Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml * Ran the black formatter with the original version * Extra .py removed from all the __init__.py files names * Undo formatting check * Check * Check * Check * Check * Check * Check * Check * Check * Check * Check * Check * Check * Check * Check * BIG UPDATE * Refeactor code * Refractor * Fixed versioning * Minor refractoring * Minor refractoring * Changed the folder structure such that sqlalchemy has not reference here * Fixed README.md and CONTRIBUTING.md * Added manual publish * On push trigger added * Manually setting the publish step * Changed versioning in pyproject.toml * Bumped up the version to 4.0.0.b3 and also changed the structure to have pyarrow as optional * Removed the sqlalchemy tests from integration.yml file * [PECO-1803] Print warning message if pyarrow is not installed (#468) Print warning message if pyarrow is not installed Signed-off-by: Jacky Hu <[email protected]> * [PECO-1803] Remove sqlalchemy and update README.md (#469) Remove sqlalchemy and update README.md Signed-off-by: Jacky Hu <[email protected]> * Removed all sqlalchemy related stuff * generated the lock file * Fixed failing tests * removed poetry.lock * Updated the lock file * Fixed poetry numpy 2.2.2 issue * Workflow fixes --------- Signed-off-by: Jacky Hu <[email protected]> Co-authored-by: Jacky Hu <[email protected]>
PECO-1803
Related Links
databricks_sqlalchemy split is present in this PR - databricks/databricks-sqlalchemy#1
Description
databricks-sql-python library is split so that package size can be reduced for the end user based on their requirements
Particularly pyarrow is the heavy component that is planned to be kept optional
existing library split into
Tasks Completed
How to Test
Testing pipeline remains the same as it is before the split.
pytest can be used to directly run both the integration as well as unit tests, by
pytest [directory_name or file_name]
Addition of dist folder in this repo
Github actions have been setup in the databricks_sqlalchemy repo to run tests using the databricks_sql_connector_core. For running those tests currently we need the .whl file in the dist folder and for temporary testing it has been added to the PR.
Once the library gets published to a public repository such as PyPi then databricks_sqlalchemy will automatically download it from that repo and run the tests using Github actions
Performance Comparison - Benchmarking
The pre-split and post-split preformance comparison has been made using the large and small queries to make sure their is no regression of performance
Dashboard has been created so that everytime the benchmarking is run the result are stored in the benchfood, and comparisons can be made easily