PySQL Connector split into connector and sqlalchemy (#444)
* Modified the gitignore file to not have .idea file

* [PECO-1803] Splitting the PySql connector into the core and the non core part (#417)

* Implemented ColumnQueue to test the fetchall without pyarrow

Removed token

* Corrected the order of fields in a row

* Changed the folder structure and verified the basic setup works

* Refactored the code to make the connector work

* Basic Setup of connector, core and sqlalchemy is working

* Basic integration of core, connect and sqlalchemy is working

* Set up working dynamic switching from ColumnQueue to ArrowQueue

* Refactored the test code and moved it to the respective folders

* Added the unit test for column_queue

Fixed __version__

Fix

* Added venv_main to .gitignore

* Added code for merging columnar table

* Merging code for columnar

* Fixed the retry_close session test issue with logging

* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing

* Added pyarrow_test mark on pytest

* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports

* Added poetry.lock

* Added dist folder

* Changed the pyproject.toml

* Minor Fix

* Added the pyarrow skip tag on unit tests and tested their working

* Fixed the Decimal and timestamp conversion issue in non arrow pipeline

* Removed files that were no longer required and reformatted

* Fixed test_retry error

* Changed the folder structure to src / databricks

* Moved the columnar non-arrow flow to another PR

* Moved the README to the root

* Removed ColumnQueue instance

* Removed the databricks_sqlalchemy dependency in core

* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml

* Ran the black formatter with the original version

* Removed the extra .py from all __init__.py file names

* Undo formatting check

* Check

* BIG UPDATE

* Refactor code

* Refactor

* Fixed versioning

* Minor refactoring

* Minor refactoring

* Changed the folder structure such that sqlalchemy has no reference here

* Fixed README.md and CONTRIBUTING.md

* Added manual publish

* Added on-push trigger

* Manually setting the publish step

* Changed versioning in pyproject.toml

* Bumped the version to 4.0.0.b3 and changed the structure to make pyarrow optional

* Removed the sqlalchemy tests from integration.yml file

* [PECO-1803] Print warning message if pyarrow is not installed (#468)

Print warning message if pyarrow is not installed

Signed-off-by: Jacky Hu <[email protected]>

* [PECO-1803] Remove sqlalchemy and update README.md (#469)

Remove sqlalchemy and update README.md

Signed-off-by: Jacky Hu <[email protected]>

* Removed all sqlalchemy related stuff

* Generated the lock file

* Fixed failing tests

* Removed poetry.lock

* Updated the lock file

* Fixed poetry numpy 2.2.2 issue

* Workflow fixes

---------

Signed-off-by: Jacky Hu <[email protected]>
Co-authored-by: Jacky Hu <[email protected]>
jprakash-db and jackyhu-db authored Dec 27, 2024
1 parent f9d6ef1 commit 01e998c
Showing 41 changed files with 467 additions and 4,911 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/code-quality-checks.yml
@@ -58,6 +58,57 @@ jobs:
      #----------------------------------------------
      - name: Run tests
        run: poetry run python -m pytest tests/unit
  run-unit-tests-with-arrow:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
    steps:
      #----------------------------------------------
      # check-out repo and set-up python
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v2
      - name: Set up python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      #----------------------------------------------
      # ----- install & configure poetry -----
      #----------------------------------------------
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      #----------------------------------------------
      # load cached venv if cache exists
      #----------------------------------------------
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v2
        with:
          path: .venv-pyarrow
          key: venv-pyarrow-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
      #----------------------------------------------
      # install dependencies if cache does not exist
      #----------------------------------------------
      - name: Install dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install --no-interaction --no-root
      #----------------------------------------------
      # install your root project, if required
      #----------------------------------------------
      - name: Install library
        run: poetry install --no-interaction --all-extras
      #----------------------------------------------
      # run test suite
      #----------------------------------------------
      - name: Run tests
        run: poetry run python -m pytest tests/unit
  check-linting:
    runs-on: ubuntu-latest
    strategy:
2 changes: 0 additions & 2 deletions .github/workflows/integration.yml
@@ -55,5 +55,3 @@ jobs:
      #----------------------------------------------
      - name: Run e2e tests
        run: poetry run python -m pytest tests/e2e
      - name: Run SQL Alchemy tests
        run: poetry run python -m pytest src/databricks/sqlalchemy/test_local
78 changes: 78 additions & 0 deletions .github/workflows/publish-manual.yml
@@ -0,0 +1,78 @@
name: Publish to PyPI Manual [Production]

# Allow manual triggering of the workflow
on:
  workflow_dispatch: {}

jobs:
  publish:
    name: Publish
    runs-on: ubuntu-latest

    steps:
      #----------------------------------------------
      # Step 1: Check out the repository code
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v2 # Check out the repository to access the code

      #----------------------------------------------
      # Step 2: Set up Python environment
      #----------------------------------------------
      - name: Set up python
        id: setup-python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9 # Specify the Python version to be used

      #----------------------------------------------
      # Step 3: Install and configure Poetry
      #----------------------------------------------
      - name: Install Poetry
        uses: snok/install-poetry@v1 # Install Poetry, the Python package manager
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      # #----------------------------------------------
      # # Step 4: Load cached virtual environment (if available)
      # #----------------------------------------------
      # - name: Load cached venv
      #   id: cached-poetry-dependencies
      #   uses: actions/cache@v2
      #   with:
      #     path: .venv # Path to the virtual environment
      #     key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
      #     # Cache key is generated based on OS, Python version, repo name, and the `poetry.lock` file hash

      # #----------------------------------------------
      # # Step 5: Install dependencies if the cache is not found
      # #----------------------------------------------
      # - name: Install dependencies
      #   if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true' # Only run if the cache was not hit
      #   run: poetry install --no-interaction --no-root # Install dependencies without interaction

      # #----------------------------------------------
      # # Step 6: Update the version to the manually provided version
      # #----------------------------------------------
      # - name: Update pyproject.toml with the specified version
      #   run: poetry version ${{ github.event.inputs.version }} # Use the version provided by the user input

      #----------------------------------------------
      # Step 7: Build and publish the first package to PyPI
      #----------------------------------------------
      - name: Build and publish databricks sql connector to PyPI
        working-directory: ./databricks_sql_connector
        run: |
          poetry build
          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
      #----------------------------------------------
      # Step 8: Build and publish the second package to PyPI
      #----------------------------------------------
      - name: Build and publish databricks sql connector core to PyPI
        working-directory: ./databricks_sql_connector_core
        run: |
          poetry build
          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
2 changes: 1 addition & 1 deletion .gitignore
@@ -195,7 +195,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

# End of https://www.toptal.com/developers/gitignore/api/python,macos

5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Release History

# 4.0.0 (TBD)

- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect for the connector.
- The `pyarrow` dependency is now optional in `databricks-sql-connector`. Users who need Arrow must install `pyarrow` explicitly.
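
Either of the following commands enables Arrow support (matching the install instructions in the README):

```bash
pip install pyarrow
# or
pip install databricks-sql-connector[pyarrow]
```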

# 3.7.0 (2024-12-23)

- Fix: Incorrect number of rows fetched in inline results when fetching results with FETCH_NEXT orientation (databricks/databricks-sql-python#479 by @jprakash-db)
3 changes: 0 additions & 3 deletions CONTRIBUTING.md
@@ -144,9 +144,6 @@ The `PySQLStagingIngestionTestSuite` namespace requires a cluster running DBR ve

The suites marked `[not documented]` require additional configuration which will be documented at a later time.

#### SQLAlchemy dialect tests

See README.tests.md for details.

### Code formatting

23 changes: 20 additions & 3 deletions README.md
@@ -3,9 +3,9 @@
[![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
[![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)

The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).

This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to use these APIs; install it via `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
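
A minimal sketch of the Arrow fetch path, assuming `pyarrow` is installed; the connection parameters below are placeholders to replace with your workspace's values:

```python
from databricks import sql

# Placeholder credentials -- substitute your workspace's values.
with sql.connect(
    server_hostname="********.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS x")
        table = cursor.fetchmany_arrow(100)  # returns a pyarrow.Table
        print(table.num_rows, table.column_names)
```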

You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).

@@ -22,7 +22,12 @@ For the latest documentation, see

## Quickstart

Install the library with `pip install databricks-sql-connector`
### Installing the core library
Install using `pip install databricks-sql-connector`

### Installing the core library with PyArrow
Install using `pip install databricks-sql-connector[pyarrow]`


```bash
export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
> to authenticate the target Databricks user account and needs to open the browser for authentication. So it
> can only run on the user's machine.
## SQLAlchemy
Starting with `databricks-sql-connector` version 4.0.0, SQLAlchemy support has been extracted into a separate library, `databricks-sqlalchemy`.

- GitHub repository: [databricks-sqlalchemy on GitHub](https://github.com/databricks/databricks-sqlalchemy)
- PyPI: [databricks-sqlalchemy on PyPI](https://pypi.org/project/databricks-sqlalchemy/)

### Quick SQLAlchemy guide
Users can now choose between the SQLAlchemy v1 and SQLAlchemy v2 dialects with the connector core:

- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
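
A minimal connection sketch with the dialect, assuming the `databricks://` URL format documented in the `databricks-sqlalchemy` repository; the host, token, HTTP path, catalog, and schema are placeholders:

```python
from sqlalchemy import create_engine, text

# Placeholder values -- substitute your workspace's host, token, and HTTP path.
engine = create_engine(
    "databricks://token:<access-token>@<server-hostname>"
    "?http_path=<http-path>&catalog=<catalog>&schema=<schema>"
)

with engine.connect() as connection:
    print(connection.execute(text("SELECT 1")).scalar())
```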


## Contributing

174 changes: 0 additions & 174 deletions examples/sqlalchemy.py

This file was deleted.
