PySQL Connector split into connector and sqlalchemy (#444)
* Modified the gitignore file to not have .idea file

* [PECO-1803] Splitting the PySql connector into the core and the non core part (#417)

* Implemented ColumnQueue to test the fetchall without pyarrow

Removed token

* Corrected the order of fields in a row

* Changed the folder structure and verified the basic setup works

* Refactored the code to make the connector work

* Basic Setup of connector, core and sqlalchemy is working

* Basic integration of core, connect and sqlalchemy is working

* Set up working dynamic switching from ColumnQueue to ArrowQueue

* Refactored the test code and moved it to the respective folders

* Added the unit test for column_queue

Fixed __version__

Fix

* Added venv_main to .gitignore

* Added code for merging columnar table

* Merging code for columnar

* Fixed the retry_close session test issue with logging

* Fixed the databricks_sqlalchemy tests and introduced pytest.ini for the sqla_testing

* Added pyarrow_test mark on pytest

* Fixed databricks.sqlalchemy to databricks_sqlalchemy imports

* Added poetry.lock

* Added dist folder

* Changed the pyproject.toml

* Minor Fix

* Added the pyarrow skip tag on unit tests and tested their working

* Fixed the Decimal and timestamp conversion issue in non arrow pipeline

* Removed files that were no longer required and reformatted

* Fixed test_retry error

* Changed the folder structure to src / databricks

* Moved the columnar non-arrow flow to another PR

* Moved the README to the root

* Removed ColumnQueue instance

* Removed the databricks_sqlalchemy dependency in core

* Changed the pysql_supports_arrow predicate, introduced changes in the pyproject.toml

* Ran the black formatter with the original version

* Removed the extra .py from all __init__.py file names

* Undo formatting check

* Check

* BIG UPDATE

* Refactor code

* Refactor

* Fixed versioning

* Minor refactoring

* Minor refactoring

* Changed the folder structure such that sqlalchemy has no reference here

* Fixed README.md and CONTRIBUTING.md

* Added manual publish

* Added on-push trigger

* Manually setting the publish step

* Changed versioning in pyproject.toml

* Bumped the version to 4.0.0.b3 and changed the structure to make pyarrow optional

* Removed the sqlalchemy tests from integration.yml file

* [PECO-1803] Print warning message if pyarrow is not installed (#468)

Print warning message if pyarrow is not installed

Signed-off-by: Jacky Hu <[email protected]>

* [PECO-1803] Remove sqlalchemy and update README.md (#469)

Remove sqlalchemy and update README.md

Signed-off-by: Jacky Hu <[email protected]>

* Removed all sqlalchemy related stuff

* Generated the lock file

* Fixed failing tests

* Removed poetry.lock

* Updated the lock file

* Fixed poetry numpy 2.2.2 issue

* Workflow fixes

---------

Signed-off-by: Jacky Hu <[email protected]>
Co-authored-by: Jacky Hu <[email protected]>
jprakash-db and jackyhu-db authored Dec 27, 2024
1 parent f9d6ef1 commit 01e998c
Showing 41 changed files with 467 additions and 4,911 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/code-quality-checks.yml
@@ -58,6 +58,57 @@ jobs:
      #----------------------------------------------
      - name: Run tests
        run: poetry run python -m pytest tests/unit
  run-unit-tests-with-arrow:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
    steps:
      #----------------------------------------------
      # check-out repo and set-up python
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v2
      - name: Set up python ${{ matrix.python-version }}
        id: setup-python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      #----------------------------------------------
      # ----- install & configure poetry -----
      #----------------------------------------------
      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      #----------------------------------------------
      # load cached venv if cache exists
      #----------------------------------------------
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v2
        with:
          path: .venv-pyarrow
          key: venv-pyarrow-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
      #----------------------------------------------
      # install dependencies if cache does not exist
      #----------------------------------------------
      - name: Install dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install --no-interaction --no-root
      #----------------------------------------------
      # install your root project, if required
      #----------------------------------------------
      - name: Install library
        run: poetry install --no-interaction --all-extras
      #----------------------------------------------
      # run test suite
      #----------------------------------------------
      - name: Run tests
        run: poetry run python -m pytest tests/unit
  check-linting:
    runs-on: ubuntu-latest
    strategy:
2 changes: 0 additions & 2 deletions .github/workflows/integration.yml
@@ -55,5 +55,3 @@ jobs:
      #----------------------------------------------
      - name: Run e2e tests
        run: poetry run python -m pytest tests/e2e
      - name: Run SQL Alchemy tests
        run: poetry run python -m pytest src/databricks/sqlalchemy/test_local
78 changes: 78 additions & 0 deletions .github/workflows/publish-manual.yml
@@ -0,0 +1,78 @@
name: Publish to PyPI Manual [Production]

# Allow manual triggering of the workflow
on:
  workflow_dispatch: {}

jobs:
  publish:
    name: Publish
    runs-on: ubuntu-latest

    steps:
      #----------------------------------------------
      # Step 1: Check out the repository code
      #----------------------------------------------
      - name: Check out repository
        uses: actions/checkout@v2 # Check out the repository to access the code

      #----------------------------------------------
      # Step 2: Set up Python environment
      #----------------------------------------------
      - name: Set up python
        id: setup-python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9 # Specify the Python version to be used

      #----------------------------------------------
      # Step 3: Install and configure Poetry
      #----------------------------------------------
      - name: Install Poetry
        uses: snok/install-poetry@v1 # Install Poetry, the Python package manager
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true

      # #----------------------------------------------
      # # Step 4: Load cached virtual environment (if available)
      # #----------------------------------------------
      # - name: Load cached venv
      #   id: cached-poetry-dependencies
      #   uses: actions/cache@v2
      #   with:
      #     path: .venv # Path to the virtual environment
      #     key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ github.event.repository.name }}-${{ hashFiles('**/poetry.lock') }}
      #     # Cache key is generated based on OS, Python version, repo name, and the `poetry.lock` file hash

      # #----------------------------------------------
      # # Step 5: Install dependencies if the cache is not found
      # #----------------------------------------------
      # - name: Install dependencies
      #   if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true' # Only run if the cache was not hit
      #   run: poetry install --no-interaction --no-root # Install dependencies without interaction

      # #----------------------------------------------
      # # Step 6: Update the version to the manually provided version
      # #----------------------------------------------
      # - name: Update pyproject.toml with the specified version
      #   run: poetry version ${{ github.event.inputs.version }} # Use the version provided by the user input

      #----------------------------------------------
      # Step 7: Build and publish the first package to PyPI
      #----------------------------------------------
      - name: Build and publish databricks sql connector to PyPI
        working-directory: ./databricks_sql_connector
        run: |
          poetry build
          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
      #----------------------------------------------
      # Step 8: Build and publish the second package to PyPI
      #----------------------------------------------
      - name: Build and publish databricks sql connector core to PyPI
        working-directory: ./databricks_sql_connector_core
        run: |
          poetry build
          poetry publish -u __token__ -p ${{ secrets.PROD_PYPI_TOKEN }} # Publish with PyPI token
2 changes: 1 addition & 1 deletion .gitignore
@@ -195,7 +195,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

# End of https://www.toptal.com/developers/gitignore/api/python,macos

5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Release History

# 4.0.0 (TBD)

- Split the connector into two separate packages: `databricks-sql-connector` and `databricks-sqlalchemy`. The `databricks-sql-connector` package contains the core functionality of the connector, while the `databricks-sqlalchemy` package contains the SQLAlchemy dialect for the connector.
- The `pyarrow` dependency is now optional in `databricks-sql-connector`. Users who need Arrow must install `pyarrow` explicitly.
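
Either of the following commands enables Arrow support (matching the install instructions in the README):

```bash
pip install pyarrow
# or
pip install databricks-sql-connector[pyarrow]
```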

# 3.7.0 (2024-12-23)

- Fix: Incorrect number of rows fetched in inline results when fetching results with FETCH_NEXT orientation (databricks/databricks-sql-python#479 by @jprakash-db)
3 changes: 0 additions & 3 deletions CONTRIBUTING.md
@@ -144,9 +144,6 @@ The `PySQLStagingIngestionTestSuite` namespace requires a cluster running DBR ve

The suites marked `[not documented]` require additional configuration which will be documented at a later time.

#### SQLAlchemy dialect tests

See README.tests.md for details.

### Code formatting

23 changes: 20 additions & 3 deletions README.md
@@ -3,9 +3,9 @@
[![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
[![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)

The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic` which use SQLAlchemy to execute DDL. Use `pip install databricks-sql-connector[sqlalchemy]` to install with SQLAlchemy's dependencies. `pip install databricks-sql-connector[alembic]` will install alembic's dependencies.
The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/).

This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.
This connector uses Arrow as the data-exchange format, and supports APIs (e.g. `fetchmany_arrow`) to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time. [PyArrow](https://arrow.apache.org/docs/python/index.html) is required to use these APIs; install it via `pip install pyarrow` or `pip install databricks-sql-connector[pyarrow]`.
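
A minimal sketch of the Arrow fetch path, assuming `pyarrow` is installed; the connection parameters below are placeholders to replace with your workspace's values:

```python
from databricks import sql

# Placeholder credentials -- substitute your workspace's values.
with sql.connect(
    server_hostname="********.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS x")
        table = cursor.fetchmany_arrow(100)  # returns a pyarrow.Table
        print(table.num_rows, table.column_names)
```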

You are welcome to file an issue here for general use cases. You can also contact Databricks Support [here](help.databricks.com).

@@ -22,7 +22,12 @@ For the latest documentation, see

## Quickstart

Install the library with `pip install databricks-sql-connector`
### Installing the core library
Install using `pip install databricks-sql-connector`

### Installing the core library with PyArrow
Install using `pip install databricks-sql-connector[pyarrow]`


```bash
export DATABRICKS_HOST=********.databricks.com
@@ -60,6 +65,18 @@ or to a Databricks Runtime interactive cluster (e.g. /sql/protocolv1/o/123456789
> to authenticate the target Databricks user account and needs to open the browser for authentication. So it
> can only run on the user's machine.
## SQLAlchemy
Starting with `databricks-sql-connector` version 4.0.0, SQLAlchemy support has been extracted into a separate library, `databricks-sqlalchemy`.

- GitHub repository: [databricks-sqlalchemy on GitHub](https://github.com/databricks/databricks-sqlalchemy)
- PyPI: [databricks-sqlalchemy on PyPI](https://pypi.org/project/databricks-sqlalchemy/)

### Quick SQLAlchemy guide
Users can now choose between the SQLAlchemy v1 and SQLAlchemy v2 dialects with the connector core:

- Install the latest SQLAlchemy v1 using `pip install databricks-sqlalchemy~=1.0`
- Install SQLAlchemy v2 using `pip install databricks-sqlalchemy`
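
A minimal connection sketch with the dialect, assuming the `databricks://` URL format documented in the `databricks-sqlalchemy` repository; the host, token, HTTP path, catalog, and schema are placeholders:

```python
from sqlalchemy import create_engine, text

# Placeholder values -- substitute your workspace's host, token, and HTTP path.
engine = create_engine(
    "databricks://token:<access-token>@<server-hostname>"
    "?http_path=<http-path>&catalog=<catalog>&schema=<schema>"
)

with engine.connect() as connection:
    print(connection.execute(text("SELECT 1")).scalar())
```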


## Contributing

174 changes: 0 additions & 174 deletions examples/sqlalchemy.py

This file was deleted.
