GitHub - gardnmi/auto_zorder: Take the guesswork out of ZORDER

Auto ZORDER

Take the guesswork out of ZORDER

About The Project

The project aims to remove the guesswork from selecting the columns used in a ZORDER statement. It does this by analyzing the logged execution plans of each cluster provided and returning the top n columns used in filter/where clauses.
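The analysis step can be pictured with a minimal sketch. It assumes the logged plans are plain-text Spark physical plans in which filter predicates appear on `Filter` lines and column references carry Spark's `name#id` suffix (for example `country#3`). The helper `count_filter_columns` and the sample plans are hypothetical; this is not the package's actual implementation.

```python
import re
from collections import Counter

# Spark plans print attribute references as name#exprId, e.g. "country#3".
COLUMN_REF = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)#\d+")
# Match a Filter node but not words such as "PushedFilters".
FILTER_NODE = re.compile(r"\bFilter\b")


def count_filter_columns(plan_texts):
    """Count how often each column is referenced in Filter nodes (hypothetical helper)."""
    counts = Counter()
    for plan in plan_texts:
        for line in plan.splitlines():
            if not FILTER_NODE.search(line):
                continue
            # set() counts a column once per Filter node, even if it appears twice in the predicate
            counts.update(set(COLUMN_REF.findall(line)))
    return counts


# Hypothetical plan snippets standing in for logged execution plans.
plans = [
    "*(1) Filter (isnotnull(country#3) AND (country#3 = US))\n"
    "+- FileScan parquet my_db.my_table[country#3,event_date#4]",
    "*(1) Filter (event_date#4 >= 2023-01-01)",
    "*(1) Filter (country#3 = CA)",
]

print(count_filter_columns(plans).most_common())
# [('country', 2), ('event_date', 1)]
```

Counting each column once per `Filter` node keeps a single predicate such as `isnotnull(x) AND x = ...` from double-counting the same column.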


Built With


Prerequisites

Cluster log delivery

You must set up a default destination for cluster log delivery, for example dbfs:/cluster-log-delivery/0630-191345-leap375. See the link below for more information on how to set up cluster log delivery on Databricks.
- https://docs.databricks.com/clusters/configure.html#cluster-log-delivery
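For reference, a cluster's log destination is set through the `cluster_log_conf` field of its cluster specification, and Databricks delivers each cluster's logs to a subfolder named after the cluster ID. The sketch below assumes a DBFS destination of `dbfs:/cluster-log-delivery`; it only builds the relevant fragment of the spec and the resulting per-cluster folder, and the helper `cluster_log_path` is hypothetical.

```python
import json

# Assumption: logs are delivered to DBFS; this destination is an example value.
LOG_DESTINATION = "dbfs:/cluster-log-delivery"

# Fragment of a cluster specification that enables log delivery.
cluster_spec_fragment = {
    "cluster_log_conf": {
        "dbfs": {"destination": LOG_DESTINATION}
    }
}


def cluster_log_path(destination, cluster_id):
    """Folder where one cluster's logs are delivered (hypothetical helper)."""
    return f"{destination.rstrip('/')}/{cluster_id}"


print(json.dumps(cluster_spec_fragment, indent=2))
print(cluster_log_path(LOG_DESTINATION, "0630-191345-leap375"))
# dbfs:/cluster-log-delivery/0630-191345-leap375
```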


Installation

Install with pip in your Databricks notebook:

%pip install auto_zorder


Example Usage

Note: If cluster log delivery has only been active for a short time, you may not see any results.

Basic Usage

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table'
)

print(optimize_cmd)
# OPTIMIZE my_db.my_table ZORDER BY (col1, col2, col3, col4, col5)

# To run the OPTIMIZE command (spark is predefined in Databricks notebooks)
spark.sql(optimize_cmd)
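Because `auto_zorder` returns the statement as a string, it can be inspected before it is passed to `spark.sql`. A minimal sketch, assuming the string has the `OPTIMIZE <table> ZORDER BY (<cols>)` shape shown above; `parse_zorder_cols` is a hypothetical helper, not part of the package.

```python
import re

ZORDER_PATTERN = re.compile(
    r"^OPTIMIZE\s+(\S+)\s+ZORDER\s+BY\s+\((.*)\)\s*$", re.IGNORECASE
)


def parse_zorder_cols(optimize_cmd):
    """Split an 'OPTIMIZE <table> ZORDER BY (...)' string into (table, [columns])."""
    match = ZORDER_PATTERN.match(optimize_cmd.strip())
    if match is None:
        raise ValueError(f"Not an OPTIMIZE ... ZORDER BY statement: {optimize_cmd!r}")
    table, cols = match.groups()
    return table, [c.strip() for c in cols.split(",") if c.strip()]


table, cols = parse_zorder_cols(
    "OPTIMIZE my_db.my_table ZORDER BY (col1, col2, col3, col4, col5)"
)
print(table, cols)
# my_db.my_table ['col1', 'col2', 'col3', 'col4', 'col5']
```

In a notebook, one could review `cols` and only call `spark.sql(optimize_cmd)` once the selection looks reasonable.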

Limit the number of ZORDER columns

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    number_of_cols=2
)

print(optimize_cmd)
# OPTIMIZE my_db.my_table ZORDER BY (col1, col2)
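Conceptually, `number_of_cols` caps how many of the most frequently filtered columns end up in the statement. The sketch below assumes the analysis yields a frequency count per column and that ties keep first-seen order; the package's tie-breaking rule is not documented here. `build_optimize_cmd` and its default of 5 (matching the five columns in the basic example) are assumptions.

```python
from collections import Counter


def build_optimize_cmd(table, column_counts, number_of_cols=5):
    """Build an OPTIMIZE ... ZORDER BY statement from the top-N filtered columns."""
    # most_common keeps first-encountered order among equal counts
    top_cols = [col for col, _ in Counter(column_counts).most_common(number_of_cols)]
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(top_cols)})"


# Hypothetical filter counts per column.
counts = {"col1": 40, "col2": 31, "col3": 12, "col4": 9, "col5": 7, "col6": 2}
print(build_optimize_cmd("my_db.my_table", counts, number_of_cols=2))
# OPTIMIZE my_db.my_table ZORDER BY (col1, col2)
```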

Save the auto_zorder analysis

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1'],
    optimize_table='my_db.my_table',
    save_analysis='my_db.my_analysis'
)

Run auto_zorder using a saved analysis instead of cluster logs

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    use_analysis='my_db.my_analysis',
    optimize_table='my_db.my_table'
)

Include additional columns at specific positions in ZORDER

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    use_add_cols=[('add_col1', 0), ('add_col2', 4)]
)

print(optimize_cmd)
# OPTIMIZE my_db.my_table ZORDER BY (add_col1, auto_col1, auto_col2, auto_col3, add_col2, auto_col4, auto_col5)
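The output above is consistent with each `use_add_cols` tuple being `(column_name, zero_based_position)` in the final ZORDER list. The sketch below reproduces that output under this assumption; `insert_add_cols` is hypothetical and may differ from how the package handles duplicate or out-of-range positions.

```python
def insert_add_cols(auto_cols, add_cols):
    """Insert (column, position) pairs into the ranked column list (hypothetical helper).

    Pairs are applied in ascending position order, so with distinct positions
    each column ends up at its requested index in the final list.
    """
    cols = list(auto_cols)
    for col, position in sorted(add_cols, key=lambda pair: pair[1]):
        cols.insert(position, col)
    return cols


auto_cols = ["auto_col1", "auto_col2", "auto_col3", "auto_col4", "auto_col5"]
print(insert_add_cols(auto_cols, [("add_col1", 0), ("add_col2", 4)]))
# ['add_col1', 'auto_col1', 'auto_col2', 'auto_col3', 'add_col2', 'auto_col4', 'auto_col5']
```

Sorting by position first means the caller can list the tuples in any order without changing the result.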

Exclude columns in ZORDER

from auto_zorder import auto_zorder

optimize_cmd = auto_zorder(
    cluster_ids=['cluster_id_1', 'cluster_id_2'],
    optimize_table='my_db.my_table',
    exclude_cols=['col1']
)

print(optimize_cmd)
# OPTIMIZE my_db.my_table ZORDER BY (col2, col3, col4, col5, col6)
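In the example above, `col6` enters the statement once `col1` is excluded, which suggests exclusion happens before the top-N cut. A minimal sketch of that ordering, assuming frequency counts per column; the helper `top_cols_excluding`, the sample counts, and the default of 5 are assumptions.

```python
from collections import Counter


def top_cols_excluding(column_counts, exclude_cols, number_of_cols=5):
    """Drop excluded columns first, then keep the most frequent remaining ones."""
    excluded = set(exclude_cols)
    remaining = Counter({c: n for c, n in column_counts.items() if c not in excluded})
    return [col for col, _ in remaining.most_common(number_of_cols)]


# Hypothetical filter counts per column.
counts = {"col1": 40, "col2": 31, "col3": 12, "col4": 9, "col5": 7, "col6": 2}
print(top_cols_excluding(counts, ["col1"]))
# ['col2', 'col3', 'col4', 'col5', 'col6']
```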


License

Distributed under the MIT License. See LICENSE for more information.


