added aws redshift syntax support
xnuinside committed Aug 2, 2021
1 parent bd74a87 commit abbdc8a
Showing 11 changed files with 843 additions and 57 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.txt
@@ -1,3 +1,18 @@
**v0.18.0**
**Features**
1. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**
1. All dependencies were updated to their latest versions.
2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
32 changes: 29 additions & 3 deletions README.md
@@ -6,15 +6,15 @@ Build with ply (lex & yacc in python). A lot of samples in 'tests/'.

### Is it Stable?

- Yes, the library already has about 3500+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.
+ Yes, the library already has about 5000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.

As the maintainer, I guarantee that backward-incompatible changes will not be made in a patch or minor version; only additions & new features.

However, while adding support for new statements & features, I see that the output could be structured in a more optimal way, and I hope to release version `1.0.*` with a more structured output. That will not happen soon; first of all, I want to add support for as many statements as I can. So it does not make sense to expect version `1.0.*` before, for example, version `0.26.0` :)

### How does it work?

- The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (Vertica, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the `Supported Statements` section for more information about which dialect statements the parser already supports.
+ The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (AWS Redshift, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the `Supported Statements` section for more information about which dialect statements the parser already supports.
**If you need some statement that is not supported by the parser yet**: please provide a DDL example & information about the SQL dialect or DB it comes from.

The types used in your DB do not matter, so the parser should also work successfully with any DDL for a SQL DB. The parser is NOT case sensitive; it does not expect all queries to be in upper case or lower case. So you can write statements like this:
@@ -94,7 +94,7 @@ And you will get output with additional keys 'stored_as', 'location', 'external'

If you run parser with command line add flag '-o=hql' or '--output-mode=hql' to get the same result.

- Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql"]
+ Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql", "redshift"]

### From python code

@@ -318,6 +318,17 @@ You also can provide a path where you want to have a dumps with schema with argu
- ENCRYPT column property [+ NO SALT, SALT, USING]
- STORAGE column property


### AWS Redshift Dialect statements

- ENCODE column property
- SORTKEY, DISTSTYLE, DISTKEY, ENCODE table properties
- CREATE TEMP / TEMPORARY TABLE

- table creation with the LIKE statement:

`create temp table tempevent(like event);`
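To make the listed properties concrete, here is a minimal, self-contained sketch of what extracting them from a Redshift DDL can look like. This is an illustration only: the function `extract_redshift_properties` is a hypothetical regex-based helper, not the library's implementation (simple-ddl-parser uses a ply lex/yacc grammar), and the output keys are illustrative.

```python
import re

def extract_redshift_properties(ddl: str) -> dict:
    """Toy extractor for a few Redshift table properties (illustration only)."""
    props = {}
    # DISTSTYLE EVEN | KEY | ALL
    m = re.search(r"\bDISTSTYLE\s+(\w+)", ddl, re.IGNORECASE)
    if m:
        props["diststyle"] = m.group(1)
    # DISTKEY ( column )
    m = re.search(r"\bDISTKEY\s*\(\s*(\w+)\s*\)", ddl, re.IGNORECASE)
    if m:
        props["distkey"] = m.group(1)
    # SORTKEY ( col1, col2, ... )
    m = re.search(r"\bSORTKEY\s*\(\s*([\w\s,]+?)\s*\)", ddl, re.IGNORECASE)
    if m:
        props["sortkey"] = [c.strip() for c in m.group(1).split(",")]
    # CREATE TEMP / TEMPORARY TABLE
    props["temp"] = bool(re.search(r"\bCREATE\s+TEMP(ORARY)?\s+TABLE", ddl, re.IGNORECASE))
    return props

ddl = """
create temp table sales_tmp (like sales)
diststyle key
distkey(sale_id)
sortkey(sale_date, region);
"""
print(extract_redshift_properties(ddl))
# → {'diststyle': 'key', 'distkey': 'sale_id', 'sortkey': ['sale_date', 'region'], 'temp': True}
```

Regexes break down on nested parentheses and quoted identifiers, which is one reason a real grammar-based parser is used instead.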

### TODO in next Releases (if you don't see feature that you need - open the issue)

1. Add more support for CREATE TYPE IS TABLE (example: `CREATE OR REPLACE TYPE budget_tbl_typ IS TABLE OF NUMBER(8,2);`)
@@ -339,6 +350,21 @@ For one of the work projects I needed to convert SQL ddl to Python ORM models in
So I remembered about Parser in Fakeme and just extracted it & improved.

## Changelog
**v0.18.0**
#### Features
1. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**
1. All dependencies were updated to their latest versions.
2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
37 changes: 34 additions & 3 deletions docs/README.rst
@@ -25,7 +25,7 @@ Build with ply (lex & yacc in python). A lot of samples in 'tests/'.
Is it Stable?
^^^^^^^^^^^^^

- Yes, the library already has about 3500+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.
+ Yes, the library already has about 5000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.

As the maintainer, I guarantee that backward-incompatible changes will not be made in a patch or minor version; only additions & new features.

@@ -34,7 +34,7 @@ However, in process of adding support for new statements & features I see that o
How does it work?
^^^^^^^^^^^^^^^^^

- The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (Vertica, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the ``Supported Statements`` section for more information about which dialect statements the parser already supports.
+ The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (AWS Redshift, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the ``Supported Statements`` section for more information about which dialect statements the parser already supports.
**If you need some statement that is not supported by the parser yet**\ : please provide a DDL example & information about the SQL dialect or DB it comes from.

The types used in your DB do not matter, so the parser should also work successfully with any DDL for a SQL DB. The parser is NOT case sensitive; it does not expect all queries to be in upper case or lower case. So you can write statements like this:
@@ -114,7 +114,7 @@ And you will get output with additional keys 'stored_as', 'location', 'external'
If you run parser with command line add flag '-o=hql' or '--output-mode=hql' to get the same result.

- Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql"]
+ Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql", "redshift"]

From python code
^^^^^^^^^^^^^^^^
@@ -354,6 +354,20 @@ Oracle
* ENCRYPT column property [+ NO SALT, SALT, USING]
* STORAGE column property

AWS Redshift Dialect statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


* ENCODE column property
* SORTKEY, DISTSTYLE, DISTKEY, ENCODE table properties
* CREATE TEMP / TEMPORARY TABLE
* table creation with the LIKE statement:

  ``create temp table tempevent(like event);``
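As a sketch of what parsing this LIKE-based form involves, here is a tiny self-contained parser for just that one statement shape. The function `parse_temp_like` and its output keys are hypothetical illustrations, not the library's real API or output schema.

```python
import re

def parse_temp_like(ddl: str):
    """Parse 'create [temp|temporary] table <name> (like <source>);'.

    Sketch only: key names are illustrative, not simple-ddl-parser's
    real output schema.
    """
    pattern = re.compile(
        r"create\s+(temp|temporary)?\s*table\s+(\w+)\s*\(\s*like\s+(\w+)\s*\)\s*;?",
        re.IGNORECASE,
    )
    m = pattern.match(ddl.strip())
    if not m:
        return None
    return {
        "temp": m.group(1) is not None,   # was TEMP/TEMPORARY present?
        "table_name": m.group(2),
        "like": m.group(3),               # source table to copy the schema from
    }

print(parse_temp_like("create temp table tempevent(like event);"))
# → {'temp': True, 'table_name': 'tempevent', 'like': 'event'}
```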

TODO in next Releases (if you don't see feature that you need - open the issue)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -382,6 +396,23 @@ So I remembered about Parser in Fakeme and just extracted it & improved.
Changelog
---------

**v0.18.0**
**Features**


#. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "simple-ddl-parser"
- version = "0.17.0"
+ version = "0.18.0"
description = "Simple DDL Parser to parse SQL & dialects like HQL, TSQL, Oracle, etc ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.; sequences, alters, custom types & other entities from ddl."
authors = ["Iuliia Volkova <[email protected]>"]
license = "MIT"
4 changes: 2 additions & 2 deletions simple_ddl_parser/ddl_parser.py
@@ -3,11 +3,12 @@
from simple_ddl_parser import tokens as tok
from simple_ddl_parser.dialects.hql import HQL
from simple_ddl_parser.dialects.oracle import Oracle
+ from simple_ddl_parser.dialects.redshift import Redshift
from simple_ddl_parser.dialects.sql import BaseSQL
from simple_ddl_parser.parser import Parser


- class DDLParser(Parser, BaseSQL, HQL, Oracle):
+ class DDLParser(Parser, BaseSQL, HQL, Oracle, Redshift):
"""
lex and yacc parser for parse ddl into BQ schemas
"""
@@ -96,7 +97,6 @@ def set_last_token(self, t):
self.lexer.is_table = False
elif t.type == "TABLE" or t.type == "INDEX":
self.lexer.is_table = True
- print(t.value, t.type)
return t

def t_newline(self, t):
25 changes: 25 additions & 0 deletions simple_ddl_parser/dialects/redshift.py
@@ -0,0 +1,25 @@
class Redshift:
    """ply (yacc) productions for AWS Redshift-specific table properties."""

    def p_expression_distkey(self, p):
        """expr : expr ID LP ID RP"""
        # DISTKEY ( column ): store the column name, i.e. the token
        # just before the closing parenthesis.
        p_list = list(p)
        p[1].update({"distkey": p_list[-2]})
        p[0] = p[1]

    def p_encode(self, p):
        """encode : ENCODE ID"""
        # Column-level ENCODE <compression>: emit {"encode": <compression>}.
        p_list = list(p)
        p[0] = {"encode": p_list[-1]}

    def p_expression_diststyle(self, p):
        """expr : expr ID ID
        | expr ID KEY
        """
        # e.g. DISTSTYLE EVEN / ALL / KEY: the last two tokens become
        # a key/value pair on the table dict.
        p_list = list(p)
        p[1].update({p_list[-2]: p_list[-1]})
        p[0] = p[1]

    def p_expression_sortkey(self, p):
        """expr : expr ID ID LP pid RP"""
        # e.g. COMPOUND SORTKEY ( col, ... ): p[2] is the sortkey type,
        # and the column list (pid) sits just before the closing parenthesis.
        p_list = list(p)
        p[1].update({"sortkey": {"type": p_list[2], "keys": p_list[-2]}})
        p[0] = p[1]
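The reductions in this class all follow the same pattern: `p` is ply's production object, `p[1]` is the statement dict built so far, and the new property is taken from fixed token positions. A plain-Python model of `p_expression_sortkey` makes the indexing visible; the helper `reduce_sortkey` below is a hypothetical illustration written for this note, not part of the library.

```python
def reduce_sortkey(expr: dict, tokens: list) -> dict:
    """Model of the 'expr : expr ID ID LP pid RP' reduction.

    In ply, list(p) is [result_slot, expr, ID, ID, LP, pid, RP], so the
    sortkey type is p_list[2] (e.g. 'compound') and the column list (pid)
    is p_list[-2], just before the closing parenthesis.
    """
    p_list = [None, expr] + tokens  # slot for p[0], then the reduced symbols
    expr.update({"sortkey": {"type": p_list[2], "keys": p_list[-2]}})
    return expr

# Tokens as a lexer might deliver them for: compound sortkey (c1, c2)
expr = {"table_name": "events"}
print(reduce_sortkey(expr, ["compound", "sortkey", "(", ["c1", "c2"], ")"]))
```

The design choice of mutating `p[1]` and reassigning it to `p[0]` lets each table-property rule fold its result into one accumulating dict as the parser reduces the DDL.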
