added aws redshift syntax support
xnuinside committed Aug 2, 2021
1 parent bd74a87 commit abbdc8a
Showing 11 changed files with 843 additions and 57 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.txt
@@ -1,3 +1,18 @@
**v0.18.0**
**Features**
1. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**
1. All dependencies were updated to their latest versions.
2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
32 changes: 29 additions & 3 deletions README.md
@@ -6,15 +6,15 @@ Build with ply (lex & yacc in python). A lot of samples in 'tests/'.

### Is it Stable?

- Yes, the library already has about 3500+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.
+ Yes, the library already has about 5000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.

As the maintainer, I guarantee that backward-incompatible changes will not be made in a patch or minor version; only additions & new features.

However, while adding support for new statements & features, I see that the output could be structured in a more optimal way, and I hope to release version `1.0.*` with a more structured output. That will not happen soon; first of all, I want to add support for as many statements as I can. So it does not make sense to expect version `1.0.*` before, for example, version `0.26.0` :)

### How does it work?

- The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (Vertica, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the `Supported Statements` section for more information about which dialect statements the parser already supports.
+ The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (AWS Redshift, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the `Supported Statements` section for more information about which dialect statements the parser already supports.
**If you need some statement that is not supported by the parser yet**: please provide a DDL example & information about the SQL dialect or DB it comes from.

The types used in your DB do not matter, so the parser should also work successfully with any DDL for a SQL DB. The parser is NOT case sensitive; it does not expect all queries to be in upper case or lower case. So you can write statements like this:
@@ -94,7 +94,7 @@ And you will get output with additional keys 'stored_as', 'location', 'external'

If you run parser with command line add flag '-o=hql' or '--output-mode=hql' to get the same result.

- Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql"]
+ Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql", "redshift"]

### From python code

@@ -318,6 +318,17 @@ You also can provide a path where you want to have a dumps with schema with argu
- ENCRYPT column property [+ NO SALT, SALT, USING]
- STORAGE column property


### AWS Redshift Dialect statements

- ENCODE column property
- SORTKEY, DISTSTYLE, DISTKEY, ENCODE table properties
- CREATE TEMP / TEMPORARY TABLE

- table creation with the LIKE statement:

`create temp table tempevent(like event);`
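To make the listed properties concrete, here is a minimal, self-contained sketch of what extracting them from a Redshift DDL can look like. This is an illustration only: the function `extract_redshift_properties` is a hypothetical regex-based helper, not the library's implementation (simple-ddl-parser uses a ply lex/yacc grammar), and the output keys are illustrative.

```python
import re

def extract_redshift_properties(ddl: str) -> dict:
    """Toy extractor for a few Redshift table properties (illustration only)."""
    props = {}
    # DISTSTYLE EVEN | KEY | ALL
    m = re.search(r"\bDISTSTYLE\s+(\w+)", ddl, re.IGNORECASE)
    if m:
        props["diststyle"] = m.group(1)
    # DISTKEY ( column )
    m = re.search(r"\bDISTKEY\s*\(\s*(\w+)\s*\)", ddl, re.IGNORECASE)
    if m:
        props["distkey"] = m.group(1)
    # SORTKEY ( col1, col2, ... )
    m = re.search(r"\bSORTKEY\s*\(\s*([\w\s,]+?)\s*\)", ddl, re.IGNORECASE)
    if m:
        props["sortkey"] = [c.strip() for c in m.group(1).split(",")]
    # CREATE TEMP / TEMPORARY TABLE
    props["temp"] = bool(re.search(r"\bCREATE\s+TEMP(ORARY)?\s+TABLE", ddl, re.IGNORECASE))
    return props

ddl = """
create temp table sales_tmp (like sales)
diststyle key
distkey(sale_id)
sortkey(sale_date, region);
"""
print(extract_redshift_properties(ddl))
# → {'diststyle': 'key', 'distkey': 'sale_id', 'sortkey': ['sale_date', 'region'], 'temp': True}
```

Regexes break down on nested parentheses and quoted identifiers, which is one reason a real grammar-based parser is used instead.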

### TODO in next Releases (if you don't see feature that you need - open the issue)

1. Add more support for CREATE TYPE IS TABLE (example: `CREATE OR REPLACE TYPE budget_tbl_typ IS TABLE OF NUMBER(8,2);`)
@@ -339,6 +350,21 @@ For one of the work projects I needed to convert SQL ddl to Python ORM models in
So I remembered about Parser in Fakeme and just extracted it & improved.

## Changelog
**v0.18.0**
#### Features
1. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**
1. All dependencies were updated to their latest versions.
2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
37 changes: 34 additions & 3 deletions docs/README.rst
@@ -25,7 +25,7 @@ Build with ply (lex & yacc in python). A lot of samples in 'tests/'.
Is it Stable?
^^^^^^^^^^^^^

- Yes, the library already has about 3500+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.
+ Yes, the library already has about 5000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.

As the maintainer, I guarantee that backward-incompatible changes will not be made in a patch or minor version; only additions & new features.

@@ -34,7 +34,7 @@ However, in process of adding support for new statements & features I see that o
How does it work?
^^^^^^^^^^^^^^^^^

- The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (Vertica, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the ``Supported Statements`` section for more information about which dialect statements the parser already supports.
+ The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (AWS Redshift, Oracle, Hive, MsSQL, etc.). You can check the dialect sections after the ``Supported Statements`` section for more information about which dialect statements the parser already supports.
**If you need some statement that is not supported by the parser yet**\ : please provide a DDL example & information about the SQL dialect or DB it comes from.

The types used in your DB do not matter, so the parser should also work successfully with any DDL for a SQL DB. The parser is NOT case sensitive; it does not expect all queries to be in upper case or lower case. So you can write statements like this:
@@ -114,7 +114,7 @@ And you will get output with additional keys 'stored_as', 'location', 'external'
If you run parser with command line add flag '-o=hql' or '--output-mode=hql' to get the same result.

- Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql"]
+ Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql", "redshift"]

From python code
^^^^^^^^^^^^^^^^
@@ -354,6 +354,20 @@ Oracle
* ENCRYPT column property [+ NO SALT, SALT, USING]
* STORAGE column property

AWS Redshift Dialect statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


* ENCODE column property
* SORTKEY, DISTSTYLE, DISTKEY, ENCODE table properties
* CREATE TEMP / TEMPORARY TABLE
* table creation with the LIKE statement:

  ``create temp table tempevent(like event);``
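As a sketch of what parsing this LIKE-based form involves, here is a tiny self-contained parser for just that one statement shape. The function `parse_temp_like` and its output keys are hypothetical illustrations, not the library's real API or output schema.

```python
import re

def parse_temp_like(ddl: str):
    """Parse 'create [temp|temporary] table <name> (like <source>);'.

    Sketch only: key names are illustrative, not simple-ddl-parser's
    real output schema.
    """
    pattern = re.compile(
        r"create\s+(temp|temporary)?\s*table\s+(\w+)\s*\(\s*like\s+(\w+)\s*\)\s*;?",
        re.IGNORECASE,
    )
    m = pattern.match(ddl.strip())
    if not m:
        return None
    return {
        "temp": m.group(1) is not None,   # was TEMP/TEMPORARY present?
        "table_name": m.group(2),
        "like": m.group(3),               # source table to copy the schema from
    }

print(parse_temp_like("create temp table tempevent(like event);"))
# → {'temp': True, 'table_name': 'tempevent', 'like': 'event'}
```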

TODO in next Releases (if you don't see feature that you need - open the issue)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -382,6 +396,23 @@ So I remembered about Parser in Fakeme and just extracted it & improved.
Changelog
---------

**v0.18.0**
**Features**


#. Added base support for the AWS Redshift SQL dialect.
Added support for the ENCODE property on columns.
Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default.
Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

Table creation with the LIKE statement:

create temp table tempevent(like event);

**v0.17.0**


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "simple-ddl-parser"
- version = "0.17.0"
+ version = "0.18.0"
description = "Simple DDL Parser to parse SQL & dialects like HQL, TSQL, Oracle, etc ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.; sequences, alters, custom types & other entities from ddl."
authors = ["Iuliia Volkova <[email protected]>"]
license = "MIT"
4 changes: 2 additions & 2 deletions simple_ddl_parser/ddl_parser.py
@@ -3,11 +3,12 @@
from simple_ddl_parser import tokens as tok
from simple_ddl_parser.dialects.hql import HQL
from simple_ddl_parser.dialects.oracle import Oracle
+ from simple_ddl_parser.dialects.redshift import Redshift
from simple_ddl_parser.dialects.sql import BaseSQL
from simple_ddl_parser.parser import Parser


- class DDLParser(Parser, BaseSQL, HQL, Oracle):
+ class DDLParser(Parser, BaseSQL, HQL, Oracle, Redshift):
"""
lex and yacc parser for parse ddl into BQ schemas
"""
@@ -96,7 +97,6 @@ def set_last_token(self, t):
self.lexer.is_table = False
elif t.type == "TABLE" or t.type == "INDEX":
self.lexer.is_table = True
- print(t.value, t.type)
return t

def t_newline(self, t):
25 changes: 25 additions & 0 deletions simple_ddl_parser/dialects/redshift.py
@@ -0,0 +1,25 @@
class Redshift:
    """ply (yacc) productions for AWS Redshift-specific table properties."""

    def p_expression_distkey(self, p):
        """expr : expr ID LP ID RP"""
        # DISTKEY ( column ): store the column name, i.e. the token
        # just before the closing parenthesis.
        p_list = list(p)
        p[1].update({"distkey": p_list[-2]})
        p[0] = p[1]

    def p_encode(self, p):
        """encode : ENCODE ID"""
        # Column-level ENCODE <compression>: emit {"encode": <compression>}.
        p_list = list(p)
        p[0] = {"encode": p_list[-1]}

    def p_expression_diststyle(self, p):
        """expr : expr ID ID
        | expr ID KEY
        """
        # e.g. DISTSTYLE EVEN / ALL / KEY: the last two tokens become
        # a key/value pair on the table dict.
        p_list = list(p)
        p[1].update({p_list[-2]: p_list[-1]})
        p[0] = p[1]

    def p_expression_sortkey(self, p):
        """expr : expr ID ID LP pid RP"""
        # e.g. COMPOUND SORTKEY ( col, ... ): p[2] is the sortkey type,
        # and the column list (pid) sits just before the closing parenthesis.
        p_list = list(p)
        p[1].update({"sortkey": {"type": p_list[2], "keys": p_list[-2]}})
        p[0] = p[1]
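The reductions in this class all follow the same pattern: `p` is ply's production object, `p[1]` is the statement dict built so far, and the new property is taken from fixed token positions. A plain-Python model of `p_expression_sortkey` makes the indexing visible; the helper `reduce_sortkey` below is a hypothetical illustration written for this note, not part of the library.

```python
def reduce_sortkey(expr: dict, tokens: list) -> dict:
    """Model of the 'expr : expr ID ID LP pid RP' reduction.

    In ply, list(p) is [result_slot, expr, ID, ID, LP, pid, RP], so the
    sortkey type is p_list[2] (e.g. 'compound') and the column list (pid)
    is p_list[-2], just before the closing parenthesis.
    """
    p_list = [None, expr] + tokens  # slot for p[0], then the reduced symbols
    expr.update({"sortkey": {"type": p_list[2], "keys": p_list[-2]}})
    return expr

# Tokens as a lexer might deliver them for: compound sortkey (c1, c2)
expr = {"table_name": "events"}
print(reduce_sortkey(expr, ["compound", "sortkey", "(", ["c1", "c2"], ")"]))
```

The design choice of mutating `p[1]` and reassigning it to `p[0]` lets each table-property rule fold its result into one accumulating dict as the parser reduces the DDL.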
