Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New feature: escape table and field names #32

Open
tomasfarias opened this issue Aug 10, 2020 · 7 comments
Open

New feature: escape table and field names #32

tomasfarias opened this issue Aug 10, 2020 · 7 comments

Comments

@tomasfarias
Copy link

tomasfarias commented Aug 10, 2020

Currently, table names and field names can be used with the sqlsafe filter. Would it be possible to introduce a new filter to handle escaping for table names and field names?

The motivation for this comes from a current bulk load job I'm working on which consists on a bunch of queries that all look the same except for table and schema that are fed from a configuration file. I'm currently using psycopg2.sql to handle escaping, but I'd prefer to remove the SQL code from the application. This would allow me to have access to Jinja features like template inheritance or tests to extend the bulk load job capabilities.

psycopg2.sql ends up using PQescapeIdentifier from libpq to handle escaping, but we can do what the folks at diesel did and implement it ourselves (forgetting about encodings for the example).

Adding something like this in jinjasql/jinjasql/core.py:

def identifier(*values: str) -> str:
    return Markup('.'.join(f'"{escape_quotes(value)}"' for value in values))

def escape_quotes(s: str) -> str:
    return s.replace('"', '""')

...

class JinjaSql(object):
        def _prepare_environment(self):
            self.env.filters["identifier"] = identifier

Would allow for templates like:

SELECT *
FROM {{ table | identifier }}

To be rendered as:

SELECT *
FROM "my_schema"."my_table"

Assuming a call to prepare_query like: j.prepare_query(template, {'table': ('my_schema', 'my_table')})

I understand this feature would be dependent on database system: this particular example would work with PostgreSQL but not with MySQL since double quotes for system identifiers are not a thing there. So the filter would need to be decorated with contextfilter so that a db_engine parameter can be read from it and new environments can be added over time; really bad example:

@contextfilter
def identifier(context, *values: str) -> str: 
    if context.eval_ctx.db_engine == 'postgres':
        return Markup('.'.join(f'"{escape_quotes(value)}"' for value in values))
    elif context.eval_ctx.db_engine == 'mysql':
        return do_mysql_escape(values)
    else:
        raise ValueError(f'{db_engine} not supported')

With the new attribute added to the environment:

class SqlExtension(Extension):
    def__init__(self, environment):
        super().__init__(environment)
        environment.extend(
            db_engine='postgres',
        )

edit: Forgot to add I'd be more than willing to implement something like this myself if it's deemed to be a good feature.

@sripathikrishnan
Copy link
Owner

I like the idea of the | identifier filter, this is a good feature to have.

I think the db_engine is also a good parameter. We could use db_engine to also choose the right param_style, and also use it to enable / disable certain database specific features. For example, postgres allows you to bind a list / dictionary, but other databases do not allow it.

So yes, I think this could be a good feature. I would be happy to take in a pull request!

@tomasfarias
Copy link
Author

Thanks for the input! I'll try to get a PR going for the weekend.

@sureshdsk
Copy link

@sripathikrishnan @tomasfarias would love use this feature. will this be merged anytime soon?

@tomasfarias
Copy link
Author

PR has been awaiting review for a few months now. I have time to work on this and I am most willing to make changes to my branch if needed.

I'm revisiting this project after some time and can already think of a few potential changes to make maintenance and development easier, automatic code formatting via black being one of them. Unfortunately, it seems like it has become inactive somewhat: last release was in May 2020.

@ccolgrove
Copy link

I'd also make use of this feature, and happy to help with anything that would help get it merged.

@bfmcneill
Copy link

Can we template [schema].[table_name] with this library? I tried it with the current version and got a slightly unexpected result

SELECT * 
    FROM %s.%s 
    ORDER BY %s 
    OFFSET %s ROWS 
    FETCH NEXT %s ROWS ONLY

@sripathikrishnan
Copy link
Owner

Looking at this issue with a fresh pair of eyes, some additional thoughts.

  1. I would like to keep JinjaSQL agnostic of the underlying database engine. Making it aware of database engine makes maintenance more complicated.
  2. It seems most databases use double quotes to quote identifiers
  3. MySQL by default uses back-ticks to quote, but you can set a mode ANSI_QUOTES, and then MySQL can also use double quotes. Since the same database can use different quote characters, this further strengthens my argument that we should not do an if condition on the db_engine.

Based on this, my solution would be:

  1. Add a constructor parameter identifier_quote_character, that defaults to double quotes
  2. Add the identifier filter as discussed above, but quote on the basis of identifier_quote_character

I will try and get this out in the next few weeks. Thanks for all the help!

sripathikrishnan added a commit that referenced this issue Dec 29, 2021
Quite often, there is a need to create dynamic queries where the table or column name is only known
at run time. Until now, one had to resort to the potentially dangerous | sqlsafe filter and had to
ensure that the table / column name did not have any sql injection.

Most databases provide a way to quote identifiers. Most databases uses double quotes as a way to
quote table / column names. Notable exception is MySql, which by default uses backticks as the escape character

With this commit, we add a new jinja2 filter call identifier. This filter will automatically quote and escape
the table/column names that are injected at run time.

Typical usage:
template = 'SELECT {{colname|identifier}} FROM {{tablename|identifier}}'

will generate a query like
'SELECT somecol FROM myschema.sometable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants