Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WITH clause doesn't work with SQLite #50

Open
n87 opened this issue Sep 21, 2021 · 9 comments
Open

WITH clause doesn't work with SQLite #50

n87 opened this issue Sep 21, 2021 · 9 comments

Comments

@n87
Copy link

n87 commented Sep 21, 2021

Tested on these queries in the cloud demo [1]:

with t as (select * from employees) select * from t
with t as (select 1) select 1

It doesn't output anything, but also doesn't show errors

[1] https://mybinder.org/v2/gh/jupyter-xeus/xeus-sql/stable?urlpath=lab/tree/examples/XVega%20operations.ipynb

@marimeireles
Copy link
Member

marimeireles commented Sep 22, 2021

Hey @n87, thank you for opening the issue :)
That's true. I don't understand why it's not being picked up by SOCI though. We treat this case in the following lines:

            else
            {
                if (this->sql)
                {
                    /* Shows rich output for tables */
                    if (xv_bindings::case_insentive_equals("SELECT", tokenized_input[0]) ||
                        xv_bindings::case_insentive_equals("DESC", tokenized_input[0]) ||
                        xv_bindings::case_insentive_equals("DESCRIBE", tokenized_input[0]) ||
                        xv_bindings::case_insentive_equals("SHOW", tokenized_input[0]))
                    {
                        nl::json data = process_SQL_input(code, xv_sql_df);


                        publish_execution_result(execution_counter,
                                                 std::move(data),
                                                 nl::json::object());


                    }
                    /* Execute all SQL commands that don't output tables */
                    else
                    {
                        *this->sql << code;
                    }
                }

https://github.com/jupyter-xeus/xeus-sql/blob/master/src/xeus_sql_interpreter.cpp#L295-L317

Here are some things I'm thinking could be the culprit for this issue:

  • check upstream if SOCI deals with the WITH keyword, maybe it's not supported by them
  • does this command requires some special treatment? For everything that's not acting directly in the database we have to create special cases called magics, see for example how we call LOAD:
    %LOAD sqlite3 db=chinook.db timeout=2 shared_cache=true
    I believe this might be the case for WITH and then we want to do something similar to what we're doing with LOAD :)

I'm not sure, these are just some hints. If you or anyone want to tackle this issue I'm here to help.
Unfortunately I can't dedicate so much time to this project right now. (Maybe in the future!)
But thanks again for opening this and please feel free to ask questions.

@n87
Copy link
Author

n87 commented Sep 22, 2021

@marimeireles it looks like your snippet is just enough to explain the issue. You check that first token is one of (SELECT, DESC, DESCRIBE, SHOW). But SELECT queries can also start with WITH or with VALUES: https://sqlite.org/lang_select.html

Note that WITH can also precede DELETE, INSERT etc., so I'm not sure what the fix should be.

@marimeireles
Copy link
Member

marimeireles commented Sep 23, 2021

Hum! This is where my lack of knowledge in SQL attacks again!
Sorry for this, I'm mostly a C++/Python dev and just know the very basics of SQL.

Yeah, seems a bit complicated. If I understand correctly this WITH clause is like a "variable(? did I get that right?)" that you can store the values of the query? And then you have the RECURSIVE modifier, even.

I can't tell if SOCI offers support for it, because they have no entries about WITH in their docs. And looking further, they have no WITH tests nor examples, maybe they don't support it. So first step as I said, is making sure they do. Maybe one could even open an issue upstream asking about it and how to use it.

Afterwards... I'd say the steps to fix this would be:

  • create an example that uses WITH (you've done this already)
  • check the method process_SQL_input and print on the console the output of this code :), using this and maybe the help of the SOCI people we can figure out how to treat this result. Do we have to store this var ourselves? What type does it have? If we have to create a type for it is probably very complicated to tackle, if there's a type, should be easy, we can just create another if check and add it before the other words

@n87
Copy link
Author

n87 commented Sep 23, 2021

A more googlable term for WITH clause is Common Table Expression. It is like defining variables, but more closer equivalent is let clause in functional languages, e.g. Haskell:

let x = 2 in x * x

Somewhat contrived SQL(ite) equivalent:

> with t as (select 2 as x) select x * x from t;
x * x
----------
4

@darkdreamingdan
Copy link

darkdreamingdan commented Feb 16, 2022

Just came across this issue. Wanted to use Jupyter to teach Common Table Expressions, but seems it doesnt work :(.

Is it just a matter of whitelisting the missing WITH and VALUES as the first token in the parser? That feels like it could be a straightforward fix? Even if there are edge cases where the result does not need to output, it's still better than current behaviour which does not output at all.

Please could we just add this as a quick fix? 🙏

@marimeireles
Copy link
Member

Hey @darkdreamingdan, this is not a simple issue, unfortunately.
This is a couple of my hours of work, for sure and I can't invoice this project currently, that's why I left it open for the community.
But I just thought on an interesting way of implementing it and I'll give it a try on my free time.

My explanation in the previous comment is a bit confusing but I think there are two parts of this issue:

  1. treating queries as a variable and saving them -> this is what I'm going to work on first, it seems relatively simple, from what I'm thinking...
  2. Implementing recursive behavior -> that's a bit more complicated and I still haven't thought on a way to do it

I'll try to tackle it this week. Will keep you updated of any progress.

Another thing you could try is opening an issue in SOCI and asking them about the possibility of including this feature. If they support it then I'll add it here, as it's very easy to :)

@stevemarin
Copy link

Seeing same issue, so I came here to see if this was previously reported. And you're already working it! Thank you very, very much for these wonderful tools!

:tracking:

@scheb
Copy link

scheb commented Apr 3, 2023

Sorry to bring up this topic again. I'm having pretty much the same use case as @darkdreamingdan. I'd love to use xeus-sql to teach students how to use SQL. It's pretty much perfect for my use-case, except for this issue. We'd like to teach them the benefit of structuring queries using WITH for better readability.

I'm not having much knowledge about C++, but it seems to me, these lines here determine which kind of queries are displaying a result in the notebook, based on the first keyword. Syntactically the WITH statement has to come first in a SELECT query. Adding the WITH keyword there, could potentially fix it. But that's just a guess.

if (xv_bindings::case_insentive_equals("SELECT", tokenized_input[0]) ||
xv_bindings::case_insentive_equals("DESC", tokenized_input[0]) ||
xv_bindings::case_insentive_equals("DESCRIBE", tokenized_input[0]) ||
xv_bindings::case_insentive_equals("SHOW", tokenized_input[0]))

Let me know what you think!

@camelusminimus
Copy link

If I may drop into this issue after such a long time, I am a large fan of CTE myself, and thus would be very happy if they could be taught early using jupyter...

I fear my understanding of the problem, and the way @marimeireles described it are a bit at odds. The following is my understanding:

CTE do not modify anything outside of the current query, they only provide a named binding to a (sub) query that (afaik) is limited to varying types of SELECT queries. But this binding is without consequence outside of the current query. Thus the Queries:

WITH
inner_query_a as (
   SELECT
       1 AS id,
       5 AS value
),
inner_query_b as (
   SELECT
       1 AS id,
       2 AS value
)

SELECT
    inner_query_a.value * inner_query_b.value
FROM inner_query_a, inner_query_b
WHERE inner_query_a.id=inner_query_b.id;

and

SELECT
    inner_query_a.value * inner_query_b.value
FROM
    (SELECT 1 AS id, 5 AS value) AS inner_query_a,
    (SELECT 1 AS id, 2 AS value) AS inner_query_b
WHERE
    inner_query_a.id = inner_query_b.id;

are completely identical. This is only a tool to better structure your query, thus there should not need to be any additional support from the SOCI/xeus-sql side, beside accepting that a query that starts with "WITH" can return a table.

The only potential proplem I see is that it does not have to return a table. From a high-level "What does the sql query do?" Point of view, the CTE does not really matter. Thus the simple "first token" approach to checking if the query returns some kind of result set is not valid. And fixing it to correctly ignore the CTE amounts to writing a more or less complete grammar of SQL...as understood by the various backends...yikes...

A question I would pose is, is this distinction needed after all? Due to the way "first line comments" are now allowed (as in commit adacbae ) this whitelisting approach is not really consistent anymore, or do I missunderstand anything?

Would it perhaps be possible to instead change to a blacklisting approach, i.e. define some queries that are sure not to return a result set from scanning the first line, and handle that, and show the resultset of all others, even if that resultset might be empty?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants