Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document queries syntax #44

Open
denisri opened this issue Dec 1, 2020 · 5 comments
Open

Document queries syntax #44

denisri opened this issue Dec 1, 2020 · 5 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@denisri
Copy link
Contributor

denisri commented Dec 1, 2020

What's the query syntax in DatabaseSession.filter_documents() ?
It's not SQL, it's an unknown language...
The docs only say:

        filter_query: Filter query (str)

                                - A filter row must be written this way: {<field>} <operator> "<value>"
                                - The operator must be in ('==', '!=', '<=', '>=', '<', '>', 'IN', 'ILIKE', 'LIKE')
                                - The filter rows can be linked with ' AND ' or ' OR '
                                - Example: "((({BandWidth} == "50000")) AND (({FileName} LIKE "%G1%")))"

I don't understand why there are 2 parentheses around "lines", and I can't make a working query on my own.
I just want to perform a simple query like "select documents in a list of IDs".
I tried thinks like:

{FileName} in ("/path1", "/path2")

or:

{FileName} in (("/path1", "/path2"))

but always end up with a syntax error that I cannot interpret, like this one:

Unexpected token Token(LPAR, '(') at line 1, column 15.
Expected one of: 
        * LSQB
        * TIME
        * FIELD_NAME
        * KEYWORD_LITERAL
        * DATE
        * ESCAPED_STRING
        * SIGNED_NUMBER
        * QUOTED_FIELD_NAME
        * DATETIME

Sure, I know nothing about Lark, not even if it's a language, a parser library, or anything else, but I guess a standard user doesn't want to know...

We really need a documentation for the query language...

Moreover it's not easy in the Populse_db API to get a collection primary key to perform queries on: currently we need to get to the underlying SQLite engine to get it (or maybe I missed something ?)

For now I could only make my request by totally bypassing the DatabaseSession layer and getting directly to the SQLite engine, but of course this is not what it's meant for...

@denisri denisri added enhancement New feature or request question Further information is requested labels Dec 1, 2020
@sapetnioc
Copy link
Contributor

Yes, there many things to improve in populse_db documentation. The query language has nothing to do with Lark. This is just the library to parse the grammar. I defined this grammar. What you try to do is an IN operator between a field and a list literal. I think this is not implemented. You can use FileName == "/path1" OR FileName == "/path2" for now.

I do not understand what cannot be done on collection primary key. A primary key is a field as any other and therefore can be used on queries. There is no join operation right now if this is what you are looking for.

We must never use directly the SQlite engine because this would make code incompatible with another engine (Postgres will be back one day). If ther is a need, we must enhance pobulse_db API.

@sapetnioc
Copy link
Contributor

For primary key I may have understand. Are you looking for dbs.collection(collection).primary_key ? This is available in populse_db 2.

@denisri
Copy link
Contributor Author

denisri commented Dec 1, 2020

For primary key I may have understand. Are you looking for dbs.collection(collection).primary_key ? This is available in populse_db 2.

That's probably what I was looking for, thanks ! ;)

We must never use directly the SQlite engine because this would make code incompatible with another engine (Postgres will be back one day). If ther is a need, we must enhance pobulse_db API.

I know and that was precisely why I was asking...

So the IN syntax doesn't exist ? That would explain why I couldn't make it work... (it is actually in the operators listed in the doc, however).
Using FileName == "/path1" OR FileName == "/path2" will make very long requests (maybe hitting a size limit) when I have a long list of elements to retreive. Speaking of which, python.sqlite3 has a syntax to pass "where_data", but Populse_db has not the equivalent, all must be in a single string. Will it be a problem for large requests ?

@sapetnioc
Copy link
Contributor

Well, the IN operator exists but is only implemented for searching an element (given as a field name or a literal) in a list field (i.e. a field name). We can extend the query syntax if necessary.

I am not sur it is very efficient to perform large queries. We may also think to improve the query system to put data out of the query string.

@denisri
Copy link
Contributor Author

denisri commented Dec 1, 2020

OK I totally misunderstood the meaning of the IN operator ;)
So for now we have no efficient way of performing such a request...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants