Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL performance on large non-query files #211

Open
kevcunnane opened this issue Apr 17, 2019 · 0 comments
Open

PostgreSQL performance on large non-query files #211

kevcunnane opened this issue Apr 17, 2019 · 0 comments

Comments

@kevcunnane
Copy link
Contributor

kevcunnane commented Apr 17, 2019

As noted during a user study the extension has issues with files that have a large number of non-select statements.

  • In many other tools the entire query is run as 1 batch, with the effect you only get 1 resultset out
  • In this tools service each statement is run separately instead. While this preserves all result sets, it means that for a 15,000 statement file (e.g. do 15K inserts) you'll execute 15K serial queries. Against a remote DB like azure (say with 200ms round trip time) this means you would have (15K x 0.2sec = 3000 sec = 50min) just in round trips.

Proposed solution:

  • pgsqltoolsservice/query/query.py, line 66 splits into batches by statement, but later (pgsqltoolsservice/query/batch.py, create_batch line 190) each statement is parsed & select statements treated differently to non-select statements
  • We could just extend the query batching so that it splits on selects but otherwise runs all statements in a single batch. Hence for inserts / other operations it'll run as a single large batch.
    • If you have a file that does, for example, a Create Table, then 15K inserts, then 2 selects, you'll have 3 total network calls to the server (create + 15K inserts, then one for each select).

If you want to get fancier, you can define user setting as follows:

  • (Default) Smart batch. Run all non-select statements in groups, and select statements on their own
  • No batching. Run the entire query as 1 operation and only get 1 possible select statement back
  • Batch per statement. Run with each statement in its own batch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant