Implement new pganalyze.explain_analyze() helper #655
Conversation
I think overall this is looking good (I admit that I haven't checked the SQL part).
Yeah, I was going to remove the flag in a different PR. And good point re: the helper - though technically we can also get that via the schema function data that already gets reported. The main reason I could see to stay with that existing flow (vs. also including it in collector info) is that this is a per-database helper, like the Index Advisor helpers. Storing this information in collector info for many databases might not be ideal?
Ah, interesting, so you'll need to create a helper function in all databases (if you have many databases). Yeah, in that case, sending it out in the collector info doesn't make sense. I think I can work around that by checking the function definition on the app side 👍
Force-pushed from 0771dbe to c61e0e4
TEST_DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable
services:
  postgres:
    image: postgres:14
Using 14 as a minimum here since the EXPLAIN ANALYZE tests rely on being able to turn off compute_query_id in the output (and that was added in 14).
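For context, a minimal sketch of the kind of session setup this implies - this is not the repo's actual test code, and the package and function names are assumptions:

```go
package testutil // illustrative package name

import (
	"context"
	"database/sql"
)

// disableQueryID turns off compute_query_id for the test session so the
// EXPLAIN output being asserted on stays stable. The setting was introduced
// in Postgres 14, which is why postgres:14 is the minimum image above.
func disableQueryID(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "SET compute_query_id = off")
	return err
}
```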
@keiko713 @seanlinsley FYI, this is ready for re-review. I ended up adding both parameter support and Postgres setting support here as well, since it was quick enough, and even if the server doesn't use it, that works fine.
This executes on-demand query runs for the Query Tuning feature via a helper function that prevents reading the table data directly and avoids granting the collector user unnecessary permissions. This is now the only way that query runs can execute EXPLAIN ANALYZE; direct execution without the helper is intentionally not supported.
Previously this was using "defer db.Close()" in a loop, which would cause the database connections to only be released once the outer function terminates, meaning connections were kept open too long. To close each connection correctly when the run finishes, and to reduce the duplication of some of the error handling, use a dedicated function to process the parts of a query run that use a database connection.
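A minimal sketch of that pattern, assuming nothing about the collector's actual types - queryRun, its fields, and the function names here are all illustrative:

```go
package runner // illustrative sketch, not the collector's actual runner package

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// queryRun stands in for the collector's per-run state.
type queryRun struct {
	ID  int64
	DSN string
	SQL string
}

// processRuns handles each pending run in its own function call, so the
// deferred Close in processRunWithConnection fires as soon as that run
// finishes, instead of piling up until the loop's enclosing function returns.
func processRuns(ctx context.Context, runs []queryRun) {
	for _, run := range runs {
		if err := processRunWithConnection(ctx, run); err != nil {
			log.Printf("query run %d failed: %s", run.ID, err)
		}
	}
}

// processRunWithConnection opens a connection, runs the query, and closes the
// connection when it returns - on success or on error.
func processRunWithConnection(ctx context.Context, run queryRun) error {
	db, err := sql.Open("postgres", run.DSN)
	if err != nil {
		return err
	}
	defer db.Close() // released per run, not once at the end of all runs

	_, err = db.ExecContext(ctx, run.SQL)
	return err
}
```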
Force-pushed from 839f4df to 7014175
Force-pushed from 7014175 to c4de2a7
Like "--generate-stats-helper-sql" this allows generating a SQL script that installs the pganalyze.explain_analyze helper into all databases on a server, as specified by the server section name passed as an argument. Additionally, the "--generate-helper-explain-analyze-role" option allows setting the role that the helper function should be owned by as a value, defaulting to "pganalyze_explain". Note that the role creation, as well as any GRANT statements for the role must be taken care of separately.
Force-pushed from c4de2a7 to 8fcc1f9
Force-pushed from dc15ca0 to f192625
@seanlinsley @keiko713 This is ready for re-review, all feedback should be addressed.
@@ -130,6 +130,34 @@ func Run(ctx context.Context, wg *sync.WaitGroup, globalCollectionOpts state.Col
		return
	}

	if globalCollectionOpts.GenerateExplainAnalyzeHelperSql != "" {
		wg.Add(1)
		testRunSuccess = make(chan bool)
It doesn't have to be part of this PR, but we may want to add test run output for the presence of the helper function and success on running a test explain.
Yeah, that's a good point. I'll add this in a follow-up PR.
	PERFORM 1 FROM pg_roles WHERE (rolname = current_user AND rolsuper) OR (pg_has_role(oid, 'MEMBER') AND rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser'));
	IF FOUND THEN
		RAISE EXCEPTION 'cannot run: helper is owned by superuser - recreate function with lesser privileged user';
	END IF;
This error occurs in local testing because my Postgres user is a superuser. Maybe the OR should be an AND instead?
For reference, these are the values for the different where clauses:
select rolsuper, pg_has_role(oid, 'MEMBER'), rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser') from pg_roles where rolname = current_user;
rolsuper | pg_has_role | ?column?
----------+-------------+----------
t | t | f
If the logic in this check is intentional, what should I do to my local Postgres install to work around it, and how should we communicate this issue in the docs?
If you follow the instructions on setting up the separate user (preview), it should work as expected, even in a local setup. It's intentionally an OR, since we want to block users who run Postgres on a VM from using a real superuser for this.
runner/query_run.go (outdated)
	server.QueryRunsMutex.Unlock()

	for name, value := range query.PostgresSettings {
		_, err = db.ExecContext(ctx, postgres.QueryMarkerSQL+"SET %s = %s", pq.QuoteIdentifier(name), pq.QuoteLiteral(value))
When testing this locally with statement_timeout set to 1s, this fails with pq: syntax error at or near "%". Looks like a fmt.Sprintf call is needed.
Oh yeah, strange, I must have dropped that in a refactoring by accident.
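For reference, a sketch of the fixed loop - markerSQL and settings are placeholders for postgres.QueryMarkerSQL and query.PostgresSettings from the diff above, and the function wrapper is only there to keep the example self-contained:

```go
package runner // illustrative sketch

import (
	"context"
	"database/sql"
	"fmt"

	"github.com/lib/pq"
)

// applySettings builds each SET statement with fmt.Sprintf before executing
// it. ExecContext does not expand %s verbs, so without the Sprintf the
// literal "%s" reached Postgres and caused: pq: syntax error at or near "%".
func applySettings(ctx context.Context, db *sql.DB, markerSQL string, settings map[string]string) error {
	for name, value := range settings {
		setSql := fmt.Sprintf("SET %s = %s", pq.QuoteIdentifier(name), pq.QuoteLiteral(value))
		if _, err := db.ExecContext(ctx, markerSQL+setSql); err != nil {
			return err
		}
	}
	return nil
}
```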
This now works as expected for your test case (that you had in the other PR).
That said, I do wonder if we should explicitly block changing statement_timeout through the parameters in the query run, or at least have some higher limit that acts as a hard limit (i.e. one that can't be changed by the server side)? Maybe something like 1 minute as the default limit, which can be raised to up to 10 minutes by the server side, but nothing higher?
Assuming we'd want to do that in the future, it's likely easier to enforce by passing it as a separate value (not a free-form parameter, i.e. block it in postgres_settings), so we don't have to deal with parsing parameter values like 20min vs 20s.
Agreed it makes sense to block it here and make it a standalone setting so we can validate it. But applying a max value is going to depend on the type of query run that's being performed; if in the future we're doing a concurrent reindex it should be fine for that to run for an arbitrary amount of time.
Yeah, agreed. I'll filter it out here for now and we can add customizing it in a separate PR.
Actually, on second thought, a separate field isn't really needed: we can validate the value in postgres_settings since the map key is always going to be named statement_timeout.
For now though, we can bypass this issue by making the statement timeout non-configurable. That can be done by moving the existing postgres.SetStatementTimeout call after this code, so the hardcoded 1 minute timeout is always used.
> Actually on second thought a separate field isn't really needed: we can validate the value in postgres_settings since the map key is always going to be named statement_timeout.
The problem is not the key, but the value - since you can use different duration units (seconds vs minutes), we'd have to parse them in order to validate the value.
> For now though, we can bypass this issue by making the statement timeout non-configurable. That can be done by moving the existing postgres.SetStatementTimeout call after this code, so the hardcoded 1 minute timeout is always used.
Sure, I'll move it down for now so it takes effect after the other settings.
Per the Postgres documentation at https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING ("The special condition name OTHERS matches every error type except QUERY_CANCELED and ASSERT_FAILURE."), we need to explicitly include QUERY_CANCELED in order to deallocate the statement in case of statement timeouts. Since we don't use asserts, we do not need to catch ASSERT_FAILURE.
Since we're currently not filtering which connection-level parameters get set on the collector side, this ensures that we enforce the 60 second statement_timeout even if the server sends a different value.
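A sketch of how that ordering enforces the limit - the function name and the explicit SET call are assumptions standing in for the collector's postgres.SetStatementTimeout:

```go
package runner // illustrative sketch

import (
	"context"
	"database/sql"
	"fmt"

	"github.com/lib/pq"
)

// prepareRunConnection applies the server-provided settings first and only
// then sets statement_timeout. Because the hardcoded timeout is applied last,
// a statement_timeout value inside the settings map has no lasting effect.
func prepareRunConnection(ctx context.Context, db *sql.DB, settings map[string]string) error {
	for name, value := range settings {
		setSql := fmt.Sprintf("SET %s = %s", pq.QuoteIdentifier(name), pq.QuoteLiteral(value))
		if _, err := db.ExecContext(ctx, setSql); err != nil {
			return err
		}
	}

	// Applied after the loop, so the 60 second hard limit always wins.
	_, err := db.ExecContext(ctx, "SET statement_timeout = 60000")
	return err
}
```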
This looks good overall, but I did run into one bug: the statement timeout doesn't seem to take effect. For example if I run a …
Strange - I had tested that and it worked for me, but I can recheck tomorrow. It sounds like you got an actual result vs. the EXPLAIN without ANALYZE that should happen when the query times out.