Implement new pganalyze.explain_analyze() helper #655

lfittl · 2024-12-19T02:54:22Z

This executes on-demand query runs for the Query Tuning feature via
a helper function that prevents reading the table data directly, as well
as granting the collector user unnecessary permissions.

This is now the only way that query runs can execute EXPLAIN ANALYZE,
direct execution without the helper is intentionally not supported.

input/postgres/explain_analyze_test.go

util/explain_analyze_helper.sql

keiko713 · 2024-12-19T07:45:49Z

I think overall this is looking good (I admit that I haven't checked the SQL part).
Are we going to retire the enable_query_runner flag in the different PR? Also, we will (well, the app will) want to know if the helper function exists or not, so it would be great if it's reported as a collector config.

lfittl · 2024-12-19T07:48:30Z

Are we going to retire the enable_query_runner flag in the different PR? Also, we will (well, the app will) want to know if the helper function exists or not, so it would be great if it's reported as a collector config.

Yeah, I was going to remove the flag in a different PR.

And good point re: the helper - though technically we can also get that via the schema function data that already gets reported.

The main reason to stay with that existing flow that I could see (vs also including it in collector info), is that this is a per-database helper, like the Index Advisor helpers. Storing this information in collector info for many databases might not be ideal?

keiko713 · 2024-12-19T08:31:20Z

The main reason to stay with that existing flow that I could see (vs also including it in collector info), is that this is a per-database helper, like the Index Advisor helpers. Storing this information in collector info for many databases might not be ideal?

Ah, interesting, so you'll need to create a helper function to all databases (if you have many databases). Yeah in that case, sending out in the collector info doesn't make sense.
In that case, we may want to consider adding the function to https://github.com/pganalyze/collector/blob/main/runner/generate_helper_sql.go or providing some similar things for this.

I think I can workaround by checking the function definition on the app side 👍

lfittl · 2024-12-24T04:54:27Z

.github/workflows/ci.yml

+      TEST_DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable
+    services:
+      postgres:
+        image: postgres:14


Using 14 as a minimum here since the EXPLAIN ANALYZE tests rely on being able to turn off compute_query_id in the output (and that was added in 14).

lfittl · 2024-12-24T04:56:28Z

@keiko713 @seanlinsley FYI, this is ready for re-review. I ended up adding both parameter support and Postgres setting support here as well, since it was quick enough, and even if the server doesn't use it that works fine.

protobuf/server_message.proto

util/sql_helpers.go

input/postgres/explain_analyze.go

main.go

util/explain_analyze_helper.sql

runner/query_run.go

main.go

This executes on-demand query runs for the Query Tuning feature via a helper function that prevents reading the table data directly, as well as granting the collector user unnecessary permissions. This is now the only way that query runs can execute EXPLAIN ANALYZE, direct execution without the helper is intentionally not supported.

Previously this was using "defer db.Close()" in a loop, which would cause the database connections to only be released once the outside function terminates, meaning connections were kept open too long. To close each connection correctly when the run finishes, and to reduce the duplication of some of the error handling, use a dedicated function to process the parts of a query run that use a database connection.

…estMode

Like "--generate-stats-helper-sql" this allows generating a SQL script that installs the pganalyze.explain_analyze helper into all databases on a server, as specified by the server section name passed as an argument. Additionally, the "--generate-helper-explain-analyze-role" option allows setting the role that the helper function should be owned by as a value, defaulting to "pganalyze_explain". Note that the role creation, as well as any GRANT statements for the role must be taken care of separately.

lfittl · 2024-12-28T00:35:17Z

@seanlinsley @keiko713 This is ready for re-review, all feedback should be addressed.

runner/websocket.go

runner/query_run.go

seanlinsley · 2024-12-30T20:05:43Z

runner/run.go

@@ -130,6 +130,34 @@ func Run(ctx context.Context, wg *sync.WaitGroup, globalCollectionOpts state.Col
 		return
 	}

+	if globalCollectionOpts.GenerateExplainAnalyzeHelperSql != "" {
+		wg.Add(1)
+		testRunSuccess = make(chan bool)


It doesn't have to be part of this PR, but we may want to add test run output for the presence of the helper function and success on running a test explain.

Yeah, that's a good point. I'll add this in a follow-up PR.

seanlinsley · 2024-12-30T20:34:18Z

util/explain_analyze_helper.sql

+  PERFORM 1 FROM pg_roles WHERE (rolname = current_user AND rolsuper) OR (pg_has_role(oid, 'MEMBER') AND rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser'));
+  IF FOUND THEN
+    RAISE EXCEPTION 'cannot run: helper is owned by superuser - recreate function with lesser privileged user';
+  END IF;


This error occurs in local testing because my Postgres user is a superuser. Maybe the OR should be an AND instead?

For reference, these are the values for the different where clauses:

select rolsuper, pg_has_role(oid, 'MEMBER'), rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser') from pg_roles where rolname = current_user; rolsuper | pg_has_role | ?column? ----------+-------------+---------- t | t | f

If the logic in this check is intentional, what should I do to my local Postgres install to work around it, and how should we communicate this issue in the docs?

If you follow the instructions on setting up the separate user (preview) it should work as expected, even in a local setup. Its intentionally an OR, since we want to block users that run Postgres on a VM to use a real superuser for this.

seanlinsley · 2024-12-30T20:50:01Z

runner/query_run.go

+	server.QueryRunsMutex.Unlock()
+
+	for name, value := range query.PostgresSettings {
+		_, err = db.ExecContext(ctx, postgres.QueryMarkerSQL+"SET %s = %s", pq.QuoteIdentifier(name), pq.QuoteLiteral(value))


When testing this locally with statement_timeout set to 1s, this fails with pq: syntax error at or near "%". Looks like a fmt.Sprintf call is needed.

Oh yeah, strange, I must have dropped that in a refactoring by accident.

This now works as expected for your test case (that you had in the other PR).

That said, I do wonder if we should explicitly block changing statement_timeout through the parameters in the query run, or at least have some higher limit that's a hard limit? (i.e. can't be changed by the server side)

Maybe something like 1 minute as the default limit, can be raised to up to 10 minutes by the server side, but nothing higher?

Assuming we'd want to do that in the future, its likely easier to enforce by passing it as a separate value (not a free form parameter, i.e block it in postgres_settings), so we don't have to deal with 20min vs 20s parameter value parsing.

Agreed it makes sense to block it here and make it a standalone setting so we can validate it. But applying a max value is going to depend on the type of query run that's being performed; if in the future we're doing a concurrent reindex it should be fine for that to run for an arbitrary amount of time.

Yeah, agreed. I'll filter it out here for now and we can add customizing it in a separate PR.

Actually on second thought a separate field isn't really needed: we can validate the value in postgres_settings since the map key is always going to be named statement_timeout.

For now though, we can bypass this issue by making the statement timeout non-configurable. That can be done by moving the existing postgres.SetStatementTimeout call after this code, so the hardcoded 1 minute timeout is always used.

Actually on second thought a separate field isn't really needed: we can validate the value in postgres_settings since the map key is always going to be named statement_timeout.

The problem is not the key, but the value - since you can use different duration units (seconds vs minutes), we'd have to parse them in order to validate the value.

For now though, we can bypass this issue by making the statement timeout non-configurable. That can be done by moving the existing postgres.SetStatementTimeout call after this code, so the hardcoded 1 minute timeout is always used.

Sure, I'll move it down for now so it takes effect after the other settings.

Per the Postgres documentation at https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING "The special condition name OTHERS matches every error type except QUERY_CANCELED and ASSERT_FAILURE.", thus we need to explicitly include the QUERY_CANCELED in order to deallocate the statement in case of statement timeouts. Since we don't use asserts, we do not need to catch ASSERT_FAILURE.

Since we're currently not filtering which connection level parameters get set on the collector side, this ensures that we enforce the statement_timeout to 60 secs even if the server sends a different value.

seanlinsley · 2025-01-02T16:14:37Z

This looks good overall, but I did run into one bug: the statement timeout doesn't seem to take effect. For example if I run a SELECT pg_sleep(90) query through the system, the collector responds back with JSON including "Execution Time": 90001.987.

lfittl · 2025-01-02T16:18:56Z

This looks good overall, but I did run into one bug: the statement timeout doesn't seem to take effect. For example if I run a SELECT pg_sleep(90) query through the system, the collector responds back with JSON including "Execution Time": 90001.987.

Strange - I had tested that and it worked for me, but I can recheck tomorrow. It sounds like you got an actual result vs the EXPLAIN without ANALYZE that should happen when the query times out.

lfittl requested a review from a team December 19, 2024 02:57

msakrejda reviewed Dec 19, 2024

View reviewed changes

input/postgres/explain_analyze_test.go Outdated Show resolved Hide resolved

lfittl commented Dec 19, 2024

View reviewed changes

util/explain_analyze_helper.sql Outdated Show resolved Hide resolved

lfittl force-pushed the explain-function-helper branch 2 times, most recently from 0771dbe to c61e0e4 Compare December 24, 2024 04:52

lfittl commented Dec 24, 2024

View reviewed changes

seanlinsley reviewed Dec 24, 2024

View reviewed changes

keiko713 reviewed Dec 25, 2024

View reviewed changes

runner/query_run.go Show resolved Hide resolved

main.go Outdated Show resolved Hide resolved

lfittl added 3 commits December 27, 2024 15:28

EXPLAIN ANALYZE tests: Enable on CI, rework per feedback and remove t…

65799af

…estMode

lfittl force-pushed the explain-function-helper branch from 839f4df to 7014175 Compare December 27, 2024 23:30

lfittl added 2 commits December 27, 2024 15:32

Query runner: Add parameter handling and Postgres settings support

e615e9a

Address review feedback

c3faec6

lfittl force-pushed the explain-function-helper branch from 7014175 to c4de2a7 Compare December 27, 2024 23:48

lfittl force-pushed the explain-function-helper branch from c4de2a7 to 8fcc1f9 Compare December 27, 2024 23:53

Reword error messages for pganalyze.explain_analyze helper per feedback

f192625

lfittl force-pushed the explain-function-helper branch from dc15ca0 to f192625 Compare December 28, 2024 00:08

seanlinsley reviewed Dec 30, 2024

View reviewed changes

lfittl added 3 commits December 30, 2024 15:49

Address review feedback

e9bf24b

Enforce statement_timeout after we set parameters

4ae7b1c

Since we're currently not filtering which connection level parameters get set on the collector side, this ensures that we enforce the statement_timeout to 60 secs even if the server sends a different value.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement new pganalyze.explain_analyze() helper #655

Implement new pganalyze.explain_analyze() helper #655

lfittl commented Dec 19, 2024

keiko713 commented Dec 19, 2024

lfittl commented Dec 19, 2024

keiko713 commented Dec 19, 2024

lfittl Dec 24, 2024

lfittl commented Dec 24, 2024

lfittl commented Dec 28, 2024

seanlinsley Dec 30, 2024

lfittl Dec 30, 2024

seanlinsley Dec 30, 2024 •

edited

Loading

lfittl Dec 30, 2024 •

edited

Loading

seanlinsley Dec 30, 2024

lfittl Dec 30, 2024

lfittl Dec 31, 2024 •

edited

Loading

seanlinsley Dec 31, 2024

lfittl Dec 31, 2024

seanlinsley Dec 31, 2024 •

edited

Loading

lfittl Dec 31, 2024

seanlinsley commented Jan 2, 2025

lfittl commented Jan 2, 2025

Implement new pganalyze.explain_analyze() helper #655

Are you sure you want to change the base?

Implement new pganalyze.explain_analyze() helper #655

Conversation

lfittl commented Dec 19, 2024

keiko713 commented Dec 19, 2024

lfittl commented Dec 19, 2024

keiko713 commented Dec 19, 2024

Choose a reason for hiding this comment

lfittl commented Dec 24, 2024

lfittl commented Dec 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanlinsley Dec 30, 2024 • edited Loading

Choose a reason for hiding this comment

lfittl Dec 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lfittl Dec 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanlinsley Dec 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanlinsley commented Jan 2, 2025

lfittl commented Jan 2, 2025

seanlinsley Dec 30, 2024 •

edited

Loading

lfittl Dec 30, 2024 •

edited

Loading

lfittl Dec 31, 2024 •

edited

Loading

seanlinsley Dec 31, 2024 •

edited

Loading