Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activity snapshots: Normalize queries for "filter_query_sample = normalize" #462

Merged
merged 4 commits into from
Oct 5, 2023

Conversation

lfittl
Copy link
Member

@lfittl lfittl commented Oct 5, 2023

For historic reasons, we were only normalizing pg_stat_activity query texts when filter_query_sample was set to all, not normalize, which has a similar intent.

Whilst we could also control the filtering of pg_stat_activity query texts with the filter_query_text setting (which today controls whether pg_stat_statements query texts are re-normalized, and is enabled by default), this is intentionally
not done here. The main motivation to not do this for now are performance implications of running normalize every 10 seconds for many active queries, and we may adjust this in a future commit based on more benchmarks.

@lfittl lfittl requested review from msakrejda and a team October 5, 2023 03:30
@keiko713
Copy link
Contributor

keiko713 commented Oct 5, 2023

memo: https://pganalyze.com/docs/collector/settings#pii-filtering-settings

filter_query_sample

default: none, Either none, normalize or all

By default the pganalyze collector does not filter data. Our recommended configuration for servers containing sensitive data is as follows: filter_query_sample: normalize

Note the normalize setting for filter_query_sample, which will remove all query parameter values from query samples, including those contained within automatically collected EXPLAIN plans.

filter_query_text

default: unparsable, Either none or unparsable

@keiko713 keiko713 mentioned this pull request Oct 5, 2023
…alize"

For historic reasons, we were only normalizing pg_stat_activity query
texts when "filter_query_sample" was set to "all", not "normalize", which
has a similiar intent.

Whilst we could also control the filtering of pg_stat_activity query texts
with the "filter_query_text" setting (which today controls whether
pg_stat_statements query texts are re-normalized), this is intentionally
not done here. The main motivation to not do this for now are performance
implications of running normalize every 10 seconds for many active queries,
and we may adjust this in a future commit based on more benchmarks.
@lfittl lfittl force-pushed the activity-snapshot-normalization branch from f321cf2 to ecca5fc Compare October 5, 2023 05:12
@lfittl lfittl changed the title Activity snapshots: Normalize query texts based on "filter_query_text" Activity snapshots: Normalize queries for "filter_query_sample = normalize" Oct 5, 2023
Copy link
Contributor

@keiko713 keiko713 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adjusting, I think it's good to do this way first and we can revisit if it's okay to change more widely in the future.

Minor note: it might be nice to have some comment for why we're passing "unparseable" to util.NormalizeQuery. I don't think the behavior will be much different here if we pass it or server.Config.FilterQueryText so I'm gonna approve anyways, but it would be nice if the reason is mentioned well. Maybe you're doing so so that we can control the behavior when the error happens during the normalization? (and make sure not to use the original query with the error case?)

@lfittl
Copy link
Member Author

lfittl commented Oct 5, 2023

Minor note: it might be nice to have some comment for why we're passing "unparseable" to util.NormalizeQuery. I don't think the behavior will be much different here if we pass it or server.Config.FilterQueryText so I'm gonna approve anyways, but it would be nice if the reason is mentioned well. Maybe you're doing so so that we can control the behavior when the error happens during the normalization? (and make sure not to use the original query with the error case?)

Good point, that's a bit confusing - adding a comment.

@lfittl lfittl merged commit d7957d6 into main Oct 5, 2023
3 checks passed
@lfittl lfittl deleted the activity-snapshot-normalization branch October 5, 2023 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants