Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax locking requirements for collecting table stats #455

Merged
merged 2 commits into from
Oct 2, 2023

Conversation

msakrejda
Copy link
Contributor

Currently, if tables are locked in AccessExclusiveMode, we skip
collecting most table structure information and table and column
stats. We do this since the lock would conflict with the lock the
collector needs and potentially cause cascading locking problems and
impact the application. However, we're not checking the locktype, so
the collector also avoids collecting stats for tables with
AccessExclusiveMode locktype = 'tuple' locks, which do not actually
block collection. These can be fairly common, e.g., if a transaction
issues a SELECT FOR UPDATE query while another transaction has
already obtained a SELECT FOR UPDATE lock.

Relax the locking requirements by ignoring AccessExclusiveMode
locktype = 'tuple' locks when collecting schema structure and stats.

@msakrejda msakrejda requested a review from a team September 13, 2023 19:08
Copy link
Contributor Author

@msakrejda msakrejda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this manually on all affected queries with an outstanding locktype = 'tuple' AccessExclusiveLock. It'd be nice to add this to our integration tests, but locking doesn't currently have coverage and I don't see a great way to add that. Thoughts?

@@ -14,7 +14,7 @@ const relationStatsSQLInsertsSinceVacuumFieldDefault string = "0"

const relationStatsSQL = `
WITH locked_relids AS (
SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL
SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL AND locktype <> 'tuple'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if locktype <> 'tuple' or locktype = 'relation' makes more sense. There are a lot of different locktypes so I decided to be more conservative, but I could be convinced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhmm, looking at the code, only relation, extend, page, tuple locktype can have relation column for pg_locks (look at values[2] or nulls[2]):
https://github.com/postgres/postgres/blob/e0b2eed047df9045664da6f724cb42c10f8b12f0/src/backend/utils/adt/lockfuncs.c#L246-L354

Among this, feels like page is also somewhat similar to tuple, in terms of it won't be locking pg_table_size(). I'm not so sure about extend, but looking at the purpose of this lock (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/apg-waits.lockextend.html), maybe it also won't cause problems with pg_table_size()?

If you agree, maybe using locktype = 'relation' makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for diving into the Postgres source to investigate this. I think given this, locktype = 'relation' looks reasonable. I think the only concern there is if a future version of Postgres adds a lock type similar to, but distinct from, locktype = 'relation', and we start trying to collect data when we shouldn't. Alternately, we could change this to locktype NOT IN ('extend', 'page', 'tuple') to be a little more future-proof, but that gets a little wordy. Any thoughts on this @lfittl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question. I feel like this part of Postgres changes rarely (if at all), so I'm not too worried about future proofing, but it feels slightly safer to say "these are the locktypes that are not a problem, even when its an AccessExclusiveLock", i.e. I think locktype NOT IN ('extend', 'page', 'tuple') might indeed be better, even if its wordy.

That said, is there a way for us to verify that its safe to ignore relation extends? (extend = table is grown to add new pages -- I assume that the locks taken for that don't conflict with relation size / catalog entries, but would be good to verify somehow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be hard to control the timing of an extend lock. But maybe we could note what locks the collector needs when running these queries, take those manually, and then try to extend a relation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to read a bit more of the Postgres source, and my understanding is because relation extension uses a separate lock tag (LOCKTAG_RELATION_EXTEND, see https://github.com/postgres/postgres/blob/master/src/include/storage/lock.h#L195) it doesn't conflict with the AccessShareLock taken with the LOCKTAG_RELATION lock tag as part of us calling pg_relation_size.

So for now I think we can assume that all three of these are safe to ignore, and its just locktype = 'relation' that's a problem.

input/postgres/relations.go Outdated Show resolved Hide resolved
Copy link
Contributor

@keiko713 keiko713 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comment re: locktype, though this is already good start, so I'm fine to merge as is too.

Copy link
Member

@lfittl lfittl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I think we could adjust to locktype = 'relation' as discussed to ignore page and extend locks as well.

@@ -14,7 +14,7 @@ const relationStatsSQLInsertsSinceVacuumFieldDefault string = "0"

const relationStatsSQL = `
WITH locked_relids AS (
SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL
SELECT DISTINCT relation relid FROM pg_catalog.pg_locks WHERE mode = 'AccessExclusiveLock' AND relation IS NOT NULL AND locktype <> 'tuple'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to read a bit more of the Postgres source, and my understanding is because relation extension uses a separate lock tag (LOCKTAG_RELATION_EXTEND, see https://github.com/postgres/postgres/blob/master/src/include/storage/lock.h#L195) it doesn't conflict with the AccessShareLock taken with the LOCKTAG_RELATION lock tag as part of us calling pg_relation_size.

So for now I think we can assume that all three of these are safe to ignore, and its just locktype = 'relation' that's a problem.

Currently, if tables are locked in AccessExclusiveMode, we skip
collecting most table structure information and table and column
stats. We do this since the lock would conflict with the lock the
collector needs and potentially cause cascading locking problems and
impact the application. However, we're not checking the locktype, so
the collector also avoids collecting stats for tables with
AccessExclusiveMode locktype = 'tuple' locks, which do not actually
block collection. These can be fairly common, e.g., if a transaction
issues a `SELECT FOR UPDATE` query while another transaction has
already obtained a `SELECT FOR UPDATE` lock.

Relax the locking requirements by ignoring AccessExclusiveMode
locktype = 'tuple' locks when collecting schema structure and stats.
The latter should be sufficient to avoid locking problems and allows
us to ignore other kinds of locks.
@msakrejda msakrejda force-pushed the relax-locking-requirements branch from ffdcddf to a2581b5 Compare October 2, 2023 15:47
@msakrejda msakrejda merged commit 54cbc68 into main Oct 2, 2023
3 checks passed
@msakrejda msakrejda deleted the relax-locking-requirements branch October 2, 2023 16:25
@benoittgt
Copy link
Contributor

Thanks. This patch fixed our issue.

@msakrejda
Copy link
Contributor Author

@benoittgt great, glad to hear it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants