You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This may not be worth fixing or this may be "as designed" behavior, but it seems that, if the filesystem is read-only or otherwise non-writable (I have not verified, but I suspect similar things would happen for permission issues or if the disk is full), logs will be collected (e.g. via Pub/Sub) and then simply dropped on the floor.
Thankfully, the collector does log errors:
E [default] Could not allocate tempfile for logs: open /tmp/532025703: read-only file system
I [default] Submitted compact activity snapshot successfully
E [default] Could not allocate tempfile for logs: open /tmp/1803624709: read-only file system
I [default] Submitted compact activity snapshot successfully
There are a few possible approaches here:
The collector could sanitize files in-memory, rather than writing a temporary file
The collector could avoid acknowledging the message until it is saved successfully (this is probably a good idea, anyways, since if the collector crashes, messages may not have been processed even though they were acknowledged)
It's also reasonable that the collector may be designed to be best-effort, so dropping logs (or acknowledging but not processing messages in exceptional cases like a crash) might be acceptable
The text was updated successfully, but these errors were encountered:
Thanks for bringing this up - overall I would say we do not aim to make the collector work with a read-only file system at this point. For context, separate from log files being written out temporarily, there is a state file that is used for restarts/reloads - whilst we could refactor how that works for reloads specifically (i.e. keep it in memory somehow), it'd cause some extra complexity.
Regarding not acknowledging Pub/Sub messages in error cases - we could avoid doing so if the log processing fails in the early stages, but there are failure cases later in the process too (mostly related to out of order log messages), and as you note crashes can occur too.
I think re-architecting this to not acknowledge the messages until they are saved (i.e. much later in the process than currently, at least 3 seconds later) will cause a different problem, as the Pub/Sub queue may back up noticeably when there is a high volume of log messages, and I recall that can cause issues with GCP's APIs - in my understanding they want readers to acknowledge messages quickly.
All that said, I'm open to a proposal how to handle things differently (preferably with a working PR), but its not a priority to do major rework here for us at this point. We may revise this in a future when we add more auditing (as in pg_audit) focused features to pganalyze.
This may not be worth fixing or this may be "as designed" behavior, but it seems that, if the filesystem is read-only or otherwise non-writable (I have not verified, but I suspect similar things would happen for permission issues or if the disk is full), logs will be collected (e.g. via Pub/Sub) and then simply dropped on the floor.
Thankfully, the collector does log errors:
There are a few possible approaches here:
collector/input/system/google_cloudsql/logs.go
Line 72 in 878802b
It's also reasonable that the collector may be designed to be best-effort, so dropping logs (or acknowledging but not processing messages in exceptional cases like a crash) might be acceptable
The text was updated successfully, but these errors were encountered: