Chatlog: Add minimal DB retry logic #10774

Open: wants to merge 13 commits into master

DieterReinert

We encountered a crash on a chat page with the following stack trace:

A chat page crashed: error: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1735344596.246113077,0 encountered previous write with future timestamp 1735344596.478577328,0 within uncertainty interval `t <= (local=1735344596.746113077,0, global=1735344596.746113077,0)`; observed timestamps: [{2 1735344598.610798995,0} {3 1735344596.246113077,0}]: "sql txn" meta={id=a54c7d78 key=/Min pri=0.01119841 epo=0 ts=1735344596.246113077,0 min=1735344596.246113077,0 seq=0} lock=false stat=PENDING rts=1735344596.246113077,0 wto=false gul=1735344596.746113077,0
at /home/ps/main/node_modules/pg-pool/index.js:45:11
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at Object.list (/home/ps/main/server/chat-plugins/chatlog.ts:122:20)
at Object.listCategorized (/home/ps/main/server/chat-plugins/chatlog.ts:130:16)
at Object.list (/home/ps/main/server/chat-plugins/chatlog.ts:432:16)
at PageContext.resolve (/home/ps/main/server/chat.ts:471:10)
at CommandContext.join (/home/ps/main/server/chat-commands/moderation.ts:503:15)

This PR adds a small generic helper function to re-run queries when any error occurs, up to a limited number of retries.
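
For readers without the diff open, a helper of this kind might look roughly like the sketch below. It is an illustration only, not the code in this PR: the name `safeQuery` and the default of 3 attempts are taken from the follow-up comment, and the body is an assumption.

```typescript
// Hypothetical sketch of a generic retry helper; the PR's actual
// implementation may differ.
async function safeQuery<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // Success on any attempt ends the loop immediately.
      return await fn();
    } catch (error) {
      // Retry on any error, as the description states.
      lastError = error;
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A call site would wrap the existing query, for example `await safeQuery(() => runQuery())`, where `runQuery` stands in for whatever function performs the database call.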

DieterReinert commented Dec 28, 2024

We could potentially add or improve these things in our safeQuery function:

  • Configurable Delay or Backoff
    Add a delay between retries (either fixed or exponential).

    // Example delay usage inside the loop
    await new Promise(resolve => setTimeout(resolve, 1000)); // 1 second
  • Custom Error Handling
    Skip retries for certain error types if desired.

    // Example signature
    async function safeQuery<T>(
      fn: () => Promise<T>,
      attempts = 3,
      shouldRetry?: (error: Error) => boolean
    ): Promise<T> { /* ... */ }
  • Timeout Handling
    Implement a timeout on each attempt (e.g., using Promise.race) to handle operations that might hang indefinitely.
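
As a rough sketch of how the three ideas above could fit together (not what this PR implements): the names `safeQueryWithBackoff`, `RetryOptions`, `withTimeout`, and `isSerializationFailure` are hypothetical, and so are the default values. The predicate assumes the driver exposes the SQLSTATE as an `error.code` string, and that the "restart transaction" errors in the stack trace are reported with SQLSTATE 40001, as CockroachDB does for transaction retry errors.

```typescript
// Hypothetical options; none of these names exist in the codebase.
interface RetryOptions {
  attempts?: number;     // total tries, including the first one
  baseDelayMs?: number;  // delay before the first retry; doubles each time
  timeoutMs?: number;    // per-attempt timeout
  shouldRetry?: (error: unknown) => boolean;
}

const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Assumption: the error object carries the SQLSTATE in `code`,
// and 40001 marks a retryable serialization failure.
function isSerializationFailure(error: unknown): boolean {
  return typeof error === "object" && error !== null &&
    (error as { code?: unknown }).code === "40001";
}

// Rejects if `promise` does not settle within `ms`. The underlying
// query is NOT cancelled; the caller merely stops waiting for it.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Query timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

async function safeQueryWithBackoff<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const {
    attempts = 3,
    baseDelayMs = 100,
    timeoutMs = 5000,
    shouldRetry = isSerializationFailure,
  } = options;
  for (let attempt = 1; ; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (error) {
      // Give up when out of attempts or when the error is not retryable.
      if (attempt >= attempts || !shouldRetry(error)) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

With the default predicate, a timeout is not retried because the timeout error has no `code`; passing `shouldRetry: () => true` would restore the retry-on-any-error behaviour described in the PR.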

DieterReinert changed the title from "Add minimal generic retry logic to handle transient DB errors" to "Chatlog: Add minimal DB retry logic" on Dec 31, 2024
DaWoblefet requested a review from mia-pi-git on January 10, 2025 06:39

DieterReinert commented Jan 14, 2025

Another stack trace from the same code path:

A chat page crashed: error: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1736859296.170169540,0 encountered previous write with future timestamp 1736859296.456404574,0 within uncertainty interval `t <= (local=1736859296.670169540,0, global=1736859296.670169540,0)`; observed timestamps: [{1 1736859296.170169540,0} {2 1736859298.725285111,0} {3 1736859296.172367391,0}]: "sql txn" meta={id=9145c423 key=/Min pri=0.02228827 epo=0 ts=1736859296.170169540,0 min=1736859296.170169540,0 seq=0} lock=false stat=PENDING rts=1736859296.170169540,0 wto=false gul=1736859296.670169540,0
at /home/ps/main/node_modules/pg-pool/index.js:45:11
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at Object.list (/home/ps/main/server/chat-plugins/chatlog.ts:122:20)
at Object.listCategorized (/home/ps/main/server/chat-plugins/chatlog.ts:130:16)
at Object.list (/home/ps/main/server/chat-plugins/chatlog.ts:432:16)
at PageContext.resolve (/home/ps/main/server/chat.ts:471:10)
at CommandContext.join (/home/ps/main/server/chat-commands/moderation.ts:503:15)
