Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure ACKs are sent after backup #921

Merged
merged 7 commits into from
Oct 20, 2024

Conversation

dt8269
Copy link
Contributor

@dt8269 dt8269 commented Oct 7, 2024

Summary

This PR addresses an issue where acknowledgments (ACKs) were sometimes sent to the master before binlog events were fully written and fsynced to disk during backup operations. Sending ACKs prematurely in semi-synchronous replication could lead to data loss if the replica fails after sending the ACK but before persisting the event.

To fix this issue, the following changes have been made:

  • Introduced a SynchronousEventHandler in BinlogSyncerConfig to ensure that events are fully processed and persisted before sending ACKs.
  • Modified StartBackup to use the synchronous event handler when it's set, enabling synchronous handling during backup operations.
  • Refactored backup logic into a new BackupEventHandler that writes events to disk and ensures they are fully persisted before acknowledging.
  • Updated onStream and parseEvent methods in BinlogSyncer to support both synchronous and asynchronous event handling modes.
  • Added proper error handling and synchronization mechanisms to ensure thread safety and data integrity during backups.

Testing

  • Added unit tests TestAsyncBackup and TestSyncBackup to verify the backup process in both asynchronous and synchronous modes.

…times sent to the master before binlog events were fully written and fsynced to disk during backup operations. Sending ACKs prematurely in semi-synchronous replication could lead to data loss if the replica fails after sending the ACK but before persisting the event.

Key changes:

- Introduced an `EventHandler` interface with a `HandleEvent` method for
  processing binlog events. This allows custom event handling logic to
  be injected into the replication stream.

- Added an `eventHandler` field to `BinlogSyncer` and provided a
  `SetEventHandler` method to assign an event handler. This enables
  `BinlogSyncer` to delegate event processing to the assigned handler.

- Implemented `BackupEventHandler` which writes binlog events to disk
  and ensures that each event is fsynced before returning. This ensures
  data durability before ACKs are sent.

- Modified the `onStream` method in `BinlogSyncer` to separate event
  parsing (`parseEvent`) from event handling and ACK sending
  (`handleEventAndACK`). This adheres to the single-responsibility
  principle and makes the code cleaner.

- Moved state updates (e.g., updating `b.nextPos`) and GTID set handling
  from `parseEvent` to `handleEventAndACK` to avoid side effects during
  parsing.

- Ensured that ACKs are sent only after the event has been fully
  processed and fsynced by sending the ACK in `handleEventAndACK` after
  event handling.
@dt8269 dt8269 marked this pull request as ready for review October 7, 2024 18:19
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review later

replication/binlogsyncer.go Outdated Show resolved Hide resolved
replication/binlogsyncer.go Outdated Show resolved Hide resolved
…h SynchronousEventHandler. Simplify the event processing in BinlogSyncerConfig by introducing SynchronousEventHandler for synchronous event handling. Update StartBackup, StartBackupWithHandler, and associated tests to reflect these changes.
@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch from bcb07a6 to d162219 Compare October 9, 2024 19:43
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm. Thanks.

replication/binlogsyncer.go Show resolved Hide resolved
replication/binlogsyncer.go Outdated Show resolved Hide resolved

// Force use raw mode
b.parser.SetRawMode(true)

// Set up the backup event handler
backupHandler := &BackupEventHandler{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to change StartBackupWithHandler now? Because your requirement does not use StartBackupWithHandler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic of writing on disk is contained in

func (h *BackupEventHandler) HandleEvent(e *BinlogEvent) error {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the old code, we still can use handler to create io.WriteCloser and write events to disk.

		e, err := s.GetEvent(ctx)
...
		if n, err := w.Write(e.RawData); err != nil {
			return errors.Trace(err)
		} else if n != len(e.RawData) {
			return errors.Trace(io.ErrShortWrite)
		}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I don't want to duplicate the code in HandleEvent. Do you want me to remove the BackupEventHandler struct and have HandleEvent be static?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate the code in HandleEvent

Sorry I don't know where is the duplication. Can you elaborate?

Do you want me to remove the BackupEventHandler struct and have HandleEvent be static?

No I don't mean that. I mean reverting StartBackupWithHandler to old code. Your StartSynchronousBackup is fine. And BackupEventHandler seems no need to be public (exported)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StartBackupWithHandler implemented the code in HandleEvent which is also used by the SynchronousEventHandler in replication/binlogsyncer.go. As a helpful shorthand to prevent the need for clients to reimplement HandleEvent in their code, we can leave it extracted as its own method.

For example, if we reverted StartBackupWithHandler we'd either have that code (if e.Header.EventType == ROTATE_EVENT...) duplicated in HandleEvent or, if we removed that as well (as you suggested) we'd have it duplicated in the test code. Does that make sense?

If you have a different idea of how this should all change, can you upload an example diff?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see the duplication is if e.Header.EventType == ROTATE_EVENT, thanks. I think it's reasonable.

And here's my suggestions, please check them.

git.diff.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we make newBackupEventHandler private, clients won't be able to do what backup_test.go (an example client) is doing

Copy link
Collaborator

@lance6716 lance6716 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they are the same behaviour as BackupEventHandler, they can directly use SynchronousEventHandler. if they have more customised behaviour, current BackupEventHandler is not extendable to other behaviour I think (no place to inject new logic?). so they still need to write many lines of code.

maybe you think they can wrap a structure on BackupEventHandler, dispatch event by event type to BackupEventHandler or the parent structure? I can't think of a real use case for now.

oh you are right 👍


if isSynchronous {
// Set up a BackupEventHandler for synchronous mode
backupHandler := &BackupEventHandler{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after my previous comment, now the BackupEventHandler is only used in this test. Maybe move its declaration to this test file.

replication/backup_test.go Show resolved Hide resolved
… return the behavior to 30 days _between_ events, restore some comments, use struct instead of bool as recommended, add a note about SynchronousEventHandler and the parseEvent return values
@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch 3 times, most recently from 18cacdc to c3188f5 Compare October 11, 2024 21:28
@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch from c3188f5 to 06c9268 Compare October 13, 2024 17:45
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm now. please address if you like the new way to deal with select-contexts. not a big problem. I'll merge it when it's daytime in my timezone.

@lance6716 lance6716 merged commit c435689 into go-mysql-org:master Oct 20, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants