Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/robustness: init with powerfailure case #622

Merged
merged 1 commit into from
Dec 8, 2023

Conversation

fuweid
Copy link
Member

@fuweid fuweid commented Nov 25, 2023

Add Robustness Test pipeline for robustness test cases.

REF: #568

@fuweid
Copy link
Member Author

fuweid commented Nov 29, 2023

ping @ahrtr

go.mod Outdated Show resolved Hide resolved
@fuweid fuweid force-pushed the powerfailure branch 2 times, most recently from cdfc75f to e60530d Compare December 7, 2023 15:24
@fuweid fuweid changed the title tests: introduce powerfailure case tests/robustness: init with powerfailure case Dec 7, 2023
Comment on lines 6 to 9
strategy:
matrix:
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
strategy:
matrix:
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
runs-on: ubuntu-latest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Makefile Outdated Show resolved Hide resolved
tests/robustness/powerfailure_test.go Outdated Show resolved Hide resolved
Comment on lines 45 to 46
// FIXME: gofail should support unix socket so that the test cases won't
// be conflicted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the conflict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run into port-already-being-used issue in etcd flakey case when that test case doesn't close port after finish.
I remove that comment since it seems confusing.

time.Sleep(time.Duration(time.Now().UnixNano()%5+1) * time.Second)
t.Logf("simulate power failure")

activeFailpoint(t, fpURL, "beforeSyncMetaPage", "panic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking we should also support forcibly killing the process so that the process can exit at a random point?

This can be resolved in a followup PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I am thinking about introducing random panic including force-kill. Let me handle this in the follow-up. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, you inject the failure on device (fs) after the process already terminates. Should we inject the failure (dropWrite) before we terminate(panic) the process?

For the forcibly killing case (we will support it in a followup PR), we do need to inject the failure on device (fs) after the process already terminates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discuss with @fuweid , let's support more cases in followup PRs

Sync times:
t1       t2          t3               x
FS
   f1   f2     f3               f4

if f4 < x
     f4 ~ x

if f4 > x
     t3 ~ x

Use gofailpoint

  • Set a huge value for commit interval: make sure all data after the last sync is lost
  • Set proper value for commit interval: make sure part of the data since last sync is lost
  • Set very small value for commit interval: almost no data loss

forcibly killing the process

  • same as above to support different commit interval

Add `Robustness Test` pipeline for robustness test cases.

Signed-off-by: Wei Fu <[email protected]>
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Let's resolve comment in followup PRs.

Thanks @fuweid

@ahrtr ahrtr merged commit 95a982c into etcd-io:master Dec 8, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants