tests/robustness: init with powerfailure case #622
Conversation
ping @ahrtr
strategy:
  matrix:
    os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
Suggested change: drop the single-entry matrix and assign the runner directly:

runs-on: ubuntu-latest
Updated
// FIXME: gofail should support unix sockets so that the test cases won't
// conflict.
Where is the conflict?
- The existing https://github.com/etcd-io/bbolt/blob/master/tests/failpoint/db_failpoint_test.go doesn't require exporting a port for failpoint;
- We will not run the test cases under test/robustness in parallel.
I ran into a port-already-in-use issue in an etcd flaky test when that test case didn't close the port after finishing.
I removed that comment since it seems confusing.
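For context, gofail exposes failpoints over an HTTP endpoint configured through the GOFAIL_HTTP environment variable, which is why a port is involved at all. Below is a minimal sketch of a helper along the lines of the `activeFailpoint` call quoted further down, assuming `fpURL` carries the `host:port` value passed via `GOFAIL_HTTP`; the actual helper in this PR may look different.

```go
package robustness

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"testing"
)

// activeFailpoint enables a gofail failpoint over HTTP. The term (e.g. "panic")
// is sent as the PUT request body, which is how gofail's HTTP handler expects
// failpoints to be configured.
func activeFailpoint(t *testing.T, fpURL, fpName, fpTerm string) {
	t.Helper()

	target := fmt.Sprintf("http://%s/%s", fpURL, fpName)
	req, err := http.NewRequest(http.MethodPut, target, strings.NewReader(fpTerm))
	if err != nil {
		t.Fatalf("failed to build request for %s: %v", target, err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		t.Fatalf("failed to activate failpoint %q: %v", fpName, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 300 {
		body, _ := io.ReadAll(resp.Body)
		t.Fatalf("unexpected status %d activating %q: %s", resp.StatusCode, fpName, body)
	}
}
```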
time.Sleep(time.Duration(time.Now().UnixNano()%5+1) * time.Second)
t.Logf("simulate power failure")

activeFailpoint(t, fpURL, "beforeSyncMetaPage", "panic")
I am thinking we should also support forcibly killing the process, so that it can exit at a random point. This can be resolved in a follow-up PR.
Yeah, I am thinking about introducing random panics, including force-kill. Let me handle this in the follow-up. Thanks.
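A rough sketch of what the force-kill variant could look like, assuming the workload runs as a child process started via `os/exec`; the helper name and structure are illustrative, not taken from this PR.

```go
package robustness

import (
	"math/rand"
	"os/exec"
	"testing"
	"time"
)

// killAtRandomPoint lets the workload run for a random 1-5 seconds and then
// SIGKILLs it, so the process exits at an arbitrary point without flushing
// anything. cmd is assumed to be a child process already running via cmd.Start().
func killAtRandomPoint(t *testing.T, cmd *exec.Cmd) {
	t.Helper()

	time.Sleep(time.Duration(rand.Intn(5)+1) * time.Second)
	t.Logf("simulate crash by force-killing the process")

	if err := cmd.Process.Kill(); err != nil {
		t.Fatalf("failed to kill process: %v", err)
	}
	// Reap the child; an error is expected here because it was killed.
	_ = cmd.Wait()
}
```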
In this test, you inject the failure on the device (fs) after the process has already terminated. Should we inject the failure (dropWrite) before we terminate (panic) the process?
For the forcible-kill case (which we will support in a follow-up PR), we do need to inject the failure on the device (fs) after the process has already terminated.
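If the device-level failure is implemented with a dm-flakey target (which the dropWrite name suggests, though that is an assumption on my part), toggling it could look roughly like the sketch below; the device name, size, and table parameters are placeholders, not the PR's actual code.

```go
package robustness

import (
	"fmt"
	"os/exec"
)

// dropWrites reloads a dm-flakey table so that subsequent writes are silently
// discarded while the device still appears writable, approximating a disk that
// loses data which was never synced. numSectors is the device size in 512-byte
// sectors; the 0/3600 up/down intervals keep the device in its unreliable state
// for the duration of the test.
func dropWrites(dmName, blockDev string, numSectors int64) error {
	table := fmt.Sprintf("0 %d flakey %s 0 0 3600 1 drop_writes", numSectors, blockDev)
	for _, args := range [][]string{
		{"suspend", dmName},
		{"load", dmName, "--table", table},
		{"resume", dmName},
	} {
		if out, err := exec.Command("dmsetup", args...).CombinedOutput(); err != nil {
			return fmt.Errorf("dmsetup %v failed: %v (%s)", args, err, out)
		}
	}
	return nil
}
```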
Discussed with @fuweid; let's support more cases in follow-up PRs.
Sync times: t1 t2 t3 x
FS:         f1 f2 f3 f4

if f4 < x: f4 ~ x
if f4 > x: t3 ~ x
Use gofailpoint (see the sketch after this list for how the commit interval could be applied):
- Set a huge value for `commit interval`: make sure all data after the last `sync` is lost
- Set a proper value for `commit interval`: make sure part of the data since the last `sync` is lost
- Set a very small value for `commit interval`: almost no data loss

Forcibly killing the process:
- same as above, to support different commit intervals
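Assuming `commit interval` above refers to ext4's journal commit interval (the `-o commit=<seconds>` mount option), applying it could look roughly like this sketch; the device and mount-point arguments are placeholders, not the PR's actual code.

```go
package robustness

import (
	"fmt"
	"os/exec"
)

// mountWithCommitInterval mounts dev on mountPoint with ext4's journal commit
// interval set to commitSeconds. A huge interval means almost nothing is flushed
// between explicit syncs (maximizing the window of lost data after a simulated
// power failure); a tiny interval means almost no data loss.
func mountWithCommitInterval(dev, mountPoint string, commitSeconds int) error {
	opts := fmt.Sprintf("commit=%d", commitSeconds)
	out, err := exec.Command("mount", "-o", opts, dev, mountPoint).CombinedOutput()
	if err != nil {
		return fmt.Errorf("mount %s on %s with %q failed: %v (%s)", dev, mountPoint, opts, err, out)
	}
	return nil
}
```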
Add `Robustness Test` pipeline for robustness test cases. Signed-off-by: Wei Fu <[email protected]>
Add `Robustness Test` pipeline for robustness test cases. REF: #568