Pods in error state #52
@yuriipolishchuk Hi
@yuriipolishchuk can you please share the full log from the seleniferous container?
@alcounit I can share mine.
@SebastianPereiro thanks, will check.
@SebastianPereiro, @yuriipolishchuk this should be fixed now; update the seleniferous image version to alcounit/seleniferous:v1.0.4
@alcounit I'm running the tests, so far so good: I haven't seen a single failed pod today.
Hi @alcounit! We ran a much bigger stress test today and unfortunately got more of the same bugs.
Maybe the problem is on the browser side? If so, we could simply change the browser pod's restart policy from "restartPolicy: Never" to "restartPolicy: Always" and ignore it?
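For illustration only, a minimal Go sketch of what that restart-policy change could look like in a pod template built with client-go types; the helper name and surrounding structure are assumptions, not selenosis's actual code:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// browserPod is a hypothetical helper showing the change discussed above:
// RestartPolicyAlways instead of RestartPolicyNever, so the kubelet restarts
// a crashed browser container instead of leaving the pod in an Error state.
func browserPod(name, image string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyAlways, // was: corev1.RestartPolicyNever
			Containers: []corev1.Container{
				{Name: "browser", Image: image},
			},
		},
	}
}
```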
@SebastianPereiro what are the timeouts for the seleniferous container?
@alcounit I didn't change them, so they are the defaults.
I know how to fix this; it will be a quick fix. But in your logs I see that the session timed out and the delete request failed. Is something wrong with the browser?
@alcounit You're right, I assume something is wrong with our QA team's tests.
@SebastianPereiro try the new seleniferous image alcounit/seleniferous:v1.0.5
@alcounit Thank you. I've updated the selenosis config and will run more tests soon.
@alcounit I've just got a fresh error (looks the same). Attaching logs...
@SebastianPereiro I don't see any error in the log you provided; can you give more details on where you see an error?
@alcounit Sorry, my bad (I attached the wrong log). Doing more tests...
Actually, it wasn't the wrong log. I got a new log from a similar pod:
@alcounit Found another strange situation during a prolonged test:
Could it be related to stuck browser containers?
This is odd.
If pods are not deleted, you will quickly reach the quota limit. I need to understand why pods are not deleted after calling the Kubernetes API.
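For context, a minimal client-go sketch of the delete path being discussed (an assumption about the shape of the call, not seleniferous's actual code). Note that a successful Delete call only asks the API server to terminate the pod; it does not guarantee the pod is gone:

```go
package sketch

import (
	"context"
	"log"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePod is a hypothetical helper: even when Delete returns nil, the pod
// may still be terminating, so "call succeeded" and "pod is gone" differ.
func deletePod(ctx context.Context, client kubernetes.Interface, ns, name string) {
	err := client.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{})
	switch {
	case err == nil:
		log.Printf("delete accepted for pod %s/%s, termination may still be in progress", ns, name)
	case apierrors.IsNotFound(err):
		log.Printf("pod %s/%s is already gone", ns, name)
	default:
		log.Printf("delete failed for pod %s/%s: %v", ns, name, err)
	}
}
```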
@alcounit I also see these errors from time to time:
@alcounit Looks like there is some sort of session-stacking bug. We can see it in Selenosis RAM usage:
Temporary fix: scale the Selenosis deployment down to 0 pods and then back up to the HPA-recommended value.
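For reference, that workaround expressed as a rough client-go sketch; the namespace, deployment name, and replica count are assumptions, and the same effect can be achieved with kubectl scale:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bounceDeployment scales a deployment to zero and then back up, dropping any
// sessions a stuck selenosis replica may be holding in memory.
func bounceDeployment(ctx context.Context, client kubernetes.Interface, ns, name string, replicas int32) error {
	deployments := client.AppsV1().Deployments(ns)

	scale, err := deployments.GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Scale down to zero...
	scale.Spec.Replicas = 0
	scaled, err := deployments.UpdateScale(ctx, name, scale, metav1.UpdateOptions{})
	if err != nil {
		return err
	}

	// ...then back up to the desired replica count.
	scaled.Spec.Replicas = replicas
	_, err = deployments.UpdateScale(ctx, name, scaled, metav1.UpdateOptions{})
	return err
}
```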
@SebastianPereiro sorry for the delays. I'll try to fix the browser pod deletion issue and will post a develop image shortly so you can test it.
@alcounit No problem. Thank you!
Let me know if I can provide more diagnostic information.
@SebastianPereiro can you please try this image: alcounit/seleniferous:develop_v1.0.6
@alcounit Thank you. I tried but browsers failed to start. I got the following errors:
Selenosis logs:
My bad, the image was built for the arm64 architecture; try alcounit/seleniferous:develop_v1.0.6.1
@alcounit Many thanks! Testing...
OK, I ran my tests and some pods still end up in an 'unfinished' state (seleniferous is in the Completed state and the browser is still running).
This looks odd; the pod delete call failed.
Could you please increase the timeout value and add more retries for pod deletion?
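For illustration, a rough client-go sketch of the requested behaviour: a longer per-attempt timeout plus several retries around the delete call. The backoff values, timeout, and helper name are assumptions, not what seleniferous actually implements:

```go
package sketch

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// deletePodWithRetry retries the pod delete call with exponential backoff and
// gives each attempt its own, longer timeout. A NotFound error is treated as
// success because the pod is already gone.
func deletePodWithRetry(client kubernetes.Interface, ns, name string) error {
	backoff := wait.Backoff{Steps: 5, Duration: 2 * time.Second, Factor: 2.0, Jitter: 0.1}

	return retry.OnError(backoff,
		func(err error) bool { return true }, // retry on any error
		func() error {
			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
			defer cancel()

			err := client.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{})
			if apierrors.IsNotFound(err) {
				return nil
			}
			return err
		})
}
```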
@SebastianPereiro will do, sorry for the delays
After upgrading the Chrome image to version 101 (this could be a coincidence), we see that some Chrome pods are in an error state with such entries in the log.
I've already tried increasing gracefulTerminationPeriod, but that didn't help.
Here are the timeout values we use: