Instabilities with browsers under the load (300-600 tests in parallel) #1
Comments
Is the issue fixed? @AlexeyAltunin, did the browser version update make things stable? |
We discussed this issue by email, and the assumption was that the cause is the small size of the cluster nodes. Small nodes cause frequent cluster autoscaling under load, which then leads to browser freezes and failures. But it was only an assumption and we didn't verify it. Maybe @AlexeyAltunin has some info. Here at Wrike we use a 32 vCPU/128 GB RAM node config and there are no such problems with the browsers. @vigneshfourkites, do you experience the same issue? |
@srntqn No, we are in POC mode and trying to run fewer than 100 browsers. In the future we will certainly scale beyond 300, and any precautionary measures from this issue might give us some ideas for scaling up, so I asked this question. Thanks for the response! |
@srntqn We are running a 32 GB machine with 50 parallel tests; containers are not destroyed properly and pod taints are occurring. Which Kubernetes version are you using? Do you have any benchmark information? The machine config we currently use is 8 vCPU/32 GB RAM. |
@vigneshfourkites did you check the Callisto logs? Are there any errors?
Sorry, there is a chance that I have misunderstood this. Could you please provide more details? What do you mean here?
We currently use Kubernetes 1.18.17; unfortunately, we have no benchmarks for this version. But there are no problems with pod creation/deletion and the latency is okay. |
@srntqn Yes, I saw some ERROR logs in the Callisto pods: 2021-06-05 12:12:35,603 unknown ERROR >>> {"tid": "web-2b5f388e811b46d9882d15f45f00b045"} What is the root cause of this error? After the above error, delete/create requests were made, but pods were not removed from or added to the cluster. |
@vigneshfourkites this particular error is related to displaying logs in Selenoid-UI, and not related to starting or stopping pods. |
It is already enabled as DEBUG. I only see the above-mentioned failures in the Callisto pod; other than that, no errors are logged. Do you restrict browser CPU and memory utilisation internally anywhere? It seems the CPU is constantly at 100% during execution. @vpokotilov |
@vigneshfourkites it looks like you have some problems with cluster performance. Maybe the cause is the load produced by your tests. Did you try decreasing the number of parallel sessions to see how it affects performance?
We use only k8s requests/limits.
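As a rough illustration of why node size and requests/limits matter here, the sketch below works out the per-session share of a node and how many browser pods fit on it. It uses the node sizes mentioned in this thread (8 vCPU/32 GB and 32 vCPU/128 GB) and the 50 parallel sessions reported above. The per-browser requests (0.5 vCPU, 1 GB) are hypothetical values chosen only for the example, all sessions are assumed to land on a single node, and system-reserved resources are ignored.

```python
import math


def per_session_share(node_cpu, node_mem_gb, sessions):
    """Return the (vCPU, GB) share each session gets if the node is split evenly."""
    return node_cpu / sessions, node_mem_gb / sessions


def max_sessions(node_cpu, node_mem_gb, cpu_request, mem_request_gb):
    """Return how many browser pods fit on one node given per-pod requests.

    The tighter of the CPU and memory bounds wins. Overhead reserved for the
    kubelet and system daemons is ignored, so real capacity is lower.
    """
    by_cpu = math.floor(node_cpu / cpu_request)
    by_mem = math.floor(node_mem_gb / mem_request_gb)
    return min(by_cpu, by_mem)


# Hypothetical per-browser requests, NOT taken from this thread.
CPU_REQUEST = 0.5      # vCPU per browser pod (assumption)
MEM_REQUEST_GB = 1.0   # GB per browser pod (assumption)

if __name__ == "__main__":
    cpu, mem = per_session_share(8, 32, 50)
    print(f"8 vCPU/32 GB node, 50 sessions: {cpu:.2f} vCPU and {mem:.2f} GB each")
    print("Pods that fit on 8 vCPU/32 GB:",
          max_sessions(8, 32, CPU_REQUEST, MEM_REQUEST_GB))
    print("Pods that fit on 32 vCPU/128 GB:",
          max_sessions(32, 128, CPU_REQUEST, MEM_REQUEST_GB))
```

If the share per session comes out well below what a browser needs, constant 100% CPU on the node is the expected outcome, and lowering the number of parallel sessions or using larger nodes would be the first thing to try.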
|
Hi! I have been testing Callisto since last week.
Issue description: containers/browsers freeze at random, leaving hanging pods. This reproduces when running a lot of tests in parallel.
3 types of errors:
WebDriverError: Pod does not have an IP
(not critical, happens very seldom) Fixed after increasing resources for nginx
browser pods
Didn't find anything useful for the Callisto pod
Our configuration:
Spec:
We also tested Callisto with small suites (30-45 tests in parallel) and it works fine.
Have you faced the same issue, or do you have any ideas on how to fix it?
Thanks in advance!