Ansible playbook based gatling JVM gets stuck when there is high CPU usage of Keycloak pods #960
Comments
@kami619 - can you please provide the console output of the Gatling run? Maybe by running the Gatling test locally, rather than on an EC2 instance, when triggering the failures? IMHO this could give additional insight into which errors were triggered, and it would probably be one of the first things asked for when opening an issue upstream.
@ahus1 It goes beyond the specified 600s in the run duration and gets stuck here.
full log:
Describe the bug
When we execute the Ansible playbooks and run the kcb.sh script on the EC2 load runners, the playbook should consolidate the Gatling report in its last phase once the test execution is done.
However, when this playbook is run against a Keycloak K8s cluster with high CPU utilization, the Gatling JVM process seems to wait indefinitely, and the job only ends when the 60-minute timeout configured in the Ansible playbook expires.
Version
keycloak-benchmark latest main
Expected behavior
Gatling should run and complete successfully irrespective of the behavior of the application under test.
Actual behavior
It gets stuck in the last phase of the Gatling load test and doesn't finish on its own.
How to Reproduce?
./aws_ec2.sh create eu-west-1
./benchmark.sh eu-west-1 --scenario=keycloak.scenario.authentication.AuthorizationCode --server-url=<KEYCLOAK_URL> --realm-name=realm-0 --users-per-sec=150 --ramp-up=20 --logout-percentage=100 --measurement=600 --users-per-realm=20000 --sla-error-percentage=0.001
PROJECT=<KEYCLOAK_NAMESPACE> ./kc-chaos.sh <RESULTS_DIR>
Observe whether the playbook finishes and collects the report in the last phase before the async polling timeout is triggered.
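To make that observation less dependent on waiting for the full playbook timeout, the run could be wrapped in a local deadline. This is only a sketch: `run_with_deadline` is a hypothetical helper, not part of keycloak-benchmark, and it assumes GNU coreutils `timeout` is available on the machine that starts the run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of keycloak-benchmark): run a command
# under a wall-clock deadline.
# Usage: run_with_deadline <seconds> <command> [args...]
run_with_deadline() {
  local deadline="$1"
  shift
  local status=0
  # GNU coreutils `timeout` sends SIGTERM once the deadline passes and
  # exits with status 124 if the command was still running at that point.
  timeout "$deadline" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "deadline of ${deadline}s exceeded" >&2
  fi
  return "$status"
}
```

For example, `run_with_deadline 900 ./benchmark.sh eu-west-1 ...` (with the same arguments as above, and 900 as an arbitrary value somewhat above the 600s measurement) would return 124 if the run had not finished in time. Note that this only bounds the local command; it does not by itself stop a Gatling JVM that is still running on the load runner.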
Anything else?
Thread Dump from the stuck JVM process
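For anyone trying to reproduce this, a thread dump of a stuck JVM can be captured with the standard JDK tools `jcmd` or `jstack`. The following is a minimal sketch: `dump_threads` is a hypothetical helper, and it assumes a JDK is installed on the load runner and that the command is run as the same user as the JVM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a thread dump for the JVM with the given PID.
# Prefers jcmd and falls back to jstack; both ship with the JDK.
dump_threads() {
  local pid="$1"
  if command -v jcmd >/dev/null 2>&1; then
    jcmd "$pid" Thread.print
  elif command -v jstack >/dev/null 2>&1; then
    jstack "$pid"
  else
    echo "neither jcmd nor jstack found on PATH" >&2
    return 1
  fi
}

# Example (assumption: the Gatling JVM's command line contains "gatling"):
#   pid="$(pgrep -f gatling | head -n 1)"
#   dump_threads "$pid" > gatling-threads.txt
```

Taking two or three dumps a few seconds apart usually makes it easier to tell a thread that is genuinely blocked from one that is just slow.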