Livy unable to get Spark K8s app status correctly #4
I upgraded the Kubernetes client from 5.6.0 to 6.5.1 to address P0 vulnerabilities in its dependencies and tried to run a simple job.
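For reference, the upgrade amounts to bumping the fabric8 kubernetes-client coordinate, which is the client Livy's Kubernetes support uses. A sketch in sbt syntax only; Livy's actual build is Maven, so the real change would live in the pom:

```scala
// Illustrative only: pin the fabric8 client at the upgraded version.
// Livy's real build declares this dependency in its Maven pom instead.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "6.5.1" // previously 5.6.0
```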
It submits the job and the job succeeds; however, Livy marks it as failed because it is not able to get the app status. The relevant log from the Livy server is pasted below. I am also debugging it, but let me know if there is something I can try.
24/10/16 08:17:58 ERROR SparkKubernetesApp: Error while refreshing Kubernetes state
java.lang.IllegalStateException: Promise already completed.
    at scala.concurrent.Promise.complete(Promise.scala:53)
    at scala.concurrent.Promise.complete$(Promise.scala:52)
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
    at scala.concurrent.Promise.failure(Promise.scala:104)
    at scala.concurrent.Promise.failure$(Promise.scala:104)
    at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
    at org.apache.livy.utils.SparkKubernetesApp.org$apache$livy$utils$SparkKubernetesApp$$monitorSparkKubernetesApp(SparkKubernetesApp.scala:299)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.$anonfun$run$9(SparkKubernetesApp.scala:210)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.$anonfun$run$9$adapted(SparkKubernetesApp.scala:204)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.run(SparkKubernetesApp.scala:204)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
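For context on the log above: a scala.concurrent.Promise can be completed only once, and a second failure (or success) call throws the IllegalStateException("Promise already completed.") seen in the trace, which is how the original Kubernetes error gets masked. A minimal standalone sketch (not Livy's code) of the behavior and of the tryFailure alternative:

```scala
import scala.concurrent.Promise

object PromiseDoubleComplete extends App {
  val appId = Promise[String]()

  // First completion is accepted and becomes the promise's result.
  appId.tryFailure(new RuntimeException("original Kubernetes error"))

  // Completing again with failure() throws IllegalStateException:
  // "Promise already completed." -- the exception in the Livy log.
  try appId.failure(new RuntimeException("second error"))
  catch { case e: IllegalStateException => println(e.getMessage) }

  // tryFailure() returns false instead of throwing, so the first
  // (original) exception is preserved rather than masked.
  val accepted = appId.tryFailure(new RuntimeException("third error"))
  println(s"second failure accepted: $accepted") // prints: false
}
```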
Comments

ok, this means it is failing to get the AppId because the call timed out.

I think this is the call which fails:

There is a bug due to which the original exception is masked. Here's the original exception:

The service account has permission on the namespace where the job is submitted. Does Livy look into all namespaces for the app, or just the namespace to which the job was submitted?

I think it is an issue with the implementation. In a multi-tenant cluster, the service account might not have permission on all namespaces. We should look for the job only within the submission namespace to avoid this issue (a minimal sketch of such a namespace-scoped lookup appears below).

Thank you @ashokkumarrathore for the details. I see that you have created issue apache/incubator-livy#461. Do you have a potential fix in mind already? If so, we can try to make the related code changes together.

I think there should be multiple changes.

Yes, this seems like a great idea.
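Regarding the namespace-scoping suggestion above, here is a minimal fabric8 kubernetes-client sketch of the difference between a cluster-wide and a namespace-scoped pod lookup. The label key, namespace, and object names are illustrative assumptions, not Livy's actual code:

```scala
import io.fabric8.kubernetes.client.KubernetesClientBuilder
import scala.jdk.CollectionConverters._

object NamespacedLookup extends App {
  // fabric8 6.x entry point (DefaultKubernetesClient is deprecated in 6.x).
  val client = new KubernetesClientBuilder().build()

  // Cluster-wide lookup: needs list permission in every namespace, which a
  // tenant service account in a multi-tenant cluster usually does not have:
  //   client.pods().inAnyNamespace().withLabel("spark-app-tag", "my-app").list()

  // Namespace-scoped lookup: only needs permission in the namespace the job
  // was submitted to. "livy-jobs" and the label are illustrative.
  val pods = client.pods()
    .inNamespace("livy-jobs")
    .withLabel("spark-app-tag", "my-app")
    .list().getItems.asScala

  pods.foreach(pod => println(pod.getMetadata.getName))
  client.close()
}
```

The design point the comments make: if the monitor only ever queries the namespace the job was submitted to, the service account needs no cluster-wide list permission, which fits multi-tenant RBAC.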