
Livy unable to get spark k8s app status correctly #4

Open
ashokkumarrathore opened this issue Oct 16, 2024 · 8 comments

Comments

@ashokkumarrathore

I upgraded the Kubernetes client from 5.6.0 to 6.5.1 to address P0 vulnerabilities in dependencies and am trying to run a simple job.

The job is submitted and succeeds. However, Livy marks it as failed because it is not able to get the app status. The relevant log from the Livy server is pasted below. I am also debugging it, but let me know if there is something I can try.

24/10/16 08:17:58 ERROR SparkKubernetesApp: Error while refreshing Kubernetes state
java.lang.IllegalStateException: Promise already completed.
    at scala.concurrent.Promise.complete(Promise.scala:53)
    at scala.concurrent.Promise.complete$(Promise.scala:52)
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
    at scala.concurrent.Promise.failure(Promise.scala:104)
    at scala.concurrent.Promise.failure$(Promise.scala:104)
    at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
    at org.apache.livy.utils.SparkKubernetesApp.org$apache$livy$utils$SparkKubernetesApp$$monitorSparkKubernetesApp(SparkKubernetesApp.scala:299)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.$anonfun$run$9(SparkKubernetesApp.scala:210)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.$anonfun$run$9$adapted(SparkKubernetesApp.scala:204)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at org.apache.livy.utils.SparkKubernetesApp$KubernetesAppMonitorRunnable.run(SparkKubernetesApp.scala:204)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
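(Aside: the IllegalStateException at the top of the trace is just the standard scala.concurrent.Promise behaviour when something tries to complete the same promise twice; it is a symptom, not the root cause, and it hides whatever made the monitor call failure() a second time. A minimal, standalone sketch of that behaviour, not Livy code:)

import scala.concurrent.Promise
import scala.util.Success

object PromiseAlreadyCompleted extends App {
  val appIdPromise = Promise[String]()
  appIdPromise.complete(Success("spark-app-123"))    // first completion succeeds
  appIdPromise.failure(new RuntimeException("boom")) // second completion throws
                                                     // IllegalStateException: Promise already completed.
}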

@ashokkumarrathore ashokkumarrathore changed the title Livy unable to get app status correctly Livy unable to get spark k8s app status correctly Oct 16, 2024
@askhatri
Owner

OK, this means it is failing to get the AppId because the lookup timed out.

@ashokkumarrathore
Author

I think this is the call that fails:
withRetry(kubernetesClient.getApplications().find(_.getApplicationTag.contains(appTag)))
But I am not sure why it would fail if the driver is there and running fine. I need to check whether there is an API behaviour change in the new version.
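(A hypothetical sketch of how a poll-with-retry around that lookup can end up reporting only a timeout: if every attempt returns nothing because the underlying pod listing is rejected, the caller never sees the real error. The helper name pollForApp and its parameters are invented here; Livy's actual withRetry may behave differently.)

import scala.annotation.tailrec
import scala.concurrent.duration._

@tailrec
def pollForApp[T](deadline: Deadline, interval: FiniteDuration)(lookup: => Option[T]): T =
  lookup match {
    case Some(app) => app
    case None if deadline.hasTimeLeft() =>
      Thread.sleep(interval.toMillis)          // wait, then try the lookup again
      pollForApp(deadline, interval)(lookup)
    case None =>
      // All attempts returned nothing; the original cause (e.g. a 403 from the
      // API server) is lost unless it is logged inside `lookup`.
      throw new IllegalStateException("Timed out waiting for the app to appear")
  }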

@ashokkumarrathore
Author

There is a bug that masks the original exception. Here's the original exception:
24/11/11 16:40:31 INFO SparkKubernetesApp: (Failed to get app from tag: ,io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc.cluster.local/api/v1/pods?labelSelector=spark-role%3Ddriver%2Cspark-app-tag%2Cspark-app-selector. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked

The service account has permission on the namespace where the job is submitted. Does Livy look across all namespaces for the app, or only the namespace the job was submitted to?
I am curious why the same setup works with older versions of Spark/Hadoop but not the new one. Please let me know if you have any inputs.
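(Worth noting from the log above: the request goes to /api/v1/pods, which is the cluster-scoped pod listing, so the service account would need permission to list pods in every namespace rather than only the job's namespace. A small fabric8 sketch of the difference; the namespace name is just an example:)

import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientBuilder}

val client: KubernetesClient = new KubernetesClientBuilder().build()

// Cluster-scoped listing: GET /api/v1/pods
// Needs list permission on pods in every namespace.
val allDrivers = client.pods().inAnyNamespace()
  .withLabel("spark-role", "driver")
  .list()

// Namespaced listing: GET /api/v1/namespaces/spark-jobs/pods
// Only needs list permission in that one namespace ("spark-jobs" is an example name).
val nsDrivers = client.pods().inNamespace("spark-jobs")
  .withLabel("spark-role", "driver")
  .list()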

@ashokkumarrathore
Author

I think it is an issue with the implementation. In a multi-tenant cluster, the service account might not have permission on all namespaces. We should look for the job within its own namespace to avoid this issue.
This is actually a regression: Spark jobs (on K8s) work fine if I use the build from before we added Spark K8s support.

@askhatri
Owner

Thank you @ashokkumarrathore for the details. I see that you have created issue apache/incubator-livy#461. Do you have a potential fix in mind already? If so, we can try to make the related code changes together.

@ashokkumarrathore
Author

I think there should be multiple changes (a rough sketch follows the list):

  1. KubernetesClient: we can initialise it with the default namespace, but when we call getApplications(), we should use the namespace the job was submitted to.
  2. Currently we just initialise the KubernetesClient with livyConf. We need to see whether we can override the namespace after the object is created; if not, we need to defer initialising it until later.
  3. We also need to think about how sharing the K8s client works. If clients have different namespaces, they can only be shared when their configs match.
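A rough, hypothetical sketch of what (1) and (3) could look like, assuming the submit namespace is threaded through to the monitor; the names clientFor and findDriverPod are invented for illustration and are not existing Livy code:

import java.util.concurrent.ConcurrentHashMap
import io.fabric8.kubernetes.client.{Config, ConfigBuilder, KubernetesClient, KubernetesClientBuilder}

// (3) Share one client per namespace instead of a single global client.
val clientCache = new ConcurrentHashMap[String, KubernetesClient]()

def clientFor(namespace: String): KubernetesClient =
  clientCache.computeIfAbsent(namespace, ns => {
    val config: Config = new ConfigBuilder().withNamespace(ns).build()
    new KubernetesClientBuilder().withConfig(config).build()
  })

// (1) Look up the driver pod only in the namespace the job was submitted to.
def findDriverPod(namespace: String, appTag: String) =
  clientFor(namespace).pods()
    .inNamespace(namespace)                // scoped listing, no cluster-wide RBAC needed
    .withLabel("spark-role", "driver")     // label Spark sets on the driver pod
    .withLabel("spark-app-tag", appTag)    // tag attached at submit time (see labelSelector in the log above)
    .list()
    .getItems
    .stream()
    .findFirst()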

@askhatri
Owner

Yes, this seems like a great idea.
