-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding HiveServer Credential Provider #210
base: master
Are you sure you want to change the base?
Conversation
The Travis CI checks failed with message: I think that the stalling is caused by the build and not by the new code. I would appreciate if I receive any guidance here. |
I created a pull-request in yarnspawner as well: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Apologies for the delayed review here. I've left a few comments on the implementation.
A few general questions:
- Is
credential_providers
the best name for this field? What terminology do other systems use? - What other credential providers may a user want us to support?
- Are
uri
andprincipal
sufficient information for all other implementations?
@@ -390,6 +394,38 @@ | |||
<scope>provided</scope> | |||
</dependency> | |||
|
|||
<dependency> | |||
<groupId>org.apache.hive</groupId> | |||
<artifactId>hive-jdbc</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a system has hive available, is this jar likely to already be on the classpath (this is common for other hadoop libraries)? If so, we usually set <scope>provided</scope>
for these types of dependencies to keep the size of our jar down and not worry about dependency mismatches. You should be able to drop all the exclusions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. Thanks for explaining why I should use provided
skein/model.py
Outdated
@@ -1152,6 +1152,60 @@ def from_protobuf(cls, obj): | |||
security=security) | |||
|
|||
|
|||
class CredentialProviderSpec(Specification): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just call this CredentialProvider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap
java/src/main/proto/skein.proto
Outdated
message CredentialProviderSpec { | ||
string name = 1; | ||
string uri = 2; | ||
string principal = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sufficient for any other credential provider we may want to support later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as explained in the big comment, having these 2 are not enough for all providers, and I switched to a map of configurations
import org.apache.hadoop.yarn.exceptions.YarnException; | ||
import java.io.IOException; | ||
|
||
public interface CredentialProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than abstracting this out into an interface with two implementations, I'd make a single class with a method for adding each one, and a larger method for iterating through the spec and adding credentials for each provider. In python pseudocode:
class CredentialManager:
def add_credentials_common(self, credentials):
pass
def add_credentials_hive(self, credentials, uri, principal):
pass
def add_credentials(self, credentials, spec):
self.add_credentials_common(credentials)
for cred in spec.credential_providers:
if cred.name == "hive":
self.add_credentials_hive(credentials, cred.uri, cred.principal)
elif cred.name == ...:
pass # other implementations added here
This may not be as java idiomatic, but better matches with the existing code style. I don't see a reason to abstract this out when we'll only ever support a set list of credential providers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I'll refactor it this way.
for(int i = 0; i < spec.getCredentialProvidersCount(); i++) { | ||
Msg.CredentialProviderSpec d = spec.getCredentialProviders(i); | ||
if (d.getName().equals("hive")) { | ||
credentialProviders.add(new HiveCredentials(d, spec.getUser())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do if the user requests a provider we don't support? There's two cases here:
- The name provided doesn't match any of our implementations.
- The name does match one we implement, but adding a credential fails.
In both cases I think we should error. But I don't think that should be handled here, rather I think it should be handled in the routine that manages adding credentials (described below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there will be exceptions propagated backwards in both cases in the next draft
Thank you for reviewing my code :). |
Hi, I answered the questions:
Since we are only dealing with Delegation token maybe we can rename "credential_providers" to be more specific:
I was thinking, can we have a |
Hi Jim, my second draft is ready for review. I think I covered all the things you mentioned above. |
Thanks @georgepachitariu. I'm currently on break between jobs (and without a computer), but will try to look at the changes as soon as I can (max 2 weeks from now). Apologies, thanks for being patient. |
No worries @jcrist, take your time. |
This is amazing effort at enabling good support for exhaustive services in hadoop, thanks! Any ETA on when can we expect this to be available for use? |
Hi all. I just started a new job, so a fair bit of my time is occupied ramping up on that. I expect to be able to give this a good review by end-of-week though. Thanks for your patience. |
This is very nice PR, exactly what we currently need for our platform. How I can help to have it merged ? Really looking forward to help and to have it in master. |
Hi @santosh-d3vpl3x @wundervaflja. It's very cool to see that other people have the same ideas as me. As Jim said, please be a little patient. We will work towards a solution we like. |
Hello, do you have an estimate on the completeness of this PR ? I'm really interested, and ready to help if needed. |
Hi @gboutry, nice to meet you! |
Sorry for the late reply, I was able to test your work, and indeed it works as expected, you did a really good job. (I rebased your branch on skein master, and it worked well) |
Hello Jim,
first I would like to thank you for implementing this library. We needed a way to launch Jupyter containers on our existing Hadoop and your libraries are fitting great.
In our Hadoop, the users get the data by connecting to our Hive database. Since we have a kerberised cluster, the way to connect from a yarn container is to use delegation tokens.
I implemented the part that connects to Hive and obtains a delegation token that is added to the rest of the tokens.
I used the Oozie implementation for inspiration (the interface CredentialProvider.java is also like there):
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/Hive2Credentials.java
This is my first draft. Could you please have a look at it?