
Support for additional DB connection conf parameters #1

Open
piccolbo opened this issue Jan 25, 2016 · 8 comments

Comments

@piccolbo
Member

Continued from piccolbo#18, submitted by @MarcinKosinski.

@piccolbo
Member Author

The other thing is to just create a URL that works. It seems the only parameters dbConnect takes are user and pwd; the rest has to go into the URL as host:port/db;key=value;key=value, etc. We have the URL specs. I saw your RJDBC inquiry, but the Java examples also suggest that getConnection takes only three arguments: url, user, and pwd.

@MarcinKosinski

I think the user may also specify the password and user name through the URL, like this:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
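As a small illustration of that URL pattern, here is a hypothetical sketch in R. The function name make_url and all the example values are my own, not part of any package API:

```r
# Sketch: assemble a connection URL following the generic pattern
# scheme://[user:password@]host[:port]/path
make_url <- function(scheme, host, port, path, user = NULL, password = NULL) {
  cred <- if (!is.null(user)) paste0(user, ":", password, "@") else ""
  paste0(scheme, "://", cred, host, ":", port, "/", path)
}

make_url("hive2", "tools-1.hadoop.srv", "10000", "default",
         user = "marcin", password = "secret")
# [1] "hive2://marcin:[email protected]:10000/default"
```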

If we assume that all DB connection conf parameters should be specified in the URL, then I think an additional parameter for src_Hive and src_SparkSQL could be provided like this:

my_db2 <- src_SparkSQL(
  host = 'tools-1.hadoop.srv',
  port = "10000",
  conf = list(dbName = value1,
              conf2 = value2,
              conf3 = value3)
)

> conf
$dbName
[1] "default"

$conf2
[1] "true"

$conf3
[1] "false"

where such a conf parameter, provided as a list, would be transformed into name=value pairs, with the "dbName=" prefix stripped for dbName, like this:

> gsub(pattern = "dbName=",
+      replacement = "",
+      x = paste0(names(unlist(conf)), 
+                 paste0("=",unlist(conf)),
+                 collapse = ";")
+ ) -> conf
> 
> conf
[1] "default;conf2=true;conf3=false"

and then inside src_HS2 the URL might be replaced with a new form like

> host = 'tools-1.hadoop.srv'
> port = "10000"
> url = paste0("jdbc:hive2://", host, ":", port)
> url
[1] "jdbc:hive2://tools-1.hadoop.srv:10000"
> url2 = paste0("jdbc:hive2://", host, ":", port, "/", conf)
> url2
[1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"

This can of course be worked around by passing the port parameter in a longer way, port = paste0(port, "/", conf), which also works but does not look good.
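Putting the steps above together, here is a self-contained sketch of the proposed transformation. The helper name conf_to_url is my own invention, not part of the package:

```r
# Sketch combining the conf-list flattening and URL construction above:
# flatten the list to "name=value;..." pairs, strip the "dbName=" prefix
# (dbName becomes the leading path segment), and build the JDBC URL.
conf_to_url <- function(host, port, conf) {
  flat <- paste0(names(conf), "=", unlist(conf), collapse = ";")
  flat <- sub("dbName=", "", flat, fixed = TRUE)
  paste0("jdbc:hive2://", host, ":", port, "/", flat)
}

conf <- list(dbName = "default", conf2 = "true", conf3 = "false")
conf_to_url("tools-1.hadoop.srv", "10000", conf)
# [1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"
```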

@piccolbo
Member Author

You can leave the API as is: since you added ... to the signature, you can always write conf = list(...) and continue with the same logic you sketched out. Then I would manipulate this list before turning it into a string: if ("dbName" %in% names(conf)) { db.name <- conf[["dbName"]]; conf[["dbName"]] <- NULL }. Now you can form your URL without relying on potentially brittle pattern matching, with the same logic as in your last code block. The other thing: it's not totally clear to me in which class you are doing this, hive, spark_SQL, or HS2? The more general (HS2) the better, but we need to make sure it works. Other than these two points, I think we should do this. Let's work in a feature branch, say dbcon-params, so that I can check it without merging; that's because it sounds like you are testing against Hive and I can test against Spark. Please correct me if there is a better way. Thanks.
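The list manipulation suggested above could look like this as a runnable sketch. The variable names (db.name, rest, url) and the fallback default are assumptions for illustration, not the actual src_HS2 internals:

```r
conf <- list(dbName = "default", conf2 = "true", conf3 = "false")

# Pull dbName out of the list up front, so building the URL later
# needs no pattern matching at all.
db.name <- "default"  # assumed fallback when dbName is absent
if ("dbName" %in% names(conf)) {
  db.name <- conf[["dbName"]]
  conf[["dbName"]] <- NULL
}

# Flatten the remaining entries and append them after the db name.
rest <- paste0(names(conf), "=", unlist(conf), collapse = ";")
url <- paste0("jdbc:hive2://", "tools-1.hadoop.srv", ":", "10000",
              "/", db.name,
              if (nchar(rest) > 0) paste0(";", rest) else "")
url
# [1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"
```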

@piccolbo
Member Author

This may be related

@MarcinKosinski

Whoah, thanks :)! I'll have a look at this during the week.

@MarcinKosinski

Looks like this is the solution, but the issue is still unresolved.
It would be great if such a configuration (the one I mentioned at the beginning) were possible, so that dplyr.spark.hive could be presented at the Warsaw R Enthusiasts meeting.

It looks like we'll need to wait for the next version of Spark.

@piccolbo
Member Author

Great, I am watching the above issue so that we can react. As for presentations, I think that's off topic for this issue, but don't let the perfect be the enemy of the good: you can always set up an ad hoc cluster without authentication. You don't have to have a demo at all costs.

@MarcinKosinski

By the way, have you seen this?

s-u/RJDBC#34 (comment)
