
Support for additional DB connection conf parameters #1

Open
piccolbo opened this issue Jan 25, 2016 · 8 comments

Comments

@piccolbo
Member

Continued from piccolbo#18, submitted by @MarcinKosinski.

@piccolbo
Member Author

The other thing is to just create a URL that works. It seems the only parameters dbConnect takes are user and pwd; the rest has to go into the URL as host:port/db;key=value;key=value, etc. We have the URL specs. I saw your RJDBC inquiry, but the Java examples also suggest that getConnection takes only three arguments: url, user, and pwd.

@MarcinKosinski

I think the user may also specify the password and user name through the URL, like this:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
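As a small illustration of that URL pattern, here is a hypothetical sketch in R. The function name make_url and all the example values are my own, not part of any package API:

```r
# Sketch: assemble a connection URL following the generic pattern
# scheme://[user:password@]host[:port]/path
make_url <- function(scheme, host, port, path, user = NULL, password = NULL) {
  cred <- if (!is.null(user)) paste0(user, ":", password, "@") else ""
  paste0(scheme, "://", cred, host, ":", port, "/", path)
}

make_url("hive2", "tools-1.hadoop.srv", "10000", "default",
         user = "marcin", password = "secret")
# [1] "hive2://marcin:[email protected]:10000/default"
```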

If we assume that all DB connection conf parameters should be specified in the URL, then I think an additional parameter for src_Hive and src_SparkSQL could be provided like this:

my_db2 <- src_SparkSQL(
  host = 'tools-1.hadoop.srv',
  port = "10000",
  conf = list(dbName = value1,
              conf2 = value2,
              conf3 = value3)
)

> conf
$dbName
[1] "default"

$conf2
[1] "true"

$conf3
[1] "false"

where such a conf parameter, provided as a list, would be transformed into name=value pairs, with the "dbName=" prefix stripped for dbName, like this:

> gsub(pattern = "dbName=",
+      replacement = "",
+      x = paste0(names(unlist(conf)), 
+                 paste0("=",unlist(conf)),
+                 collapse = ";")
+ ) -> conf
> 
> conf
[1] "default;conf2=true;conf3=false"

and then inside src_HS2 the URL might be replaced with a new form like

> host = 'tools-1.hadoop.srv'
> port = "10000"
> url = paste0("jdbc:hive2://", host, ":", port)
> url
[1] "jdbc:hive2://tools-1.hadoop.srv:10000"
> url2 = paste0("jdbc:hive2://", host, ":", port, "/", conf)
> url2
[1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"

This can of course be worked around by passing the port parameter in a longer way, port = paste0(port, "/", conf), which also works but does not look good.
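Putting the steps above together, here is a self-contained sketch of the proposed transformation. The helper name conf_to_url is my own invention, not part of the package:

```r
# Sketch combining the conf-list flattening and URL construction above:
# flatten the list to "name=value;..." pairs, strip the "dbName=" prefix
# (dbName becomes the leading path segment), and build the JDBC URL.
conf_to_url <- function(host, port, conf) {
  flat <- paste0(names(conf), "=", unlist(conf), collapse = ";")
  flat <- sub("dbName=", "", flat, fixed = TRUE)
  paste0("jdbc:hive2://", host, ":", port, "/", flat)
}

conf <- list(dbName = "default", conf2 = "true", conf3 = "false")
conf_to_url("tools-1.hadoop.srv", "10000", conf)
# [1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"
```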

@piccolbo
Member Author

You can leave the API as is: since you added ... to the signature, you can always write conf = list(...) and continue with the same logic you sketched out. Then I would manipulate this list before turning it into a string: if ("dbName" %in% names(conf)) { db.name <- conf[["dbName"]]; conf[["dbName"]] <- NULL }. Now you can form your URL without relying on potentially brittle pattern matching, with the same logic as in your last code block. The other thing: it's not totally clear to me in which class you are doing this, hive, spark_SQL, or HS2? The more general (HS2) the better, but we need to make sure it works. Other than these two points, I think we should do this. Let's work in a feature branch, say dbcon-params, so that I can check it without merging; that's because it sounds like you are testing against Hive and I can test against Spark. Please correct me if there is a better way. Thanks.
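The list manipulation suggested above could look like this as a runnable sketch. The variable names (db.name, rest, url) and the fallback default are assumptions for illustration, not the actual src_HS2 internals:

```r
conf <- list(dbName = "default", conf2 = "true", conf3 = "false")

# Pull dbName out of the list up front, so building the URL later
# needs no pattern matching at all.
db.name <- "default"  # assumed fallback when dbName is absent
if ("dbName" %in% names(conf)) {
  db.name <- conf[["dbName"]]
  conf[["dbName"]] <- NULL
}

# Flatten the remaining entries and append them after the db name.
rest <- paste0(names(conf), "=", unlist(conf), collapse = ";")
url <- paste0("jdbc:hive2://", "tools-1.hadoop.srv", ":", "10000",
              "/", db.name,
              if (nchar(rest) > 0) paste0(";", rest) else "")
url
# [1] "jdbc:hive2://tools-1.hadoop.srv:10000/default;conf2=true;conf3=false"
```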

@piccolbo
Member Author

This may be related

@MarcinKosinski

Whoah, thanks :)! I'll have a look at this during the week.

@MarcinKosinski

Looks like this is the solution, but the issue is still unresolved.
It would be great if such a configuration (the one I mentioned at the beginning) were possible, so that dplyr.spark.hive could be presented at the Warsaw R Enthusiasts meeting.

It looks like we'll need to wait for the next version of Spark.

@piccolbo
Member Author

Great, I am watching the above issue so that we can react. As for presentations, I think that's off topic for this issue, but don't let the perfect be the enemy of the good: you can always set up an ad hoc cluster without authentication. You don't have to have a demo at all costs.

@MarcinKosinski

By the way, have you seen this?

s-u/RJDBC#34 (comment)
