-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added install.R. runtime.txt, and R notebook #129
added install.R. runtime.txt, and R notebook #129
Conversation
abernauer
commented
Oct 21, 2020
- Added the R notebook
- Updated the binder directory with a runtime.txt file containing the R version.
- Updated the binder directory with a install.R file with packages to install.
Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the mean time, please be sure that you've handled the following things, to make the review process quicker and easier:
Thank you again for your contributions! 👍 |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
binder/install.R
Outdated
@@ -0,0 +1,6 @@ | |||
install.packages("mlpack") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong but I think mlpack isn't in CRAN - mlpack/mlpack#2636. I guess the easiest workaround for now is to enable the R bindings in the conda package. On your local setup you manually build the R bindings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I forgot about this when writing that line. You can install r packages on the command line using R CMD INSTALL mlpack.tar.gz
. I got the bindings from a git hub archive of the R bindings from yashwant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that line should be removed. Yeah enabling the R bindings in the conda package is probably the easiest. The other work around is installing the package with what I mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess since installing the necessary packages takes a while it makes sense to include the files in the conda package. Will update the package later and post an update here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad the comments were pending.
binder/install.R
Outdated
@@ -0,0 +1,6 @@ | |||
install.packages("mlpack") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I forgot about this when writing that line. You can install r packages on the command line using R CMD INSTALL mlpack.tar.gz
. I got the bindings from a git hub archive of the R bindings from yashwant.
binder/install.R
Outdated
@@ -0,0 +1,6 @@ | |||
install.packages("mlpack") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that line should be removed. Yeah enabling the R bindings in the conda package is probably the easiest. The other work around is installing the package with what I mentioned above.
If unintentional, this works towards #97. 🥇 |
@birm Yes sort of unintentional. Though I had fun writing the example and would love to adapt some of the other examples for the R bindings. |
Pushed the commit addressing the comments by @zoq 😄 |
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be really great if you added a quick header comment describing what this example does. You could use the exact same comment from the other notebooks.
Reply via ReviewNB
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ratings-only.csv.gz
file is 62MB, so this is a fairly large example. Maybe it would be good to mention that in case a user thinks that this notebook will only take a second to run? :)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should switch to the small data set, as done for the the other notebook. We can use the same settings, but it's much faster. In movie-lens-cf-py.ipynb
I used https://lab.mlpack.org/data/MovieLens-small.zip
which is < 1MB.
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should put this comment just before we load the data? Also, it's probably good to introduce the whole idea of the MovieLens 20M dataset first. Marcus has a nice comment describing it in the C++ notebook. (Where 20M refers to the number of ratings.)
Also some quick formatting changes might be helpful. For instance, "userId an integer id for the user" might be clearer as "userId
, an integer id representing the user".
Reply via ReviewNB
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice---if you want to use descriptive statistics, it's probably good to add in a comment that this is mlpack support. Specifically we are using mlpack to compute some nice statistics about each dimension in our data, but here there is only one dimension (the rating itself), so we get some nice statistics on the ratings.
Reply via ReviewNB
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above these cells it could be nice to add a comment block describing that we will train the model, that we want to do a split into a train/test set first, and we'll use mlpack tools for both of these tasks, etc. etc. Basically the reader will be scrolling through, and it's really helpful for them if we can describe why we're doing something instead of making them try and figure out why we did it. :)
Reply via ReviewNB
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like maybe one more space is needed so the arguments to cf() align? Or maybe that is just my browser.
What does the # model
comment mean?
Reply via ReviewNB
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,933 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't recognize a single one of these movies (that doesn't mean they aren't good 😄). I see that in the Python notebook Marcus used user 2; maybe user 2 has less obscure taste? :) (Marcus also printed all of the movies that the user had previously rated, so that the reader can see the correlation between the two. That would probably also be helpful here.)
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same I don't know any of the listed movies, maybe that means I should watch more movies, also the movies are quite random, don't really see a pattern :)
Hey @abernauer, thanks for taking the time to implement this! It looks great. I left a bunch of comments through ReviewNB; hopefully they're helpful. I do have some slight (but not serious) concern about the CF model itself; I wonder if we need to tune it with different parameters so that the movies it shows are more common. Probably the |
@abernauer this is nice work, but I will have to close this PR, we did a major refactoring of this repository and I would doubt that it will be easy to solve all the merge conflicts. Therefore, if you are still interested feel free to open this PR again. |