P.k.g. phone home #4609
Comments
This would be great information to have. It would be good to hear how R's CRAN servers are dealing with this. They maintained their own log files forever about package downloads, but only recently decided to share aggregate statistics with the public. We obviously don't run servers, so we have to push information out rather than retain it, but that may not make much of a difference to people. |
Very tricky. One thing I can think of is prompt "Are you ok with Pkg sending anonymous usage data?" when Pkg.init() runs. I hate prompting the user, but it's hard to deal with this otherwise since neither opt-in nor opt-out works very well. Honestly the very best option is probably to switch to using our own servers, but that of course is quite a hassle (understatement!) |
Also worth noting: Hadley Wickham's CRANtastic tried to use an even more opt-in mechanism in which you had to explicitly call a function to send information to his server. Basically no one ever did this and he got no useful information out of it. |
I'm not sure what else could possibly have been expected. This is why making this opt-in is pretty much a non-starter, although a one-time opt-in is more likely to produce some data than an each-time opt-in like that. |
I really like having things hosted on GitHub. Managing |
I never thought I'd say this, but https://enterprise.github.com/ |
I think setting up a proxy would be easier, not to mention cheaper. |
@IainNZ and I have been putting google analytics on our readthedocs pages. Seems like a more noninvasive approach. |
@StefanKarpinski I'm not sure what a proxied METADATA.jl would give us. When a user installs a package we don't do anything special to METADATA.jl, right? Once METADATA.jl is cloned to the user's computer, the computer doesn't touch METADATA.jl until it updates, and vanilla git doesn't send the git server any information about what we've done on the client side when it updates, so I don't see how we would get any information related to what a user has installed that way. As long as we want to support users being able to do things like install packages from multiple sources (not just our GitHub repos, etc.) and maintain the fantasy that As far as user-experience for checking in goes, I think it's important to be 100% functional from the get-go, without any "initial setup" or anything. So I'd envision something like this: A default julia installation will send statistics back to the motherland on certain events (
This way, if someone installs Julia and wants to go from 0 to curing cancer in 20 seconds, they don't have to wade through a bunch of installation setup; they can get to computing quickly. Simultaneously, privacy-conscious users are immediately told about this in a non-invasive way, and we get fine-grained control over exactly what we want to report from users' machines. |
Will this also report the names of local repositories installed in |
We could do a check, either client-side (comparing to locally cached |
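One way to read the client-side check suggested above is a simple filter: only names that also appear in the public package listing are ever reported, so names of private or local repositories never leave the machine. The sketch below is in Python purely for illustration; `reportable_packages` and the sample names are hypothetical and are not part of Pkg.

```python
def reportable_packages(installed, registered):
    """Keep only package names that also appear in the public listing.

    Both arguments are plain lists of names; how they would be obtained
    from a real installation is left open (assumption).
    """
    public = set(registered)
    return sorted(name for name in installed if name in public)


# Hypothetical data: "MyPrivateTool" is a local repository and is dropped.
installed = ["JuMP", "MyPrivateTool", "DataFrames"]
registered = ["DataFrames", "Gadfly", "JuMP"]
print(reportable_packages(installed, registered))
```

Filtering on the client keeps unregistered names off the wire entirely, which a server-side check could not guarantee.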
I think the best option is to give the warning and disabling instructions when a |
The concern I have about doing it on Pkg.init() is that I personally never If our Pkg.add() operations were significantly more concise and
|
This is shaping into a pretty reasonable plan. I think maybe just an interactive prompt the first time someone runs
Or maybe we should just print it out every time since that may sound more sinister than just showing the whole thing. |
I'm just concerned about automated scripts getting hung up. I often install
|
How about 'storing' the preference in the existence of a file, so that the automated scripts can just |
I guess I could just call pkg.phone_home(true) as well. Nothing to see here
|
Yes, I think that's a good idea. Non-existence of the file causes prompting, while the contents being "true" or "false" indicate the opt-in and opt-out states. |
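A minimal sketch of the three-state preference file described above, written in Python for illustration only. The file name `phone_home` and both function names are assumptions; the only parts taken from the discussion are that a missing file means "prompt" and that the contents "true" or "false" mean opt-in or opt-out.

```python
import os
import tempfile


def read_preference(path):
    """Return True (opted in), False (opted out), or None (ask the user)."""
    try:
        with open(path) as f:
            text = f.read().strip().lower()
    except FileNotFoundError:
        return None  # no file yet: the user has not decided
    if text == "true":
        return True
    if text == "false":
        return False
    return None  # unexpected contents: treat as undecided (assumption)


def write_preference(path, opted_in):
    """Record the decision so later runs do not prompt again."""
    with open(path, "w") as f:
        f.write("true" if opted_in else "false")


# Demo in a temporary directory; the real location is not specified here.
with tempfile.TemporaryDirectory() as d:
    pref = os.path.join(d, "phone_home")
    print(read_preference(pref))   # undecided, so a prompt would be shown
    write_preference(pref, False)  # what an automated script could do up front
    print(read_preference(pref))   # opted out, no prompt
```

An automated script can create the file before its first package operation and so never reach the prompt.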
Just to be clear, this isn't for 0.2, right? |
No, definitely not. |
An interactive prompt currently won't work in IJulia because of JuliaLang/IJulia.jl#42, although this should be fixable for special-purpose prompts (the difficulty was redirecting stdin in general). |
We could detect if STDOUT is a TTY and either print an error instead indicating that the user needs to run |
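The TTY check could look roughly like the following sketch (Python, for illustration only). The function names and the four outcome labels are hypothetical; the idea taken from the comment above is just that a non-interactive session should get a message rather than a prompt that can never be answered.

```python
import io
import sys


def can_prompt(stream=None):
    """True only if the stream looks like an interactive terminal."""
    stream = sys.stdout if stream is None else stream
    try:
        return bool(stream.isatty())
    except (AttributeError, ValueError):
        return False  # missing or closed stream: assume non-interactive


def resolve_consent(stored, stream=None):
    """Decide what to do given a stored preference (True, False, or None)."""
    if stored is not None:
        return "send" if stored else "skip"
    # Undecided: prompt on a terminal, otherwise skip and print instructions.
    return "prompt" if can_prompt(stream) else "skip-and-notify"


# An in-memory stream stands in for a notebook or a cron job.
print(resolve_consent(None, io.StringIO()))
```

Treating "undecided and non-interactive" as "skip" means scripts and notebook front ends never hang and nothing is sent without an explicit answer.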
I like this idea the best. Something Viral and I were just talking about is being able to get install base numbers from something like this. E.g. do we really need to support OSX 10.6? How many people are using Julia on Ubuntu? It would be neat to have basic stuff like what we report from |
I agree. I think that the benefits to the community as a whole would be enormous. Just knowing what to focus on is a huge benefit. It means we can allocate our efforts more efficiently and sanely. |
My wishlist:
|
If we're going to do this, it would be nice to have some way of uniquely identifying machines and/or users. Hashing a MAC address might work. If they are set, hashing the value of |
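For the MAC-hashing idea, here is a rough sketch in Python using only the standard library. The function name, the salt value, and the truncation to 16 hex characters are all assumptions made for illustration. Note that a MAC address has only 48 bits, so a plain hash can be brute-forced; the salt here only separates these identifiers from hashes computed elsewhere and does not make them irreversible.

```python
import hashlib
import uuid


def anonymous_machine_id(mac=None, salt=b"pkg-stats"):
    """Derive a stable identifier from a hashed MAC address.

    `mac` is a 48-bit integer. When omitted, uuid.getnode() is used, which
    can fall back to a random value if no hardware address is found.
    """
    if mac is None:
        mac = uuid.getnode()
    digest = hashlib.sha256(salt + mac.to_bytes(6, "big")).hexdigest()
    return digest[:16]  # shortened for readability (assumption)


# The same machine always maps to the same identifier.
print(anonymous_machine_id(0x001122334455) == anonymous_machine_id(0x001122334455))
```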
Somewhat related: Github just made data about clones over the last two weeks available – you could scrape this data and show it on pkg.julialang.org without the privacy concerns. |
That's awesome. We've always wanted that data. Amazing fact of the moment: the serialization/deserialization performance issue is by itself one of the 10 most viewed pages in the repo. |
Note that this data seems to only go back for two weeks or so. So we've got 645 UNIQUE clones, and 6,675 UNIQUE views in a fortnight, which is pretty impressive, at least to me. ;) |
Yes, not too shabby. Of course I'm immediately greedy and want a much bigger window of data :) |
I'll start working on |
Whoa, I'll start pulling that data (if it's API-exposed) |
:( https://twitter.com/alindeman/status/499239929604743169 no API yet |
I'll try hitting this: https://github.com/JuliaOpt/JuMP.jl/graphs/clone-activity-data and seeing what I can get |
I was just about to say: "shhhhhh: https://github.com/JuliaLang/julia/graphs/clone-activity-data"
|
Hope I don't get IP banned forever for doing that 350 times, lol |
If anyone knows how to scrape that with Requests, let me know. You need to be logged in to see it, and it's not a real API endpoint, so I don't see a way to use a token. |
I'm getting pretty close, as long as you don't mind giving this script your GitHub username. It logs in correctly, and I believe will allow us to get at the data we need, but unfortunately |
GitHub helpfully supplies the underlying data in JSON at https://github.com/JuliaLang/julia/graphs/clone-activity-data?_=1409921223000 so no complicated scraping is required. |
@samuelcolvin I just get access denied when I download that using wget. In a logged-in browser it works fine, but that means you'd have to manually download everything in the browser and copy it somewhere for analysis. 350 manual fetches in the browser seems like something you'd use scraping software to help with. |
It must be possible to automatically authenticate as well. We do this when using the GitHub APIs from Julia. |
Not sure it's possible if you're not using api.github... I'll try and see
|
No, I'm pretty sure @staticfloat's way is the only way here; the API is different and I make use of that extensively. |
Yes, this doesn't work, so I'm pretty sure we have to do it the ugly way or wait for the proper API
|
@StefanKarpinski why was this closed? Incorporating some form of usage analytics in Pkg3 would be a good idea if possible. |
Sure, but I don't think we need an issue for that. We can reopen if you like. |
If Pkg3 ends up being developed primarily in a separate repo we can move it to a new issue there. |
It would be nice to have some way of knowing what packages people are using and some way of estimating Julia installs. We could potentially achieve this by having Pkg phone home when doing Pkg.update – i.e. send a list of installed packages and system version info to a server for logging. I wouldn't want to do that in any underhanded, sneaky sort of way, but opt-in doesn't seem likely to generate much data. Any thoughts on this? Good idea, bad idea? How would we do it in a way that's transparent and not sneaky but is likely to get us a reasonable amount of representative data? Note that while we don't currently have any way of getting this information, GitHub already does since they know what users and IP addresses are doing git pull against METADATA.jl. So in principle, this is already information users are sharing – just not with us.
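To make "a list of installed packages and system version info" concrete, here is a hypothetical report payload sketched in Python. Every field name is an assumption, the package list and version string are placeholder inputs, and nothing is actually sent anywhere; the point is only to show how small and inspectable such a report could be.

```python
import json
import platform


def build_report(installed_packages, julia_version):
    """Assemble the data a phone-home call might send (hypothetical schema)."""
    return {
        "packages": sorted(installed_packages),
        "julia_version": julia_version,
        "os": platform.system(),          # e.g. the kernel or OS family name
        "os_release": platform.release(),
        "arch": platform.machine(),
    }


# Placeholder inputs; a real client would read these from the installation.
report = build_report(["JuMP", "DataFrames"], "0.3.0")
print(json.dumps(report, indent=2, sort_keys=True))
```

Printing the exact payload before sending it is one way to keep the mechanism transparent rather than sneaky.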