Proposal: Collect sanitized metrics #366
Comments
Apparently I made my own version of "uuidv5" without knowing https://stackoverflow.com/questions/48075143/generate-uuid-from-name-space |
I will just chime in here from a process perspective. Some Orgs require VSAs (vendor security assessments) for all software you use and deploy, including open source software. There is typically some sort of vetting of the public repos and an overall checklist of things for the assessment, compliance, and governance teams to look at. This includes storing data/metrics. While I personally do not think any secret or valuable data is being collected that a bad actor could use, or anything that would look horribly bad, I do know that if you store our data we require things like a SOC 2 Type 2 report. Lots of big vendors that collect data will have these on their support portal for download; others provide them upon request. For example, in the past I have grabbed SOC 2 Type 2 reports from Microsoft when analytics data collection changed and compliance folks asked for it. Office 365 100% collects usage metrics, so they provide this report, and I doubt anyone is blocking O365 installs at their org due to analytics data being collected. I am not sure if this would impact adoption of Nudge, and as a newcomer to the tool I want to state it is great. I definitely want to support the development of this tool as well. I am not sure how my Org would feel about analytics data without some sort of document describing how it is securely stored, encrypted, what it is used for, and so forth. Even if the data is not valuable to a bad actor, these things are sort of templated into assessing how data is stored by third parties. Some orgs might need these things, others may not. Having an opt-out option could really address all of it. In the end, data exposure is a risk everyone wants to avoid, even if the data is benign, because it makes headlines. My personal opinion is that I don't see a huge deal in this proposal, but my personal opinion doesn't reflect or change any Org's process to vet and onboard new software. Some Orgs may simply not like this change, others may not care. 
So, I think an opt out feature would address both sides. Also, curious, but would any of these data points also work?
Lastly, thank you for all the work you have done. This is one of the best client side open source tools I have used. It does what it sets out to do out of the box and did not require a lot of effort. All while delivering the right balance of information, options, and of course annoyance to get the users to run software updates. |
If the technologies I use are SOC 2 compliant, would that work? Maybe neither of us is suited to answer that question, but so far, I am not storing anything on my own infrastructure. |
For your question about stats, I don't think so. I initially thought about a simple google form, but I can barely get people to test Nudge, so I imagine I wouldn't get much engagement. |
I have no idea, I don't really do assessments. I am just bringing up things I have observed, is all. SOC 2 certification definitely helps, but a compliance report would show exactly how the data is stored and for what uses, which I guess is what the compliance folks want to ensure. Data leakage, even if benign and having no value, is still bad press. |
At a glance (as a security person), this proposal doesn't sound unreasonable for the most part. The highest sensitivity is arguably the hardware ID/serial number, but the way you hash those two together makes it sufficiently unfeasible for a potential attacker to reverse/brute force. The only parameter I can see companies having an issue with is There is some argument that by 'calling home', all devices are effectively going to be revealing their source IPs, but as you've mentioned, our devices are already sending this kind of telemetry on a minute-by-minute basis to countless website and application developers, I don't personally see a problem with this, although I would argue that you're better off not storing actual IPs (perhaps only storing the country, which, as you plan on using Cloudflare, is made available in the |
For the record I'm not storing the IPs nor country origin. What Kafka receives is the cloudflare worker IP address. Potentially I could change the CF worker to do something like that, but I have zero intention to. I don't need nor want that data. Regarding the developer certificate, that's only to understand who is forking Nudge. 99% of employers are using the official signed/notarized version of Nudge, or so I think. |
Regarding an opt-out - I'd prefer to have a toggle somewhere to be able to turn this off entirely. Could this be a documented setting, but left out of sample JSON config files? That way anyone blindly copy/pasting a config file won't immediately see it to turn it off. However, the wiki could document what it does, what it collects, and why it's important to the future of Nudge. I'd happily leave this enabled if I'm given the option to, rather than being forced to. |
I'm not sure I follow. For example, we're not forking in any meaningful way (fork and patching/changing things), but we are building from an internal copy of the source, and then signing with our own dev cert. This is to avoid trust issues from using pre-compiled binaries, when possible. |
This depends on the Org's discretion. You are not SOC2 compliant just because the technologies you use (AWS/GCP/etc.) are compliant, you actually have to go through an audit/review period with an auditor to be SOC2 certified. So if an Org requires any vendors they work with to be SOC2 compliant then this would rule Nudge out. That said, it's very rare I've run into any orgs that absolutely 100% require everyone they work with to be SOC2 compliant. There can usually be exceptions made quite easily based on the vendor/tool/data being collected. |
Hi Erik, Thanks, first and foremost, for all that you do. Nudge is definitely an incredible product, and you have every right to want to study how it's being used in the field to justify the time spent on that. I totally understand your motivation for approaching this change, and I appreciate it entirely. We're going to be reviewing the proposed architecture, and we might have some questions from our security folks, and I'll post those here (or if they take too long, another issue) for discussion. For now, we are likely to pause upgrades beyond our current version while we evaluate this, as we've got security and engineering workflows to follow on this. |
Fair enough. It will not be in v1.1.7. |
I'd like to know how many devices are used by developer certs outside of the official one. Perhaps I could hash this if it's not the one we expect so as not to expose it, but I still don't think that does much. |
I was thinking about this as well. If I make one, it wouldn't be in the JAMF JSON schema or the example JSON or plist, and it would be up to people to add it manually. I would likely go so far that if someone forks the JSON schema and provides it with the opt-out, I would remove it from the repo and let that author maintain it moving forward. I'd rather know now, though, if lots of people plan on disabling it. If that's the case, I need to rethink my commitment to this project as well as shipping this feature. In some ways I don't think it's fair that I'm supposed to work on this for hundreds of hours with very little help from the community, but the moment I do something that gives me some insight there is an uproar. I look forward to security engineers looking at this and seeing what concerns, if any, actually exist, but right now it feels like a philosophical thing more than anything else. |
Sorry to chime in so late. I know you've stepped away for a bit, but hopefully continuing the discussion is still useful. First, thank you for Nudge. Apple has dropped the ball on this for years - we used Munki's Personally, I appreciate that you took extra care to anonymize the data collection. I don't have concerns about that. In my org, the decision would probably not be up to me, but I'd argue strongly in favor of continuing to use Nudge. We stopped using I tried to brainstorm some possible solutions - here's what I came up with:
Regardless, if you're looking to collect data, I don't think there should be an 'off' switch (unless you're implementing something like ideas 2 or 3 in the above list, where you're being paid). I don't agree with the logic that anyone would want the option of disabling it, but wouldn't actually use it. Heck, we currently disable telemetry for both Apple and Microsoft products, and we pay for both. |
If this is still in the works for a future version, here are my thoughts, too: If this is opt-out, I'll more than likely be inclined to turn it off, or find ways to block its outgoing transmission, out of principle. One of the worst things about data collection is the underhandedness of it. How Debian handles popcon during install (a concise prompt / explanation during setup) is perfect. I opt in every machine I build because it's simple to understand what's collected and why, and I have to opt in to it. |
Thanks for the insight. At this moment my involvement in Nudge is paused. Anyone is welcome to maintain this project and should I have the urge to work on it, I will, but at this time my own needs are no longer being met with this project and it feels very one-sided to me. Almost all features for the past year have had zero use at my organization. If people have ethical objections or principles that go against data collection, that is fine, but there is a human on this side, not a corporation, who is just trying to understand the impact of his work. To date every single feature, no matter how much time I have spent on it, has been met with eagerness, and while I understand the nature of this proposal, this is the one feature that would benefit me in the past two years of this project being maintained for free. I personally think it's a fair ask. But I no longer feel my opinion on this project (Nudge) actually matters. All that matters is how it impacts others. This is 100% my view and my view alone. If it's untrue, so be it. But that is my view and why at this time my engagement with Nudge is paused. |
@erikng Just wanted to add a cent or two on this. |
Thanks for all your efforts, @erikng. |
I had planned on bringing in some metrics for Nudge 1.1.7 but that's no longer happening now as of 6fa38af
My goal here is the following:
Currently Nudge's preference file looks like this
What I had proposed was the following
If you had a JSON file, it would add configJSON, which is a calculated MD5 hash of the device's current config.
If you had an MDM file, it would add configProfile, which is a calculated MD5 hash of the device's current config.
If you had both a JSON and an MDM profile, both keys would be present.
If you had a signed application of Nudge (essentially everyone but local development tests)
So if you had the latest release, a JSON file and an MDM profile, it would look like
Client Behavior
Upon the first run of the new Nudge, it would submit this information to https://deviceconfiguration.nudgeapp.workers.dev, which is a CloudFlare Workers serverless instance that looks for the Nudge application to submit data.
After it submits the data successfully, Nudge will not submit new data unless any of the following had changed:
The following keys do not matter for resubmission
If Nudge detected weird serial number formats, it would assume a virtual machine and never submit data.
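The resubmission rule described above could be sketched roughly as follows. This is only an illustration in Python, not Nudge's actual (Swift) implementation: the function names, the `ignored_keys` parameter, and the use of SHA-256 over a canonical JSON string are all assumptions, and the virtual machine check is passed in as a plain flag because the detection logic is not described here.

```python
import hashlib
import json


def payload_fingerprint(payload: dict, ignored_keys: frozenset = frozenset()) -> str:
    """Hash the payload, skipping keys that should not trigger a resubmission.

    Hypothetical helper: which keys are ignored is left to the caller.
    """
    relevant = {k: v for k, v in payload.items() if k not in ignored_keys}
    # Sort keys so the same data always serializes to the same string.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_submit(payload: dict, last_fingerprint, ignored_keys: frozenset = frozenset(),
                  is_virtual_machine: bool = False) -> bool:
    """Return True when the client should POST its data.

    last_fingerprint is None on the first run, so the first run always submits.
    """
    if is_virtual_machine:
        # Virtual machines never submit data.
        return False
    return payload_fingerprint(payload, ignored_keys) != last_fingerprint
```

After a successful submission, the client would store the new fingerprint locally and pass it as `last_fingerprint` on the next run.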
Server Infrastructure
Nudge Client -> POST -> CloudFlare Worker -> Validation
-> POST/Kafka Producer -> Kafka/UpStash
-> POST -> ElasticSearch/Private Docker -> Kafka Consumer -> Kibana/Grafana visualizations
As mentioned before, the submission endpoint is running on CloudFlare workers within their own security model.
The CloudFlare worker would then take the data, ensure it is in the correct format (and reject it if it isn't) and send it to a Kafka topic running on Upstash. From there it would be consumed into a private ElasticSearch.
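To make the "ensure it is in the correct format" step concrete, here is a minimal sketch of what such a validation could look like. The real worker is not open source, so everything here is an assumption: the function name, the choice of which keys are required versus optional, and the rule that the two config hashes must be 32-character lowercase hex strings. It is written in Python purely for illustration.

```python
import re

# Assumption: these three keys are always present; the others depend on the
# client's setup (JSON file, MDM profile, signed application).
REQUIRED_KEYS = {"appVersion", "bundlePath", "deviceID"}
OPTIONAL_KEYS = {"configJSON", "configProfile", "developerCertificate"}
MD5_KEYS = {"configJSON", "configProfile"}
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def validate_submission(data) -> bool:
    """Return True only if the payload has the expected shape."""
    if not isinstance(data, dict):
        return False
    keys = set(data)
    # Reject missing required keys and any key we do not know about.
    if not REQUIRED_KEYS <= keys or not keys <= REQUIRED_KEYS | OPTIONAL_KEYS:
        return False
    # Every value must be a non-empty string.
    if not all(isinstance(v, str) and v for v in data.values()):
        return False
    # Config hashes must look like MD5 hex digests.
    return all(MD5_HEX.fullmatch(data[k]) for k in MD5_KEYS & keys)
```

Rejecting unknown keys outright keeps anything unexpected from reaching the Kafka topic.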
The CloudFlare worker is not open source because of the following reasons:
Currently, the CloudFlare worker does not ship to ElasticSearch because I am still figuring out the most secure, least cost prohibitive model for this. I also do not want to expose this data to any potential malicious actors.
One thing to note is that I do not get access to the CloudFlare worker's raw logs (nor do I want them), which contain the source IP of the client device. This protects both you and me from exposing data we don't intend to.
The Kafka message queue purges old messages after 7 days which is compliant with GDPR.
deviceConfiguration Dictionary design
I have tried to be thoughtful on what keys I want and why
appVersion
This is straightforward: it just pulls the current Nudge version.
bundlePath
This is also straightforward: it is the install path for Nudge. For most people this will be /Applications/Utilities/Nudge.app. I'm mainly curious to collect this to understand if people are using other application paths, as it could impact the preinstall and postinstall scripts.
configJSON
This function would use the new -print-json-config logic and take the resulting string and convert it to an MD5 hash. By submitting this data, I could put machines into collections, while not exposing what company this collection belongs to.
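As a rough illustration of the hashing step, the sketch below turns a config string into an MD5 hex digest. It is in Python rather than Nudge's own code, the function name is hypothetical, and it assumes the printed config is available as a plain UTF-8 string.

```python
import hashlib


def config_hash(config_text: str) -> str:
    """Return the MD5 hex digest of a printed config string."""
    return hashlib.md5(config_text.encode("utf-8")).hexdigest()


# Two devices with byte-identical configs land in the same collection.
same = config_hash('{"a": 1}') == config_hash('{"a": 1}')
# Any difference, even whitespace, produces a different hash.
different = config_hash('{"a": 1}') != config_hash('{"a":1}')
```

Because the hash is taken over the raw string, grouping machines this way only works if the printed config is serialized the same way on every device.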
configProfile
This function would use the new -print-profile-config logic and take the resulting string and convert it to an MD5 hash. By submitting this data, I could put machines into collections, while not exposing what company this collection belongs to.
developerCertificate
This is also straightforward: it is the signature of the Nudge application. For most people this will be Developer ID Application: Clever DevOps Co. (9GQZ7KUFR6). Given that this data is generally not seen as any measure of security, I am collecting it to understand if/when forks are being used. Think JumpCloud patch management.
deviceID
This is the key that took the most thought.
These were the core requirements:
defaults delete ~/Library/Preferences/com.github.macadmins.Nudge
In order to do this I came up with the following idea:
Hardware UUID of the device and store it in temporary memory
Serial of the device and store in temporary memory
(?) button
com.github.macadmins.Nudge + Hardware UUID + Serial UUID
By going through all of these steps, I can safely create a method that is immutable (except for logic board changes). More importantly though, this UUID cannot be used to reverse the serial number or hardware ID.
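The general idea of combining the bundle identifier, hardware UUID and serial into one stable, non-reversible identifier can be sketched as below. The exact hashing steps Nudge would use are not spelled out here, so this sketch swaps in Python's standard name-based UUID (version 5, SHA-1 based), which the thread itself compares the approach to. The function name, the namespace derivation and the sample inputs are all assumptions.

```python
import uuid

BUNDLE_ID = "com.github.macadmins.Nudge"


def device_id(hardware_uuid: str, serial: str) -> str:
    """Derive a stable identifier from the bundle ID, hardware UUID and serial.

    Assumption: a UUIDv5 namespace is derived from the bundle ID, and the
    hardware UUID and serial are simply concatenated as the name.
    """
    namespace = uuid.uuid5(uuid.NAMESPACE_DNS, BUNDLE_ID)
    return str(uuid.uuid5(namespace, hardware_uuid + serial))


# Made-up example values, not real device data.
example = device_id("11111111-2222-3333-4444-555555555555", "C02EXAMPLE01")
```

The same hardware always yields the same identifier, so deleting the preference file does not change it, while a logic board replacement (new hardware UUID and serial) does.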
Opt out method
Initially I do not want to make an opt-out method because I want to achieve the following:
Potential concerns around data and my views on it
Almost every piece of software, including the software you are using to read this GitHub issue, has some metrics being shipped out. If it's Safari, I'm sure Apple is sanitizing it in a beautiful way, but if it's Google, they are getting your hardware UUID, your serial, your IP address and much more information about you that you cannot opt out of.
This small bit of data was thoughtfully designed and greatly helps me continue to support this software for free, now and in the future.
While I agree that the raw values could absolutely be used against a company, by converting it to an MD5, I think the risk is less severe.
That said, after thinking about it, I can come up with some situations where perhaps this data could be used to infer things:
Because of this, I could see the following key
disableOptionalMetrics
This would essentially give me the absolute bare minimum device collection data and prevent me from understanding those types of events
Moving forward
I'm not sure how much longer I can continue to maintain Nudge by myself. I am no longer a macadmin and I continue to force myself to find time to improve Nudge. If I am to continue maintaining it, I must show my new organization the true impact of what Nudge is doing to help companies secure their devices.
When you deploy something it is very likely your management wants to track and understand the status of the deployment. For two years I have essentially been blind as to how Nudge is operated.
Other projects you may have used or your developers use do similar things:
Other tools check into APIs
I could go on and on and will point you to Facebook, Google, Apple. All of them track you and ship far more information about you than this tool ever will.
https://nordvpn.com/blog/worst-privacy-apps/
I look forward to the discussion and I'm sure it will get heated.