Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spoof navigator.webdriver to false #526

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Flnch
Copy link

@Flnch Flnch commented Nov 5, 2019

What?

Regarding to #448 this PR extends the OpenWPM instrumentation extension (WebExtension) to spoof the navigator.webdriver attribute to false. This is done by overwriting window.navigator with a JavaScript proxy.

Why a JavaScript proxy?

Why not simply do a Object.defineProperty(navigator, 'webdriver', {enumerable: true, value: false})? Well, this changes the webdriver attribute but also the order of the window.navigator attributes. That is, the overwritten webdriver attribute is listed as first while normally listed later:

"permissions": {},
"doNotTrack": "unspecified",
...
"webdriver": true,
...
"storage": {}

becomes

"webdriver": false,
"permissions": {},
"doNotTrack": "unspecified",
...
"storage": {}

Thus, it would still be detectable.

How does it work?

Simply set spoof_navigator to True and the instrumentation extension registers the content script that spoofs the navigator object with a JavaScript proxy. This option defaults to False.

Other remarks

  • I did my best to comply with the project structure but I am keen to get your feedback.
  • The JavaScript proxy code is based on another Firefox extension (also GPLv3) as indicated in automation/Extension/firefox/spoof.js/index.js and LICENSE.

@englehardt
Copy link
Collaborator

englehardt commented Dec 17, 2019

This is very well done, and I apologize for taking so long to get to this review. Thanks so much for the contribution! I've just tested this on the Google login flow (which prevent browsers that have been automated from logging in), and the spoofing fools it.

Would you mind to rename the spoof_navigator config option to hide_webdriver. I think that's more descriptive and agnostic to the underlying changes necessary to hide the fact that we're using webdriver. We could also tie future protections into that same config option (which may not change the navigator).

I would also appreciate a simple test page that checks if the webdriver attribute is set with and without this change.

Aside from that, these changes look great!

@Flnch
Copy link
Author

Flnch commented Dec 18, 2019

Would you mind to rename the spoof_navigator config option to hide_webdriver. I think that's more descriptive and agnostic to the underlying changes necessary to hide the fact that we're using webdriver. We could also tie future protections into that same config option (which may not change the navigator).

Of course. Your suggested name is more appropriate. I will change this tomorrow.

I would also appreciate a simple test page that checks if the webdriver attribute is set with and without this change.

Do you mean something like this? I created this test that reads the navigator.webdriver property in three different scopes. This confirmed my assumption: The current solution does not spoof the navigator.webdriver property for an iframe created during runtime (see test page). I am already working on a fix for this problem.

@englehardt
Copy link
Collaborator

Would you mind to rename the spoof_navigator config option to hide_webdriver. I think that's more descriptive and agnostic to the underlying changes necessary to hide the fact that we're using webdriver. We could also tie future protections into that same config option (which may not change the navigator).

Of course. Your suggested name is more appropriate. I will change this tomorrow.

I would also appreciate a simple test page that checks if the webdriver attribute is set with and without this change.

Do you mean something like this? I created this test that reads the navigator.webdriver property in three different scopes. This confirmed my assumption: The current solution does not spoof the navigator.webdriver property for an iframe created during runtime (see test page). I am already working on a fix for this problem.

Yes exactly! And thanks for letting me know that it doesn't apply to dynamically created iframes. Do you know the cause? Just wondering if that's also true for the way we inject instrumentation.

As an example of how to build an automated test from your current test page you can look at our test page from when we previously masked this attribute.

@Flnch
Copy link
Author

Flnch commented Dec 19, 2019

And thanks for letting me know that it doesn't apply to dynamically created iframes. Do you know the cause? Just wondering if that's also true for the way we inject instrumentation.

The MDN page about content scripts mentions:

Note that in Firefox, content scripts won't be injected into empty iframes at document_start even if you specify that value in run_at.

I think this is the cause. Then, indeed, it would also apply to the injected instrumentation.

The creator of the user-agent-switcher extension I oriented the code at also dealt with this issue. He tracks added iframes and applies the JavaScript proxy overwrite to them. A first test shows using his solution would work as a fix for the navigator.webdriver attribute.

@englehardt
Copy link
Collaborator

The MDN page about content scripts mentions:

Note that in Firefox, content scripts won't be injected into empty iframes at document_start even if you specify that value in run_at.

I think this is the cause. Then, indeed, it would also apply to the injected instrumentation.

Thanks, I think we missed that during the move from the addon-sdk to WebExtensions. I filed #545 to investigate.

This renames `spoof_navigator` to `hide_webdriver` to be more descriptive.
@englehardt
Copy link
Collaborator

I think your approach may cause some breakage. Take a look at: https://www.reddit.com/r/videos/ with and without the spoofing enabled. When it's enabled you'll see the error message: TypeError: 'javaEnabled' called on an object that does not implement interface Navigator.. The page also fails to load any advertisements. This error is not present and advertisements are loaded when the spoofing is disabled (but webdriver is still used).

Previously, `navigator` functions such as `javaEnabled` were not proxied correctly and lead to broken pages (for example embedded YouTube videos).
@Flnch
Copy link
Author

Flnch commented Dec 20, 2019

I think your approach may cause some breakage. Take a look at: https://www.reddit.com/r/videos/ with and without the spoofing enabled. When it's enabled you'll see the error message: TypeError: 'javaEnabled' called on an object that does not implement interface Navigator.. The page also fails to load any advertisements. This error is not present and advertisements are loaded when the spoofing is disabled (but webdriver is still used).

Indeed. I pushed a fix for this issue. Here they did fix the problem but because of the comment "Workarounds for Firefox 68 and below" I did not suspect this to be still a problem.

@Flnch
Copy link
Author

Flnch commented Jan 22, 2020

So, there is a further detection method. Using

window.open('').navigator.webdriver

the original webdriver value is revealed because it accesses the new page's DOM that is not spoofed. In regular Firefox, such pop-up windows are blocked by default but for OpenWPM not.

@englehardt Is it for a specific reason OpenWPM disabled the pop-up window protection? As far as I understand the setting modification originates from OpenWPM/automation/Extension/firefox/node_modules/firefox-profile/lib/firefox_profile.js:45: 'dom.disable_open_during_load': 'false'. It is possible to re-activate the pop-up protection in configure_firefox.py.

@englehardt
Copy link
Collaborator

englehardt commented Jan 22, 2020

So, there is a further detection method. Using

window.open('').navigator.webdriver

the original webdriver value is revealed because it accesses the new page's DOM that is not spoofed. In regular Firefox, such pop-up windows are blocked by default but for OpenWPM not.

Do you know why the extension doesn't inject a content script into the newly created window? Is it simply that the script is able to access the webdriver attribute before the happens or is this related to #526 (comment)?

@englehardt Is it for a specific reason OpenWPM disabled the pop-up window protection? As far as I understand the setting modification originates from OpenWPM/automation/Extension/firefox/node_modules/firefox-profile/lib/firefox_profile.js:45: 'dom.disable_open_during_load': 'false'. It is possible to re-activate the pop-up protection in configure_firefox.py.

This wasn't changed by us. The file you've referenced is part of the firefox profile library (https://github.com/saadtazi/firefox-profile-js), which is required by the web-ext package. I don't think we actually use that package, and instead that the pref is set by Selenium. It's a frozen pref, but we can overwrite it.

I suspect the reason it's set to false is that Selenium is primarily meant for website testing where site owners may want to run through tests that continually open new windows without user interaction. That's not as important to us, and it seems more true to the actual use experience to block those popups.

So, I'm happy to set that pref back to true. Are you up for doing that in this PR?

Also, do you expect to be able to add automated tests to the test suite to check for this property? If not, I'll merge this as is and file a follow-up bug for tests.

Flnch added 2 commits January 23, 2020 08:40
We suspect this setting is set to `false` by Selenium. It is now manually overwritten such that pop-up windows are blocked by default. See discussion in openwpm#526.
Previously, in iframes the original `navigator` values could be accessed. Now, this is only possible via `window.frames[0]`/`window[0]` (which is the same). Again thanks to Alexander Schlarbs User Agent Switcher extension for the code.
@Flnch
Copy link
Author

Flnch commented Jan 23, 2020

Do you know why the extension doesn't inject a content script into the newly created window? Is it simply that the script is able to access the webdriver attribute before the happens or is this related to #526 (comment)?

I could not find out yet why this happens. It opens an about:blank page where the injection in principle should happen. But even in the console of the opened page, webdriver is not spoofed. However, if I manually open a new window (via Ctrl+N or the menu), webdriver is spoofed when checking in the console.

This wasn't changed by us. The file you've referenced is part of the firefox profile library (https://github.com/saadtazi/firefox-profile-js), which is required by the web-ext package. I don't think we actually use that package, and instead that the pref is set by Selenium. It's a frozen pref, but we can overwrite it.

I suspect the reason it's set to false is that Selenium is primarily meant for website testing where site owners may want to run through tests that continually open new windows without user interaction. That's not as important to us, and it seems more true to the actual use experience to block those popups.

So, I'm happy to set that pref back to true. Are you up for doing that in this PR?

Already done :) I hope configure_firefox.optimize_prefs is the correct place. Otherwise I can change that. In each case this fixes the window.open('') case.

Further, I migrated the iframe solution mentioned earlier. This does not prevent all access methods but does its best, see the commit description. I also updated the test page to cover all combinations I am aware off, including the window.open('') method.

Also, do you expect to be able to add automated tests to the test suite to check for this property? If not, I'll merge this as is and file a follow-up bug for tests.

That depends on how easy it is to integrate my test page. I looked into your reference how testing was done previously. JavaScript code was executed via Selenium. But for my test page the timing is relevant as earlier access to JavaScript properties makes a difference. Do you think it is easy to read the results of my test page from the HTML page (they are dynamically added in a HTML list) into the Python test? Or is there a more appropriate solution? I am not advanced in writing such tests so maybe adjusting my test page to also write the results in a JSON structure that can be accessed by Selenium in Python or so will be easier.

@Flnch
Copy link
Author

Flnch commented Mar 28, 2020

@englehardt Any news on this? Momentarily, I have not much time. But if you want to merge the current status I am fine with adding automated tests in a few weeks when I have resources to do so.

@englehardt
Copy link
Collaborator

Thanks for poking me on this!

@englehardt Any news on this? Momentarily, I have not much time. But if you want to merge the current status I am fine with adding automated tests in a few weeks when I have resources to do so.

It looks like your previous updates broke a bunch of the tests. Would you mind to check into those test failures to see whether you've introduced an unexpected bug?

Already done :) I hope configure_firefox.optimize_prefs is the correct place. Otherwise I can change that. In each case this fixes the window.open('') case.

That's fine!

That depends on how easy it is to integrate my test page. I looked into your reference how testing was done previously. JavaScript code was executed via Selenium. But for my test page the timing is relevant as earlier access to JavaScript properties makes a difference. Do you think it is easy to read the results of my test page from the HTML page (they are dynamically added in a HTML list) into the Python test? Or is there a more appropriate solution? I am not advanced in writing such tests so maybe adjusting my test page to also write the results in a JSON structure that can be accessed by Selenium in Python or so will be easier.

To make sure I understand, you'd like to be able to check for the webdriver attribute as early as possible and not introduce the lag that comes with executing JS through Selenium?

In that case your suggestion sounds reasonable. The test page can run the full set of tests and put the results of those tests somewhere that the JS we execute through Selenium can access (e.g., on the window object).

If we manage to get the current set of tests passing again, I'm happy to merge this without additional tests and add those in a follow-up PR.

This merges the current upstream to this issue branch.

Merge conflict in `automation/Extension/firefox/feature.js/index.js` fixed manually.
@Flnch
Copy link
Author

Flnch commented Apr 3, 2020

It looks like your previous updates broke a bunch of the tests. Would you mind to check into those test failures to see whether you've introduced an unexpected bug?

I merged remote upstream to see whether this caused the failed tests. Travis-CI tests do fail but it seems to be a problem with automated testing, e.g. with flake8. Local testing does not work because py.test -vv states ValueError: attempted relative import beyond top-level package, which I do not understand as the current working directory correctly is inside the test folder.

To make sure I understand, you'd like to be able to check for the webdriver attribute as early as possible and not introduce the lag that comes with executing JS through Selenium?

Exactly.

@englehardt
Copy link
Collaborator

Do you know why the extension doesn't inject a content script into the newly created window? Is it simply that the script is able to access the webdriver attribute before the happens or is this related to #526 (comment)?

I could not find out yet why this happens. It opens an about:blank page where the injection in principle should happen. But even in the console of the opened page, webdriver is not spoofed. However, if I manually open a new window (via Ctrl+N or the menu), webdriver is spoofed when checking in the console.

I wonder if this is related to the fact that you're loading about:blank. We've recently done a bunch of work to ensure that sites can't inject code into privileged about: pages. Do you see the same issue on other non-privileged pages?

It looks like your previous updates broke a bunch of the tests. Would you mind to check into those test failures to see whether you've introduced an unexpected bug?

I merged remote upstream to see whether this caused the failed tests. Travis-CI tests do fail but it seems to be a problem with automated testing, e.g. with flake8.

This means you have style issues that must be fixed. You can try running https://github.com/hhatto/autopep8 to fix these automatically, but looking at the outputs they seem pretty simple to fix by hand (just sorting issues for dependencies).

Local testing does not work because py.test -vv states ValueError: attempted relative import beyond top-level package, which I do not understand as the current working directory correctly is inside the test folder.

You're running pytest -vv from the test/?

Also just to note that you'll also need to run the dev install script (i.e., ./install-dev.sh) to run all of the tests, but that is unrelated to the error you're seeing.

@birdsarah birdsarah marked this pull request as draft June 17, 2020 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants