Update to Selenium 3.4+, FF 52 ESR, and add support for Python 3.4+. #152

Merged 80 commits on Oct 9, 2017
Commits
a24da92
Remove mitmproxy; replace pyamf with mini-amf.
zackw Mar 9, 2017
eb1def5
Ensure PIL is available during automated tests on Travis.
zackw Mar 9, 2017
1c5d935
Apply python-modernize + some hand tidy-ups.
zackw Mar 9, 2017
861052c
Allow blank lines in requirements.txt
zackw Mar 9, 2017
5c131a7
Automated tests on 2.7 are green, so activate 3.x
zackw Mar 9, 2017
a5b76e7
Remove uses of the 'commands' module.
zackw Mar 9, 2017
fc9a7ff
urlparse -> six.moves.urllib.parse
zackw Mar 9, 2017
43dc955
One more place where we need to use six.moves.urllib
zackw Mar 9, 2017
9787c16
Fix bytes/unicode confusion around the internal client/server protocol
zackw Mar 9, 2017
ae722a9
Add missing paren
zackw Mar 9, 2017
4bd069f
Must also use byte strings in recieve_msg
zackw Mar 9, 2017
76f07f0
Bump Selenium requirement to 3.3.0 or later.
zackw Mar 9, 2017
a19aa8d
Tidy-ups:
zackw Mar 12, 2017
3a800c5
Install geckodriver and ensure it is findable. Bump to Firefox 52.0esr.
zackw Mar 10, 2017
4408d48
Delete less
zackw Mar 12, 2017
2ba17e0
Remove more dependencies on firefox-bin.
zackw Mar 12, 2017
5b702a9
Extension updates.
zackw Mar 12, 2017
414e1ac
Handle geckodriver copying the target profile.
zackw Mar 13, 2017
78aaffd
Fix more failures induced by the new Selenium.
zackw Mar 15, 2017
7aa04af
foo.next() -> next(foo) for py3 compat
zackw Mar 15, 2017
279de8f
Further workaround for Python 3 + Selenium 3 + FIFOs.
zackw Mar 16, 2017
6eb51b7
Fix a few more places where 'str' is used sloppily.
zackw Mar 16, 2017
19e037e
Print a complete traceback when reading an LSO fails.
zackw Mar 16, 2017
d954c71
Fix thinko in test_storage_vectors.py.
zackw Mar 16, 2017
bcb3eda
Another str -> binary_type adjustment.
zackw Mar 16, 2017
c12399f
Fix yet more bytes/str/unicode confusion
zackw Mar 16, 2017
3d65cff
Work around Selenium bug #3670.
zackw Mar 16, 2017
7337377
Update expectations for Firefox 52.
zackw Mar 16, 2017
2ba308a
Whack some more byte/unicode confusion moles.
zackw Mar 16, 2017
84050ba
If Flash is active, set it to run automatically.
zackw Mar 16, 2017
719fc66
Overhaul Flash LSO parsing.
zackw Mar 17, 2017
4b600b2
Typo fix multiprocess -> multiprocessing
zackw Jul 3, 2017
6e9db85
Merge branch 'python3' of git://github.com/zackw/OpenWPM into ff52
englehardt Oct 4, 2017
214354a
Removing proxy code from tree
englehardt Jul 26, 2017
3003844
PEP8 Fixes
englehardt Jul 28, 2017
9d0dbb0
PEP8 fixes
englehardt Jul 28, 2017
53b8940
PEP8 fixes
englehardt Jul 28, 2017
1db64ed
Process killing now works correctly with Selenium 3
englehardt Jul 28, 2017
054a0e8
PEP8 fixes + Custom FirefoxDriver not needed with Selenium >= 3.4
englehardt Jul 28, 2017
d3f0745
Updating Firefox ESR and Selenium to newest versions
englehardt Jul 28, 2017
9dd05ec
PEP8 Fixes
englehardt Jul 28, 2017
addb3f7
Updating firefox XPI
englehardt Jul 28, 2017
22e642d
PEP8 fixes
englehardt Jul 28, 2017
11523e5
Bugfix for extension-only testing. TODO: fix testing with Selenium
englehardt Jul 28, 2017
402fe02
PEP8 Fixes
englehardt Oct 4, 2017
dd01aef
Fixing selenium portion of manual test
englehardt Aug 2, 2017
ae88f38
PEP8 Fixes
englehardt Aug 2, 2017
5e4e729
PEP8 Fixes and cleanup
englehardt Aug 2, 2017
e47b80a
PEP8 Fixes
englehardt Aug 2, 2017
9e4fc76
PEP8 Fixes
englehardt Aug 2, 2017
4145b43
PEP8 Fixes
englehardt Aug 7, 2017
9c7b937
PEP8 Fixes
englehardt Aug 7, 2017
0d9e88f
Updating webdriver extensions to newest version
englehardt Aug 8, 2017
19333f4
Fixing HTTP POST parsing error
englehardt Aug 8, 2017
91b6cb9
Bugfix: Better process management
englehardt Aug 11, 2017
8e81f60
Bugfix: Referencing geckodriver in an exception which only occurs when
englehardt Aug 15, 2017
ac1128c
Adding new webdriver extensions
englehardt Aug 16, 2017
c2e0a5f
Importing commonly used libraries into manual test
englehardt Aug 16, 2017
fc54978
PEP8 Fixes
englehardt Aug 18, 2017
5e05e53
Add support for setting arbitrary browser preferences.
englehardt Aug 18, 2017
3329847
Keep the DataAggregator from crashing on missing data
englehardt Aug 22, 2017
97c29fc
PEP8 Fixes
englehardt Aug 23, 2017
89162ff
PEP8 Fixes
englehardt Aug 23, 2017
9c1dddd
Tweaks to improve performance
englehardt Aug 23, 2017
fcd3c54
Remove proxy argument check
englehardt Oct 4, 2017
fae8b8d
Removing battery API instrumentation
englehardt Oct 5, 2017
a372691
Patching Selenium to support WebExtensions
englehardt Oct 7, 2017
c6f2bfc
Upgrading uBlock Origin to newest version
englehardt Oct 7, 2017
ae81ec0
Updating HTTPS Everywhere to the newest version
englehardt Oct 7, 2017
3b54fa2
Upgrading disconnect to the newest version
englehardt Oct 7, 2017
d809b8c
Updating Ghostery
englehardt Oct 7, 2017
7eee1d2
PEP8 Fixes
englehardt Oct 7, 2017
bb72000
Bump Firefox version to 52.4.0esr
englehardt Oct 7, 2017
0770b3a
Removing unused pref for DNT setting
englehardt Oct 7, 2017
6895938
Removing code that strips webdriver self-identification
englehardt Oct 8, 2017
99d5201
Updating browser prefs for FF52
englehardt Oct 8, 2017
dba6c69
Don't fallback to system firefox/geckodriver.
englehardt Oct 8, 2017
1bf8d79
Updating README to reflect new and removed features
englehardt Oct 8, 2017
d59f81c
Revert to using the multiprocess library (instead of multiprocessing)
englehardt Oct 8, 2017
65d64fb
Merge remote-tracking branch 'origin' into ff52
englehardt Oct 9, 2017
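
Several of the commits above apply recurring Python 2/3 compatibility patterns: `foo.next()` -> `next(foo)`, `urlparse` -> `six.moves.urllib.parse`, and byte strings on the internal client/server protocol. The following is a minimal sketch of those patterns, not code taken from this PR. The helpers `first_item`, `encode_msg`, and `decode_msg` and the length-prefixed framing are hypothetical, and the `try`/`except` import stands in for `six.moves.urllib.parse` so that the snippet runs without third-party packages.

```python
import struct

try:
    # Python 3 location of the URL parsing helpers.
    from urllib.parse import urlparse
except ImportError:
    # Python 2 location.
    from urlparse import urlparse


def first_item(iterable):
    """Use the next() builtin rather than the Python 2-only .next() method."""
    return next(iter(iterable))


def encode_msg(text):
    """Hypothetical framing: 4-byte big-endian length, then a UTF-8 payload."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_msg(data):
    """Inverse of encode_msg: return the payload as a text string."""
    (length,) = struct.unpack(">I", data[:4])
    return data[4:4 + length].decode("utf-8")


hostname = urlparse("https://example.com/path?q=1").hostname
first = first_item(x * x for x in [3, 4, 5])
roundtrip = decode_msg(encode_msg(u"caf\u00e9"))
```

Keeping everything that crosses the socket as bytes, and decoding to text only at the boundary, is one way to avoid the bytes/str/unicode confusion that several commit messages mention.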
6 changes: 6 additions & 0 deletions .travis.yml
@@ -2,6 +2,11 @@ sudo: required
language: python
os: linux
dist: trusty
python:
- "2.7"
- "3.4"
- "3.5"
- "3.6"
env:
# See, https://docs.travis-ci.com/user/speeding-up-the-build/
# We need a balanced distribution of the tests
@@ -21,6 +26,7 @@ before_install:
install:
- echo "y" | ./install.sh
- pip install -r requirements.txt
- pip install pillow
before_script:
- cd test
script:
20 changes: 20 additions & 0 deletions CHANGELOG
@@ -1,3 +1,23 @@
v0.9.0 - under development
======

Changes:
* The `automation` library can now be used with Python 3.4 or later,
as well as Python 2.7.
* Bump to Firefox 52 ESR, Selenium 3.4.0+, and geckodriver 0.15.0.
* geckodriver is required for Selenium 3+. `install.sh` will download
and install it.
* geckodriver 0.16.0+ does not support Firefox 52 or lower, so we are
stuck with 0.15.0 (and any bugs it may have) until the next ESR release.
* These versions of geckodriver and Selenium require Firefox 48+.
* MITMProxy support has been removed. Use `http_instrument` instead.
* Bundled Firefox privacy extensions have been updated.
* AdBlock Plus support has been removed.
* uBlock Origin and Disconnect added.
* Ghostery has been updated.
* Extensions built using the WebExtensions API are now supported. Our
extension still uses the add-on sdk.

v0.8.0 - 2017-10-09
======

97 changes: 19 additions & 78 deletions README.md
@@ -1,12 +1,11 @@
OpenWPM [![Build Status](https://travis-ci.org/citp/OpenWPM.svg?branch=master)](https://travis-ci.org/citp/OpenWPM)
=======

OpenWPM is a web privacy measurement framework which makes it easy to collect
data for privacy studies on a scale of thousands to millions of sites. OpenWPM
is built on top of Firefox, with automation provided by Selenium. It includes
several hooks for data collection, including a proxy, a Firefox extension, and
access to Flash cookies. Check out the instrumentation section below for more
details.
OpenWPM is a web privacy measurement framework which makes it easy to
collect data for privacy studies on a scale of thousands to millions
of sites. OpenWPM is built on top of Firefox, with automation provided
by Selenium. It includes several hooks for data collection. Check out
the instrumentation section below for more details.

Installation
------------
@@ -128,42 +127,6 @@ for their measurement data (see
* Automatically saved when the platform closes or crashes by specifying
`browser_params['profile_archive_dir']`.
* Save on-demand with the `CommandSequence::dump_profile` command.
* **DEPRECATED** HTTP Request and Response Headers via mitmproxy
* This will be removed in future releases
* Set `browser_params['proxy'] = True`
* Data is saved to the `http_requests_proxy` and `http_responses_proxy`
tables.
* Saves both HTTP and HTTPS request and response headers
* Several drawbacks:
* Cached requests and responses are missed entirely (See #71)
* Some HTTPS connections fail with certificate warnings (See #53)
* The mitmproxy version used (v0.13) is a few releases behind the
current mitmproxy library and will likely continue to have more
issues unless updated.
* Has significantly less context available around a request/response
than is available from within the browser.
* **DEPRECATED** Javascript Response Bodies via mitmproxy
* This will be removed in future releases
* Set `browser_params['save_javascript_proxy'] = True`
* Saves javascript response bodies to a LevelDB database de-duplicated by
the murmurhash3 of the content. `content_hash` in `http_response_proxy`
keys into this content database.
* NOTE: In addition to the other drawbacks of proxy-based measurements,
content must be decoded before saving and not all current encodings are
supported. In particular, brotli (`br`) is not supported.
* **DEPRECATED** HTTP Request and Response Cookies via mitmproxy
* This will be removed in future releases
* Derived post-crawl from proxy-based HTTP instrumentation
* To enable: call
`python automation/utilities/build_cookie_table.py <sqlite_database>`.
* Data is saved to the `http_request_cookies_proxy` and
`http_response_cookies_proxy` tables.
* Several drawbacks:
* Will not detect cookies set via Javascript, but will still record
when those cookies are sent with requests.
* Cookie parsing is done using a custom `Cookie.py` module. Although a
significant effort went into replicating Firefox's cookie parsing,
it may not be a faithful reproduction.

Browser and Platform Configuration
----------------------------------
@@ -217,9 +180,6 @@ Note: Instrumentation configuration options are described in the
described in the *Browser Profile Support* section. As such, these options are
left out of this section.

* `disable_webdriver_self_id`
* Prevents Selenium from identifying itself in the DOM. See
[Issue #91](https://github.com/citp/OpenWPM/issues/91).
* `bot_mitigation`
* Performs some actions to prevent the platform from being detected as a bot.
* Note, these aren't comprehensive and automated interaction with the site
@@ -244,33 +204,22 @@ left out of this section.
visited as a first party.
* `donottrack`
* Set to `True` to enable Do Not Track in the browser.
* `disconnect`
* Set to `True` to enable Disconnect with all blocking enabled
* The filter list may be automatically updated. We recommend checking the version of the xpi [located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions), which may be outdated.
* `ghostery`
* Set to `True` to enable Ghostery with all blocking enabled
* NOTE: The Ghostery version used (including filter lists) may be outdated.
It's recommended that you update the xpi and `store.json` file (included in
the extension profile directory). These can be placed
[here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions/ghostery)
* The filter list won't be automatically updated. We recommend checking the version of the xpi [located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions), which may be outdated.
* `https-everywhere`
* Set to `True` to enable HTTPS Everywhere in the browser.
* NOTE: The HTTPS Everywhere version may be outdated. It's recommended you
update the xpi
[located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions)
before crawling.
* `adblock-plus`
* Set to `True` to enable AdBlock Plus in the browser.
* The filter lists should be automatically downloaded and installed, but the
xpi, [located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions)
, might be outdated.
* NOTE: There is a known issue of AdBlock Plus not blocking all resources
on the first page visit. See
[Issue #35](https://github.com/citp/OpenWPM/issues/35)
for more information.
* **NOT SUPPORTED** ` tracking-protection`
* The filter list won't be automatically updated. We recommend checking the version of the xpi [located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions), which may be outdated.
* `ublock-origin`
* Set to `True` to enable uBlock Origin in the browser.
* The filter lists may be automatically updated. We recommend checking the version of the xpi [located here](https://github.com/citp/OpenWPM/tree/master/automation/DeployBrowsers/firefox_extensions), which may be outdated.
* `tracking-protection`
* **NOT SUPPORTED.** See [#101](https://github.com/citp/OpenWPM/issues/101).
* Set to `True` to enable Firefox's built-in
[Tracking Protection](https://developer.mozilla.org/en-US/Firefox/Privacy/Tracking_Protection).
* NOTE: This is not currently supported. See
[Issue #101](https://github.com/citp/OpenWPM/issues/101) for more
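
As a rough sketch of how the options described above fit together, the snippet below builds a plain dictionary using the option names from this README. It is illustrative only: `KNOWN_EXTENSION_OPTIONS`, `REMOVED_OPTIONS`, `enabled_extensions`, and `stale_options` are hypothetical names that are not part of OpenWPM, and this diff does not show how the platform actually constructs `browser_params`. Treating `http_instrument` as a boolean flag, by analogy with the removed `proxy` option, is also an assumption.

```python
# Option names are taken from the README and CHANGELOG text in this PR;
# the helpers and their names are hypothetical.
KNOWN_EXTENSION_OPTIONS = (
    "disconnect",
    "ghostery",
    "https-everywhere",
    "ublock-origin",
)

# Options whose documentation this PR removes.
REMOVED_OPTIONS = (
    "adblock-plus",
    "disable_webdriver_self_id",
    "proxy",
    "save_javascript_proxy",
)

browser_params = {
    "donottrack": True,
    "http_instrument": True,  # assumed boolean flag replacing the proxy
    "ublock-origin": True,
    "https-everywhere": True,
    "ghostery": False,
    "disconnect": False,
}


def enabled_extensions(params):
    """Return the bundled privacy extensions switched on in params."""
    return sorted(name for name in KNOWN_EXTENSION_OPTIONS if params.get(name))


def stale_options(params):
    """Return any keys that refer to options removed in this PR."""
    return sorted(name for name in REMOVED_OPTIONS if name in params)
```

A check like `stale_options` could be run on an older crawl configuration before upgrading, to catch settings such as `proxy` or `adblock-plus` that are no longer documented.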
information.

Browser Profile Support
-----------------------
@@ -367,13 +316,13 @@ continuing the crawl). We recommend using
This utility allows manual debugging of the extension instrumentation with or
without Selenium enabled, as well as makes it easy to launch a Selenium
instance (without any instrumentation)
* `python manual_test.py` uses `jpm` to build the current extension directory
* `python -m test.manual_test` uses `jpm` to build the current extension directory
and launch a Firefox instance with it.
* `python manual_test.py --selenium` launches a Firefox Selenium instance
* `python -m test.manual_test --selenium` launches a Firefox Selenium instance
after using `jpm` to automatically rebuild `openwpm.xpi`. The script then
drops into an `ipython` shell where the webdriver instance is available
through variable `driver`.
* `python manual_test.py --selenium --no_extension` launches a Firefox Selenium
* `python -m test.manual_test --selenium --no_extension` launches a Firefox Selenium
instance with no instrumentation. The script then
drops into an `ipython` shell where the webdriver instance is available
through variable `driver`.
@@ -390,15 +339,7 @@ Once installed, execute `py.test -vv` in the test directory to run all tests.
Troubleshooting
---------------

1. `IOError: [Errno 2] No such file or directory: '../../firefox-bin/application.ini'`

This error occurs when the platform can't find a standalone Firefox binary in
the root directory of OpenWPM. The `install.sh` script will download and unzip
the appropriate version of Firefox for you. If you've run this script but still
don't have the binary installed note that the script will exit if any command
fails, so re-run the install script checking that no command fails.

2. `WebDriverException: Message: The browser appears to have exited before we could connect...`
1. `WebDriverException: Message: The browser appears to have exited before we could connect...`

This error indicates that Firefox exited during startup (or was prevented from
starting). There are many possible causes of this error: