Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set custom temporary directory used by FF and delete it #1093

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

ndanner-wesleyancs
Copy link

This pull request modifies OpenWPM as follows:

  • When a browser is launched, a custom temporary directory is created.
  • Before geckodriver is started, the TMPDIR environment variable is set in the environment used by geckodriver.
  • When the browser terminates, the custom temporary directory is deleted.

The proximate rationale for doing this is that when geckodriver adds the OpenWPM extension to FF, it makes a copy of the XPI file in its temporary directory. Unfortunately, it doesn't delete it. That means one copy of the XPI file for each browser that is launched is left behind in /tmp. On a big stateless crawl, that's a lot of copies. We can't find out the name of the temporary file that geckodriver creates to delete it ourselves, and we don't want to just delete all XPI files in /tmp because we don't actually know to whom the all belong. This approach seems to resolve these problems.

This patch is based on v.0.28.0.

See #1090.

instance and delete it when the browser quits.

This is a workaround for an issue with geckodriver.  When the OpenWPM
extension is installed via `WebDriver.install_addon()`, geckodriver
makes a copy of the XPI file in TMPDIR.  However, geckodriver never
deletes that file.  So on a stateless crawl, you end up with one copy of
the XPI file for each site visited.

This workaround sets TMPDIR in the environment before creating the
geckodriver service, and then deletes the directory after
`driver.quit()` returns in `BrowserManager.run()`.  We use this
indirection because we don't have access to the name of the temporary
file, and it doesn't seem safe to just delete XPI files in /tmp
willy-nilly.
- Temporary directory is created in
  `BrowserManagerHandle.launch_browser_manager`.
- Temporary directory is deleted in
  `BrowserManagerHandle.shutdown_browser`.
- Temporary directory is added to environment for
  `selenium.webdriver.firefox.service.Service` in
  `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`.  Value `None` means no
temporary directory has been set, `Some p` means it has been set to `p`.
When the browser manager is launched, if `tmpdir` is not `None`, try to
delete it, assuming that it is leftover from some failure that prevented
normal deletion.  Regardless, then make a new temporary directory.  Make
sure to set it to `None` when the temporary directory is deleted
during normal cleanup.
BrowserManagerHandle.close_browser_manager.
instance and delete it when the browser quits.

This is a workaround for an issue with geckodriver.  When the OpenWPM
extension is installed via `WebDriver.install_addon()`, geckodriver
makes a copy of the XPI file in TMPDIR.  However, geckodriver never
deletes that file.  So on a stateless crawl, you end up with one copy of
the XPI file for each site visited.

This workaround sets TMPDIR in the environment before creating the
geckodriver service, and then deletes the directory after
`driver.quit()` returns in `BrowserManager.run()`.  We use this
indirection because we don't have access to the name of the temporary
file, and it doesn't seem safe to just delete XPI files in /tmp
willy-nilly.
- Temporary directory is created in
  `BrowserManagerHandle.launch_browser_manager`.
- Temporary directory is deleted in
  `BrowserManagerHandle.shutdown_browser`.
- Temporary directory is added to environment for
  `selenium.webdriver.firefox.service.Service` in
  `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`.  Value `None` means no
temporary directory has been set, `Some p` means it has been set to `p`.
When the browser manager is launched, if `tmpdir` is not `None`, try to
delete it, assuming that it is leftover from some failure that prevented
normal deletion.  Regardless, then make a new temporary directory.  Make
sure to set it to `None` when the temporary directory is deleted
during normal cleanup.
BrowserManagerHandle.close_browser_manager.
Copy link
Contributor

@vringar vringar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,
thanks for you contribution.
Could you create a test where you assert that the original TMPDIR doesn't contain any .xpi and also address the mypy issues?
Or is this contribution done from your side and if I want those things I need to do them myself?

@ndanner-wesleyancs
Copy link
Author

I'll try to work on these in the next couple of weeks.

@ndanner-wesleyancs
Copy link
Author

I have fixed the mypy errors.

@ndanner-wesleyancs
Copy link
Author

I'm a little perplexed about your suggested test. The original TMPDIR (typically /tmp) could very well have XPI files in it, for example from some other independent process. I'm not sure why we would want to guard against that. If you meant the new TMPDIR, then since it is created by mkdttmp, it is necessarily empty, isn't it?

@vringar vringar enabled auto-merge (rebase) August 6, 2024 16:03
@vringar vringar self-requested a review August 6, 2024 16:03
@vringar vringar disabled auto-merge August 6, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants