Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hanging when excecuting webdriver-related commands in some cases #947

Open
GorgMish opened this issue Jul 19, 2021 · 22 comments
Open

Hanging when excecuting webdriver-related commands in some cases #947

GorgMish opened this issue Jul 19, 2021 · 22 comments

Comments

@GorgMish
Copy link

GorgMish commented Jul 19, 2021

During crawling some websites, the OpenWPM will kinda freeze when reaching a webdriver-related command, e.g. running "https://www.marriott.com" in official demo.py, the script will hang in this line of custom_command.py link_count = len(webdriver.find_elements(By.TAG_NAME, "a")).
I think this could be related to the DB recording process since this issue usually happens for websites that relatively set big amounts of Cookies.

@GorgMish
Copy link
Author

GorgMish commented Jul 21, 2021

browser_param.extension_enabled = False will fix the issue temporarily. (normally setting that to False will raise an error so some other modifications are needed)
edited: of course, this would disable the whole OpenWPM itself

@vringar
Copy link
Contributor

vringar commented Jul 21, 2021

There is no DB related activity happening in the link counting command however.
https://github.com/mozilla/OpenWPM/blob/491262e9a9f1a9397abba47bc500f2495971bce4/custom_command.py#L42-L43
🤔
Is this while using the modified GetCommand, so it doesn't stop the command sequence if there are errors?

@vringar
Copy link
Contributor

vringar commented Jul 21, 2021

Also

browser_param.extension_enabled = False will fix the issue temporarily. (normally setting that to False will raise an error so some other modifications are needed)

shouldn't affect how any of this works.
This is quite curious.

@vringar
Copy link
Contributor

vringar commented Jul 21, 2021

However this command also doesn't do anything useful for larger crawls and maybe we should encourage removing it then.
It's mostly there to show off how easily one can write their own commands.

@GorgMish
Copy link
Author

Yes, there is no DB related command in the costum_command.py, but I thought maybe the process of recording in the database is going on in another thread or process and that causes the problem.

@GorgMish
Copy link
Author

This issue would happen also with non-modified GetCommand, I used this scenario as the MWE so you can reproduce it.
Not just for the webdriver-related command in LinkCountingCommand, this would also be the case for the custom commands I have written myself.

@ogasser
Copy link

ogasser commented Jul 22, 2021

I can reproduce this issue for http://www.marriott.com. The FinalizeCommand times out consistently, every time you run demo.py with GetCommand only (i.e. no LinkCounting) and only http://www.marriott.com in the target list.

Relevant log output:

browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: InitializeCommand()
browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: GetCommand(http://www.marriott.com,3)
browser_manager      - INFO     - BROWSER 2876282174: EXECUTING COMMAND: FinalizeCommand(5)
browser_manager      - INFO     - BROWSER 2876282174: Timeout while executing command, FinalizeCommand(5), killing browser manager
browser_manager      - INFO     - Finished working on CommandSequence with visit_id 5822921016699154 on browser with id 2876282174
storage_controller   - INFO     - Awaiting all tasks for visit_id 5822921016699154
sql_provider         - WARNING  - Visit with visit_id 5822921016699154 got interrupted
CommandSequence for http://www.marriott.com ran unsuccessfully
storage_controller   - INFO     - Terminating handler, because the underlying socket closed
storage_controller   - INFO     - Terminating handler, because the underlying socket closed
storage_controller   - INFO     - Received shutdown signal!

@vringar can you reproduce this?

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

I can reproduce it but don't have any idea, why this might be happening.
I'll need to investigate this further.

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

Just me brainstorming:

Is it really the webdriver becoming unresponsive and if so, how could that happen?
Like how could a website crash the webdriver?? It shouldn't have that power.
Need to check the website console logs to see if there is anything going on there.
Also get instrumentation in the web extension, so we can see if it's something there.

It's unlikely to be anything in OpenWPM, since there was no change to that part of the code base recently.
Need to check for GeckoDriver updates. Also need to check if the GeckoDriver bug that boolean5 filed got fixed by now.

@GorgMish
Copy link
Author

GorgMish commented Jul 23, 2021

Disabling openwpm.xpi will fix this issue; since openwpm.xpi is responsible for communication between DB and browser instances, I suggest looking into those parts of the project to find the problem, i.e the instrumentation parts.

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

Could either of you add another webdriver action right after the last action in the GetCommand and see if it starts hanging there instead of in the Finalize Command?

I'll likely only get to work on this next Friday so any information gathered until then would reduce the delay in getting this fixed.

@GorgMish
Copy link
Author

I already did; yes it will hang!

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

Thanks! Okay, then I'm going to see when we last updated geckodriver, see if we need to update our copy of their prefs and see if I need to file an upstream bug.

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

Nope, GeckoDriver hasn't been updated in forever. https://github.com/mozilla/OpenWPM/blame/master/environment.yaml#L11

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

Hmm, I'm very confused.
Need to set up a minimal code sample just using selenium and visiting http://www.marriott.com to see if that breaks as well.
But GorgMish, you also said that disabling the extension fixes this, so that must have something do with it?

@GorgMish
Copy link
Author

Yes, it should have.
Executing <Task pending name='Task-9' coro=<StorageController._handler() running at /home/alivm/Desktop/Intern/Proc/geographical-cookies/src/openwpm/openwpm/storage/storage_controller.py:94> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f91e92a5f70>()] created at /home/alivm/miniconda3/envs/openwpm/lib/python3.9/asyncio/base_events.py:424> created at /home/alivm/miniconda3/envs/openwpm/lib/python3.9/asyncio/streams.py:263> took 0.156 seconds
This log might be kinda related to the problem.

@vringar
Copy link
Contributor

vringar commented Jul 23, 2021

I don't think this is related to this issue, or if it is then it's related via a common root cause.
This log message only says that processing the request took longer than asyncio thinks a single async task should be running for (iirc 100ms).
But the task still successfully continued, otherwise there would be an error message from _handler in the logs.

@GorgMish
Copy link
Author

Just guessing, because this log would be prompted several times (between 1 till 10, it varies time to time) exactly before the timeout log.

@GorgMish
Copy link
Author

@vringar It doesn't need to disable the whole openwpm.xpi; just disabling js_instrument (and accordingly callstack_instrument #557 ) will fix the problem. It is clear there is a huge amount of js_instrument recorded just for this single website, so as I already mentioned, the problem is likely related to the burst of the DB IO process.

@vringar
Copy link
Contributor

vringar commented Jul 25, 2021

Could you try just disabling the callstack Instrument and see if that also resolves the problem?

The callstack Instrument messes with Firefox internals, so I'd think it's more likely to cause breakage.

@GorgMish
Copy link
Author

I disabled each of the instruments one by one and it just worked when I did it for js_instrument.

@vringar
Copy link
Contributor

vringar commented Jul 25, 2021

Thank you for doing this investigation!

@vringar vringar moved this to Candidates in Road to 1.0 Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Candidates
Development

No branches or pull requests

3 participants