Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread blocking #37

Closed
dddenis opened this issue May 18, 2023 · 2 comments · Fixed by #38
Closed

Thread blocking #37

dddenis opened this issue May 18, 2023 · 2 comments · Fixed by #38

Comments

@dddenis
Copy link
Contributor

dddenis commented May 18, 2023

Hey,

I've been experimenting with the app and encountered the following issue:

Although the app is mostly async, there are several blocking calls:

match termination_rx.recv_timeout(std::time::Duration::from_millis(cfg.collect_interval)) {

match termination_rx.recv_timeout(Duration::from_millis(sleep_time)) {

match watch_receiver.recv_timeout(Duration::from_millis(sleep_time)) {

When execution reaches any of these calls, it will block the thread.


Let's take a look at the first example (inside run_metadata_collector function):

  1. This stage is reached after every iteration of the metadata collector.
  2. The thread is being blocked for the duration of cfg.collect_interval, which is 15s be default.
  3. run_metadata_collector is joined with the read_file async branch:
    let res = tokio::try_join!(
    webserver(&config),
    logreader::read_file(&config.log_file, &metric_obj, config.sleep_time, &terminate_rx),
    collectors::run_metadata_collector(&config, &metric_obj, &terminate_rx)
    );
  4. Now the read_file is unable to continue, until the thread is unblocked by run_metadata_collector.
  5. This results in logs being processed by read_file in short bursts every 15s and not as soon as they arrive.

The effect this has on the app's behavior when hasura generates lots of logs depends on where the logs are redirected to:

  • normal file - the metrics adapter can't keep up and reports the logs metrics with delay, giving incorrect representation of the system state.
  • named pipe - as soon as the pipe's buffer is filled, it blocks the writer process (hasura) until the reader process (metrics adapter) frees the buffer by reading more lines.

The named pipe case is the most troublesome, as it results in hasura requests being throttled. I think it is the cause of this issue: #31.


Reproduction case:

  1. Clone the repo.
  2. Start docker compose.
  3. Go to the Data tab in the hasura console (http://localhost:8080/console/data/)
  4. Create an empty table (e.g. table test with single column id).
  5. Go to the API tab in the hasura console (http://localhost:8080/console/api/api-explorer).
  6. Prepare a simple request to the table created above in the GraphiQL, e.g.:
query MyQuery {
  test {
    id
  }
}
  1. Open the browser network tab and start constantly sending requests with Ctrl+Enter.
  2. After a short while the requests will stop resolving in the network tab as the named pipe buffer is filled with unprocessed logs due to metric adapter's thread being blocked.
  3. At this point the whole hasura console will stop working correctly and will get back to normal only when the metric adapter catches up with the accumulated logs.
@dddenis
Copy link
Contributor Author

dddenis commented May 18, 2023

@afitzek I've created a PR with the proposed fix, please take a look when you have time: #38

@afitzek
Copy link
Owner

afitzek commented May 21, 2023

@dddenis Thank you very much for the very detailed explanation and the nice PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants