Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Fix the shuffle hanging problem when fetch data from presto … #24301

Merged
merged 1 commit into from
Dec 27, 2024

Conversation

xiaoxmeng
Copy link
Contributor

@xiaoxmeng xiaoxmeng commented Dec 26, 2024

Presto java worker doesn't set next token in response for a get data size request
and it simply returns the buffered data from the source. The Prestissimo worker
updates 'sequence_' with ack sequence encoded by next token and uses it as data
stream offset to fetch data from the source. If the next token str is not set in
the data response handler header, then ack sequence is set to zero which resets the
'sequence_' to zero, and the rollback of 'sequence_' can cause data hanging problem
when Prestissimo worker fetch data from a Presto java worker (which can only happen
in the case that Presto java worker run by the coordinator as some metadata operation
can only execute on the coordinator) from the reset 'sequence_'.

This PR fixes the issue by avoiding updating 'sequence_' if next token is not set
in data response handler.

== NO RELEASE NOTE ==

@xiaoxmeng xiaoxmeng requested a review from a team as a code owner December 26, 2024 06:11
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Dec 26, 2024
@xiaoxmeng xiaoxmeng force-pushed the fix branch 6 times, most recently from 2507aa3 to 8e06b65 Compare December 26, 2024 18:16
@facebook-github-bot
Copy link
Collaborator

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xiaoxmeng xiaoxmeng force-pushed the fix branch 2 times, most recently from ed590e7 to 491cd92 Compare December 26, 2024 23:30
ackSequenceOpt = atol(nextTokenStr.c_str());
} else {
VELOX_CHECK_EQ(
contentLength, 0, "next token is not set non-empty data response");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... is not set for non-empty ...

Copy link
Contributor

@tanjialiang tanjialiang Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having this check, I assume we are not supporting data fetching from java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support and only java won't set next token str in get data size request with empty data.

tanjialiang
tanjialiang previously approved these changes Dec 27, 2024
Copy link
Contributor

@tanjialiang tanjialiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks % for some minor

sequence_ = ackSequence;
if (ackSequenceOpt.has_value()) {
sequence_ = ackSequenceOpt.value();
}
Copy link
Contributor

@tanjialiang tanjialiang Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have an else block and perform some finish checks? Assuming we only send empty page at completion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also see empty sequence in case of get data size. Get data size and get data share the same response handler.

Copy link
Contributor Author

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang thanks!

ackSequenceOpt = atol(nextTokenStr.c_str());
} else {
VELOX_CHECK_EQ(
contentLength, 0, "next token is not set non-empty data response");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support and only java won't set next token str in get data size request with empty data.

sequence_ = ackSequence;
if (ackSequenceOpt.has_value()) {
sequence_ = ackSequenceOpt.value();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also see empty sequence in case of get data size. Get data size and get data share the same response handler.

…java (coordinator)

Presto java worker doesn't set next token in response for a get data size request
and it simply returns the buffered data from the source. The Prestissimo worker
updates 'sequence_' with ack sequence encoded by next token and uses it as data
stream offset to fetch data from the source. If the next token str is not set in
the data response handler header, then ack sequence is set to zero which resets the
'sequence_' to zero, and the rollback of 'sequence_' can cause data hanging problem
when Prestissimo worker fetch data from a Presto java worker (which can only happen
in the case that Presto java worker run by the coordinator as some metadata operation
can only execute on the coordinator) from the reset 'sequence_'.

This PR fixes the issue by avoiding updating 'sequence_' if next token is not set
in data response handler.
@xiaoxmeng xiaoxmeng merged commit be07fd9 into prestodb:master Dec 27, 2024
66 of 67 checks passed
@xiaoxmeng xiaoxmeng deleted the fix branch December 27, 2024 06:17
@aditi-pandit
Copy link
Contributor

@amitkdutta, @xiaoxmeng, @tanjialiang : The PR description says this fix is applicable from Prestissimo worker would fetch from a Java worker which could be present for some metadata operation at the co-ordinator.

Can you give an example of some SQL that does this ?

I'm aware of SystemConnectorQueries, but unsure if they need a full Java worker.

@amitkdutta
Copy link
Contributor

@aditi-pandit Here is an example:

select 
  count(*), 
  checksum(data_ds) AS data_ds, 
  checksum(data_ts) AS data_ts, 
  checksum(platform_id), 
  checksum(resource_type_id), 
  checksum(agg_level) AS agg_level 
from 
  (
    SELECT 
      ds AS data_ds, 
      ts AS data_ts, 
      platform_id, 
      resource_type_id, 
      agg_level AS agg_level 
    FROM 
      "some_table$partitions" 
    WHERE 
      ds = '2024-12-14'
  )

It was with columns like $partitions, but a bit complex query. Simple select * from "some_table$partitions" was working though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:Meta PR from Meta
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants