Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor into LexOrdering::collapse, LexRequirement::collapse avoid clone #14038

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 7, 2025

Which issue does this PR close?

Rationale for this change

While working to encapsulate the sort order code more, I noticed the free function collapse_lex_ordering that consumed a LexOrdering but wasn't easy to find. It also clones expressions unecessairly.

What changes are included in this PR?

  1. Move code to LexOrdering::collapse and avoid a clone
  2. Move code to LexRequirement::collapse
  3. Deprecate collapse_lex_req
  4. Improve docs

Are these changes tested?

By exisitng CI

Are there any user-facing changes?

A function is deprecated

@github-actions github-actions bot added the physical-expr Physical Expressions label Jan 7, 2025
@alamb alamb marked this pull request as draft January 7, 2025 20:01
@alamb alamb changed the title Refactor into LexOrdering::collapse, avoid clone Refactor into LexOrdering::collapse, LexRequirement::collapse avoid clone Jan 7, 2025
@@ -41,14 +41,9 @@ pub use properties::{
/// It will also filter out entries that are ordered if the next entry is;
/// for instance, `vec![floor(a) Some(ASC), a Some(ASC)]` will be collapsed to
/// `vec![a Some(ASC)]`.
#[deprecated(since = "45.0.0", note = "Use LexRequirement::collapse")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -207,19 +212,6 @@ impl IntoIterator for OrderingEquivalenceClass {
}
}

/// This function constructs a duplicate-free `LexOrdering` by filtering out
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the function being pub it appears to be in a non pub module, so is not exposed. You can see that it is not present in the docs:

https://docs.rs/datafusion/latest/datafusion/index.html?search=collapse_lex_ordering

(when I marked it deprecated then clippy complained it was dead code)

let mut output = LexOrdering::default();
for item in self {
if !output.iter().any(|req| req.expr.eq(&item.expr)) {
output.push(item);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code this was item.clone() -- though it is only cloning an Arc so the performance benefit is likely minimal

@alamb alamb marked this pull request as ready for review January 7, 2025 20:28
@alamb
Copy link
Contributor Author

alamb commented Jan 7, 2025

I pushed a44acfd to this PR which had the content of the suggestion from @akurmustafa on alamb#26

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb. The idea makes sense to me. I've just 2 minor question.

Comment on lines 570 to 586
pub fn collapse(self) -> Self {
let mut output = Vec::<PhysicalSortRequirement>::new();
let mut exprs = IndexSet::new();
let mut reqs = vec![];
for item in self {
let PhysicalSortRequirement { expr, options: req } = item;
// new insertion
if exprs.insert(expr) {
reqs.push(req);
}
}
debug_assert_eq!(reqs.len(), exprs.len());
for (expr, req) in izip!(exprs, reqs) {
output.push(PhysicalSortRequirement::new(expr, req));
}
LexRequirement::new(output)
}
Copy link
Contributor

@berkaysynnada berkaysynnada Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn collapse(self) -> Self {
let mut output = Vec::<PhysicalSortRequirement>::new();
let mut exprs = IndexSet::new();
let mut reqs = vec![];
for item in self {
let PhysicalSortRequirement { expr, options: req } = item;
// new insertion
if exprs.insert(expr) {
reqs.push(req);
}
}
debug_assert_eq!(reqs.len(), exprs.len());
for (expr, req) in izip!(exprs, reqs) {
output.push(PhysicalSortRequirement::new(expr, req));
}
LexRequirement::new(output)
}
pub fn collapse(self) -> Self {
let mut output = LexRequirement::default();
for item in self {
if !output.iter().any(|req| req.expr.eq(&item.expr)) {
output.push(item);
}
}
output
}

Why doesn't this just work, like the other collapse() of LexOrdering?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized alamb#27 now, and makes sense. So, why do we apply the same in LexOrdering::collapse()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point -- I may have been over eager to merge in the PR from @akurmustafa -- I will think about this more carefully and either restore the original version or apply the same type of optimzization to LexOrdering::collapse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more thought I reverted this change and restored it to the original version. My rationale was that I don't (yet) have evidence that this is a bottleneck for processing and thus this seems a bit like a premature optimization

/// For example, `vec![a Some(ASC), a Some(DESC)]` collapses to `vec![a
/// Some(ASC)]`.
///
/// It will also filter out entries that are ordered if the next entry is;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see how that happens 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked -- and I agree with your assesment so have removed this comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants