Add TopNRowNumberFuzzer for TopNRowNumber operator #12017

Open
aditi-pandit opened this issue Jan 3, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@aditi-pandit
Collaborator

Description

The TopNRowNumber operator is used for queries like:

SELECT * FROM (SELECT *, row_number() over (partition by c0 order by c1) as rn FROM tmp) WHERE rn <= {some_number}

It is an optimized window operator that retains only the top "some_number" rows for each window partition. It identifies window partitions using a HashTable, whereas the general Window operator accumulates all the input rows in memory and then sorts them to identify partitions.
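To make the behavior described above concrete, here is a minimal Python sketch of the idea: a hash map keyed by the partition column, with a bounded heap per partition so that at most "some_number" rows are retained. This is an illustration only, not Velox's implementation; the function name `top_n_row_number`, the `(c0, c1)` row shape, and the assumption that `c1` is numeric are all hypothetical choices made for the example.

```python
import heapq
from collections import defaultdict


def top_n_row_number(rows, limit):
    """Return (c0, c1, rn) for rows whose row number within partition c0,
    ordered by c1 ascending, is <= limit.

    Hypothetical helper for illustration; assumes c1 is numeric.
    """
    if limit <= 0:
        return []

    # Partition key -> max-heap (values stored negated) of at most `limit` c1 values.
    heaps = defaultdict(list)
    for c0, c1 in rows:
        heap = heaps[c0]
        if len(heap) < limit:
            heapq.heappush(heap, -c1)
        elif c1 < -heap[0]:
            # The new value beats the largest retained value, so replace it.
            heapq.heapreplace(heap, -c1)

    # Only the retained rows are sorted when producing output.
    out = []
    for c0, heap in heaps.items():
        for rn, c1 in enumerate(sorted(-v for v in heap), start=1):
            out.append((c0, c1, rn))
    return out


rows = [("a", 3), ("a", 1), ("a", 2), ("b", 5), ("b", 4)]
print(sorted(top_n_row_number(rows, 2)))
```

Memory use in this sketch is bounded by the number of partitions times the limit, rather than by the total number of input rows.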

The operator is also being extended to handle the rank() and dense_rank() functions.

It would be nice to have a fuzzer for the TopNRowNumber operator, along the lines of https://github.com/facebookincubator/velox/blob/main/velox/exec/fuzzer/RowNumberFuzzer.h, to make it more robust.
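As a rough picture of what such a fuzzer could check, the following Python sketch does differential testing: it generates random inputs, runs a bounded-heap top-N computation and a sort-everything reference, and compares the results. It is a toy model, not the RowNumberFuzzer design or any Velox API; the names `reference_top_n`, `candidate_top_n`, and `fuzz`, the `(c0, c1)` row shape, and the value ranges are assumptions made for illustration. Rows here carry only the partition and ordering columns, so ties in `c1` cannot change the result; a fuzzer over wider rows would need to account for ties in the ordering keys.

```python
import heapq
import random
from collections import defaultdict


def reference_top_n(rows, limit):
    # Reference path: keep every row, fully sort each partition, then filter on rn.
    parts = defaultdict(list)
    for c0, c1 in rows:
        parts[c0].append(c1)
    return sorted(
        (c0, c1, rn)
        for c0, vals in parts.items()
        for rn, c1 in enumerate(sorted(vals), start=1)
        if rn <= limit
    )


def candidate_top_n(rows, limit):
    # Candidate path: keep at most `limit` values per partition in a max-heap.
    heaps = defaultdict(list)
    for c0, c1 in rows:
        heap = heaps[c0]
        if len(heap) < limit:
            heapq.heappush(heap, -c1)
        elif c1 < -heap[0]:
            heapq.heapreplace(heap, -c1)
    return sorted(
        (c0, c1, rn)
        for c0, heap in heaps.items()
        for rn, c1 in enumerate(sorted(-v for v in heap), start=1)
    )


def fuzz(iterations, seed):
    """Run `iterations` random comparisons; return how many were checked."""
    rng = random.Random(seed)
    for i in range(iterations):
        limit = rng.randint(1, 5)
        rows = [
            (rng.randrange(4), rng.randint(-10, 10))
            for _ in range(rng.randint(0, 50))
        ]
        expected = reference_top_n(rows, limit)
        actual = candidate_top_n(rows, limit)
        # Report enough context to reproduce a failing case.
        assert actual == expected, f"mismatch: seed={seed} iteration={i} limit={limit}"
    return iterations


print(fuzz(iterations=200, seed=0))
```

Seeding the random generator keeps any failure reproducible, which is the property that makes a fuzzer's findings actionable.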

@aditi-pandit aditi-pandit added the enhancement New feature or request label Jan 3, 2025
@aditi-pandit
Collaborator Author

@kagamiori @xiaoxmeng: We can take this up if you think this is a good idea.

@aditi-pandit aditi-pandit changed the title Build TopNRowNumberFuzzer for TopNRowNumber operator Add TopNRowNumberFuzzer for TopNRowNumber operator Jan 3, 2025