Custom blooms on index data to speed up index data look up #7

Open
anoopsjohn opened this issue Aug 13, 2013 · 4 comments

Comments

@anoopsjohn

Will add the details to the description later.

@anoopsjohn
Author

Is anyone interested in working on this? It will need some enabling work in HBase as well. This should help Scan performance.

@chrajeshbabu
Member

Anoop, can't we use the existing ROW or ROWCOL bloom filters? How would custom bloom filters help improve performance?

@ramkrish86

I am just seeing these updates in the JIRA. If we are really aiming to make this more public and gain more visibility, we could definitely spend some solid time on it. +1 for it. I need to refresh my memory of the code before I can comment on this, but we could make this more visible. One concern I have seen people raise is about the data types supported and their format when using indices, which Phoenix tries to handle. How big is that gap in Hindex? We could also take up those activities so that this solution does not have such gaps (if any).

@anoopsjohn
Author

@chrajeshbabu
No, the existing ROW bloom filter cannot be used on index table HFiles. The row key of the index data also includes the row key of the actual table, so a bloom keyed on the full index row key cannot answer whether a given column value is present.
When we have a query like select * from table where c1 = ? on a table with 100 regions, we scan the index table on all 100 regions. If there were blooms that could tell us whether an index region has any data with c1=?, we could skip scanning the regions that do not. So if only 10 out of 100 regions have c1=? data, we can save a lot of time. The global index already has this benefit; the issue with the local index is that we have to go to all index regions. Make sense?
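
To make the idea concrete, here is a rough sketch of region pruning with a bloom built on the indexed column value alone (not on the full index row key). This is only an illustration under assumptions: `ValueBloom`, `IndexRegion` and `regionsToScan` are hypothetical names, not HBase or hindex APIs, and the hashing is a simple stand-in. How such a bloom would be written into index HFiles and consulted before a region scan is exactly the enabling work that would be needed in HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: none of these classes are HBase or hindex APIs.
class IndexValueBloomSketch {

    /** Tiny Bloom filter keyed on the indexed column value only. */
    static final class ValueBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        ValueBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Record a column value (for example the c1 part of an index row key). */
        void add(byte[] value) {
            int h1 = fnv1a(value, 17);
            int h2 = fnv1a(value, 31);
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** false means the value is definitely absent; true means it may be present. */
        boolean mightContain(byte[] value) {
            int h1 = fnv1a(value, 17);
            int h2 = fnv1a(value, 31);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        // Seeded FNV-1a style hash; a stand-in, any reasonable hash would do.
        private static int fnv1a(byte[] data, int seed) {
            int h = 0x811c9dc5 ^ seed;
            for (byte b : data) {
                h ^= (b & 0xff);
                h *= 16777619;
            }
            return h;
        }
    }

    /** Stand-in for one local index region and its value bloom. */
    static final class IndexRegion {
        final String name;
        final ValueBloom bloom;

        IndexRegion(String name, ValueBloom bloom) {
            this.name = name;
            this.bloom = bloom;
        }
    }

    /** Keep only the regions whose bloom says the value may be present. */
    static List<String> regionsToScan(List<IndexRegion> regions, byte[] value) {
        List<String> result = new ArrayList<>();
        for (IndexRegion region : regions) {
            if (region.bloom.mightContain(value)) {
                result.add(region.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] wanted = "v42".getBytes(StandardCharsets.UTF_8);
        List<IndexRegion> regions = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            ValueBloom bloom = new ValueBloom(1 << 12, 3);
            bloom.add(("other-" + i).getBytes(StandardCharsets.UTF_8));
            if (i % 10 == 0) {
                bloom.add(wanted); // only 10 of the 100 regions hold c1 = v42
            }
            regions.add(new IndexRegion("region-" + i, bloom));
        }
        List<String> toScan = regionsToScan(regions, wanted);
        System.out.println("Regions to scan: " + toScan.size() + " of " + regions.size());
    }
}
```

A bloom never gives a false negative, so every region that really holds the value is still scanned; a false positive only costs one unnecessary region scan.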

3 participants