This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Skip Scan

James Taylor edited this page May 16, 2013 · 14 revisions

Phoenix 1.2 uses Skip Scan for intra-row scanning, which allows for a significant performance improvement over Range Scan when rows are retrieved based on a given set of keys.

The Skip Scan leverages the SEEK_NEXT_USING_HINT return code of the HBase Filter. The filter stores, for each column, the set of keys or key ranges being searched for. During filter evaluation it is passed a key and determines whether that key falls within one of the stored key combinations or ranges. If it does not, the filter works out the next highest key to jump to.
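The decision described above can be sketched in a few lines. This is a simplified illustration for a single key column holding point keys only, and it does not use the HBase API: the class SkipHintSketch, its Decision enum (which only mirrors the idea of HBase's Filter.ReturnCode) and the method nextKeyHint are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the skip-scan decision on one key column.
// It is not the Phoenix SkipScanFilter and does not depend on HBase.
class SkipHintSketch {
    // Mirrors the idea of HBase's Filter.ReturnCode (assumed names, not the real enum).
    enum Decision { INCLUDE, SEEK_NEXT_USING_HINT, DONE }

    // Sorted set of the keys the query is searching for.
    private final TreeSet<String> wanted;
    // Set only when the last decision was SEEK_NEXT_USING_HINT.
    private String hint;

    SkipHintSketch(String... keys) {
        this.wanted = new TreeSet<>(Arrays.asList(keys));
    }

    // Decide what to do with the key currently being evaluated.
    Decision evaluate(String rowKey) {
        hint = null;
        if (wanted.contains(rowKey)) {
            return Decision.INCLUDE;
        }
        // Smallest wanted key strictly greater than the current key.
        String next = wanted.higher(rowKey);
        if (next == null) {
            return Decision.DONE; // nothing left to find past this key
        }
        hint = next;
        return Decision.SEEK_NEXT_USING_HINT;
    }

    // The key to jump to, or null if no seek was requested.
    String nextKeyHint() {
        return hint;
    }

    public static void main(String[] args) {
        SkipHintSketch filter = new SkipHintSketch("b", "f", "k");
        for (String row : new String[] {"a", "b", "c", "k", "z"}) {
            Decision d = filter.evaluate(row);
            String suffix = d == Decision.SEEK_NEXT_USING_HINT
                    ? " (seek to " + filter.nextKeyHint() + ")" : "";
            System.out.println(row + " -> " + d + suffix);
        }
    }
}
```

The sorted set makes the "next highest key" lookup a single call; the real filter has to do the equivalent across several columns and across ranges as well as point keys.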

Input to the SkipScanFilter is a List<List<KeyRange>>, where the top-level list has one entry per column in the row key (i.e. per primary key part) and each inner list holds the byte-array boundaries that are ORed together for that column.

Consider the following query:

SELECT * FROM T
WHERE ((KEY1 >= 'a' AND KEY1 <= 'b') OR (KEY1 > 'c' AND KEY1 <= 'e')) AND
KEY2 IN (1, 2)

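The List<List<KeyRange>> passed to the SkipScanFilter for the above query would be [[[a - b], [d - e]], [1, 2]], where [[a - b], [d - e]] are the ranges for KEY1 and [1, 2] are the keys for KEY2. (The exclusive bound KEY1 > 'c' appears here as the inclusive bound d, which holds if KEY1 stores single-character values.) Consider this running on the following data.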
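As a rough illustration of that nested structure, the sketch below builds the two slots for the query and checks whether a composite key falls inside them. It makes several simplifying assumptions: the KeyRange class here is a hypothetical stand-in rather than Phoenix's own, bounds are inclusive strings instead of byte arrays, and the KEY2 values 1 and 2 are represented as the strings "1" and "2".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the per-column "slots" handed to a skip-scan filter.
class SkipScanSlotsSketch {
    // Stand-in for a key range: inclusive [lower, upper] bounds on strings.
    static final class KeyRange {
        final String lower;
        final String upper;

        KeyRange(String lower, String upper) {
            this.lower = lower;
            this.upper = upper;
        }

        // A single key is a range whose bounds coincide.
        static KeyRange point(String key) {
            return new KeyRange(key, key);
        }

        boolean contains(String value) {
            return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
        }

        @Override
        public String toString() {
            return lower.equals(upper) ? lower : lower + " - " + upper;
        }
    }

    // Outer list: one entry per row key column. Inner list: ORed ranges for that column.
    static final List<List<KeyRange>> SLOTS = Arrays.asList(
            Arrays.asList(new KeyRange("a", "b"), new KeyRange("d", "e")), // KEY1
            Arrays.asList(KeyRange.point("1"), KeyRange.point("2")));      // KEY2

    // A key matches when every column value lies in at least one range of its slot.
    static boolean matches(List<List<KeyRange>> slots, String... keyParts) {
        if (keyParts.length != slots.size()) {
            return false;
        }
        for (int i = 0; i < slots.size(); i++) {
            boolean inSlot = false;
            for (KeyRange range : slots.get(i)) {
                if (range.contains(keyParts[i])) {
                    inSlot = true;
                    break;
                }
            }
            if (!inSlot) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(SLOTS);
        System.out.println(matches(SLOTS, "a", "1")); // inside both slots
        System.out.println(matches(SLOTS, "c", "1")); // KEY1 falls between the two ranges
    }
}
```

Ranges within a column are alternatives (OR), while the columns themselves must all match (AND), which is the same shape as the WHERE clause in the query.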
