Skip Scan

Phoenix 1.2 uses Skip Scan for intra-row scanning, which allows for a significant performance improvement over Range Scan when rows are retrieved based on a given set of keys.

The Skip Scan leverages the SEEK_NEXT_USING_HINT return code of the HBase Filter. It stores information about which set of keys or ranges of keys is being searched for in each column. It then takes a key (passed to it during filter evaluation) and determines whether it falls within one of the key combinations or ranges. If not, it works out the next highest key that should be jumped to.

Input to the SkipScanFilter is a List<List<KeyRange>> where the top level list represents each column in the row key (i.e. each primary key part), and the inner list represents ORed together byte array boundaries.
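A minimal sketch of this nested structure, written in Python rather than Phoenix's own Java. The `KeyRange` class, the `point` helper, and the `matches` function here are hypothetical stand-ins for illustration only, not Phoenix's actual API. The outer list is ANDed across row key columns and each inner list is ORed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyRange:
    """Hypothetical stand-in for a key range: byte-array bounds plus inclusivity flags."""
    lower: bytes
    upper: bytes
    lower_inclusive: bool = True
    upper_inclusive: bool = True

    def contains(self, value: bytes) -> bool:
        above = value > self.lower or (self.lower_inclusive and value == self.lower)
        below = value < self.upper or (self.upper_inclusive and value == self.upper)
        return above and below


def point(value: bytes) -> KeyRange:
    """A single key expressed as a degenerate range."""
    return KeyRange(value, value)


def matches(slots: List[List[KeyRange]], key_parts: List[bytes]) -> bool:
    """Every column (outer list) needs at least one of its ranges (inner list) to contain its part."""
    return all(
        any(r.contains(part) for r in ranges)
        for ranges, part in zip(slots, key_parts)
    )


# Two row key columns: a half-open range on the first, two point keys on the second.
slots = [
    [KeyRange(b"a", b"c", upper_inclusive=False)],
    [point(b"\x01"), point(b"\x02")],
]
print(matches(slots, [b"b", b"\x02"]))  # True
print(matches(slots, [b"c", b"\x01"]))  # False: b"c" is excluded by the open upper bound
```

The sketch assumes the row key has already been split into one byte string per column; how Phoenix splits and encodes the key is not covered here.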

Consider the following query:

SELECT * FROM T
WHERE ((KEY1 >= 'a' AND KEY1 <= 'b') OR (KEY1 > 'c' AND KEY1 <= 'e')) AND
KEY2 IN (1, 2)

The List<List<KeyRange>> for SkipScanFilter for the above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ], where [ [ a - b ], [ d - e ] ] are the ranges for KEY1 and [ 1, 2 ] are the keys for KEY2. Consider this running on the following data.
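As a rough illustration of the include-or-jump decision for this filter, the Python sketch below returns either `INCLUDE`, the next row key to seek to, or `None` once no further key can match. It is not Phoenix's implementation and rests on several simplifying assumptions: every column is exactly one byte wide, all bounds are inclusive, KEY2 is stored as an ASCII digit, and ranges within a column are sorted and non-overlapping. The names `SLOTS`, `skip_scan_hint`, and the row keys used are made up for illustration.

```python
from typing import List, Optional, Tuple, Union

Range = Tuple[int, int]  # inclusive (low, high) bounds on a one-byte column
INCLUDE = "INCLUDE"

# [a - b], [d - e] for KEY1 and the point keys 1 and 2 for KEY2.
SLOTS: List[List[Range]] = [
    [(ord("a"), ord("b")), (ord("d"), ord("e"))],
    [(ord("1"), ord("1")), (ord("2"), ord("2"))],
]


def first_range_at_or_after(ranges: List[Range], value: int) -> Optional[Range]:
    """Return the first range whose upper bound is >= value, or None."""
    for low, high in ranges:
        if value <= high:
            return (low, high)
    return None


def lowest_suffix(slots: List[List[Range]], start: int) -> bytes:
    """Smallest allowed value for every column from `start` onwards."""
    return bytes(slot[0][0] for slot in slots[start:])


def skip_scan_hint(slots: List[List[Range]], row_key: bytes) -> Union[str, bytes, None]:
    for i, value in enumerate(row_key):
        hit = first_range_at_or_after(slots[i], value)
        if hit is not None and hit[0] <= value:
            continue  # this column falls inside one of its ranges
        if hit is not None:
            # Before the next range: jump to its lower bound, reset later columns.
            return row_key[:i] + bytes([hit[0]]) + lowest_suffix(slots, i + 1)
        # Past every range in this column: advance the nearest earlier column.
        for j in range(i - 1, -1, -1):
            bumped = first_range_at_or_after(slots[j], row_key[j] + 1)
            if bumped is not None:
                new_value = max(bumped[0], row_key[j] + 1)
                return row_key[:j] + bytes([new_value]) + lowest_suffix(slots, j + 1)
        return None  # nothing left to seek to
    return INCLUDE


print(skip_scan_hint(SLOTS, b"a1"))  # INCLUDE
print(skip_scan_hint(SLOTS, b"a5"))  # b'b1'
print(skip_scan_hint(SLOTS, b"b3"))  # b'd1'
print(skip_scan_hint(SLOTS, b"e9"))  # None
```

In the third call, the whole of `c` is skipped in a single seek, which is where the gain over scanning every row in a range comes from.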
