-
Notifications
You must be signed in to change notification settings - Fork 0
General Description of DAware
Christoph Diegelmann edited this page May 16, 2017
·
9 revisions
Purpose of this page is to give you an overview on how Linear Time Suffix Sorting By Depth Awareness works.
- F for suffixes which are sorted until their next smaller (finished / end is flagged)
- S for suffixes i smaller than suffix i + depth (need to be sorted)
- L for suffixes i larger than suffix i + depth
- T for suffixes with tandem repeats (lead to their own outer group)
- U for suffixes whose next char** has not been tested yet (unkown)
Initially all groups are by definition type U
For each group:
Partition into type L, T and S by their next char**
Inducing the type T part yields only additional type L and S parts
Flag type L and possibly unique S parts to make them type F
Rename*
Loop starting at the right most S part:
If type F:
Break out of the loop if it is the left most
Skip to the next lower S part
If type S:
Sort into groups ONLY by their next char** [This is the O(n * log(n)) part]
Flag all unique groups as type F as they are now sorted until $
For the rest:
Rename*
Partition each group into type L, T and S by their next char**
Inducing the type T part yields only additional type L and S parts
Flag type L and possibly unique S parts to make them type F
Rename*
Sort each group from left to right into their final order [This is also O(n * log(n))]
- We can savely flag the ends if we use improved two stage because it has a minimum reduction rate of 2 giving us a free bit.
- O(n * log(n)) is due to the use of introsort rather than a linear time integer sort.
- Let b be the size of a group, a be the suffixes bigger than the current, k the alphabet size and s the depth of a group:
- If we combine radix sort [O(b * log(a) / log(b))] and introsort [O(b * log(b))] we can get a worst case of O(n * sqrt(log(n)).
- Inducing mean scan the L/S and check if an element of the current group is before it. It is similar to induce copying but the order isn't final. It's just used to group repeats by their length.