Lehrstuhl für Bioinformatik - Institut für Informatik - http://www.bioinf.uni-freiburg.de
In this exercise, you will implement Burrows-Wheeler Transform (BWT) and its inverse.
a) To start building the Burrows-Wheeler matrix, you need to implement a helper function that returns all the rotations of the given string.
Implement the function rotations
which takes a string and returns a list of all its rotations.
Example: (Spoiler)
>>> all_rotations = rotations("abaaba$")
>>> print(all_rotations)
['abaaba$', 'baaba$a', 'aaba$ab', 'aba$aba', 'ba$abaa', 'a$abaab', '$abaaba']
b) Build the Burrows-Wheeler matrix. Implement the function bwm
which takes a string and returns the Burrows-Wheeler matrix.
The matrix should be a list of string rotations sorted alphabetically.
Example: (Spoiler)
>>> matrix = bwm("abaaba$")
>>> print(matrix)
['$abaaba', 'a$abaab', 'aaba$ab', 'aba$aba', 'abaaba$', 'ba$abaa', 'baaba$a']
с) Implement the transformation with the Burrows-Wheeler matrix.
Implement the function bwt_with_bwm
which takes a string and returns the Burrows-Wheeler transform as a string.
Example: (Spoiler)
>>> t = bwt_with_bwm("abaaba$")
>>> print(t)
abba$aa
d) Check your understanding of the Burrows-Wheeler transform.
Implement the function transformation_to_first_colum
which takes the BW transform string t (the last column of the BW matrix) and returns the string corresponding to the matrix's first column.
Just so you know, you do not need to build the Burrows-Wheeler matrix to do this.
Example: (Spoiler)
>>> f = transformation_to_first_colum("abba$aa")
>>> print(f)
$aaaabb
e) Implement a helper function transformation_to_dict_occurrences
.
This function takes the string transformation (t) and returns the dictionary with the number of occurrences of each character in the transformation.
Example: (Spoiler)
>>> occurrences = transformation_to_dict_occurrences("abba$aa")
>>> print(occurrences)
{'a': 4, 'b': 2, '$': 1}
f) Implement a helper function transformation_b_ranking
. This function takes the string transformation (t) and returns the B-ranking of the characters in the transformation as a list.
Example: (Spoiler)
>>> ranks = transformation_b_ranking("abba$aa")
>>> print(ranks)
>>> print(occurrences)
[0, 0, 1, 1, 0, 2, 3]
{'a': 4, 'b': 2, '$': 1}
g) Implement the helper function bwm_first_column_interval_form_from_transform
. This function takes the string transformation (t) and returns
the first column of the BW matrix in the interval form
i.e. the dictionary where the keys are the characters
and the values are tuples (start, end) of the character occurrences.
Example: (Spoiler)
>>> f_column_intervals = bwm_first_column_interval_form_from_transform("abba$aa")
>>> print(f_column_intervals)
{'$': (0, 1), 'a': (1, 5), 'b': (5, 7)}
g) Implement the function reverse_bwt
.
This function takes the BW transformation string (t) and returns the original string.
Example: (Spoiler)
>>> s = reverse_bwt("abba$aa")
>>> print(s)
abaaba$