Commit

In response to Issue #22
percolator committed Sep 1, 2024
1 parent 31b4d15 commit 24940a3
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions bibook/protein/matrix.ipynb
@@ -207,15 +207,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Estimating Probabilities $ p_i $ and $ p_{ij} $ from Alignments\n",
+"### Estimating Probabilities $ q_i $ and $ p_{ij} $ from Alignments\n",
"\n",
-"#### Background Frequencies ($ p_i $)\n",
+"#### Background Frequencies ($ q_i $)\n",
"\n",
-"The probability $ p_i $ represents the background frequency of amino acid $ i $ across a set of alignments or within a single alignment, depending on the context. It is calculated by counting the occurrence of each amino acid $ i $ in all alignments and then dividing by the total number of amino acid occurrences.\n",
+"The probability $ q_i $ represents the background frequency of amino acid $ i $ across a set of alignments or within a single alignment, depending on the context. It is calculated by counting the occurrence of each amino acid $ i $ in all alignments and then dividing by the total number of amino acid occurrences.\n",
"\n",
-"##### Calculation of $ p_i $:\n",
+"##### Calculation of $ q_i $:\n",
"\n",
-"$ p_i = \\frac{n_i}{N} $\n",
+"$ q_i = \\frac{n_i}{N} $\n",
"\n",
"where $ n_i $ is the number of times amino acid $ i $ appears in the alignments, and $ N $ is the total number of amino acid residues in all alignments.\n",
"\n",
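Review note: the renamed formula $ q_i = n_i / N $ can be checked by hand on a very small input. The sketch below is only an illustration, not the notebook's code; `toy_alignments` and `background_frequencies` are hypothetical names, and the toy data is invented for the check.

```python
from collections import Counter

# Hypothetical toy data: one pairwise alignment, not taken from the notebook.
toy_alignments = [("ACA", "ACC")]


def background_frequencies(alignments):
    """Return q_i = n_i / N over every residue of every aligned sequence."""
    # n_i: how often each residue occurs across all sequences
    counts = Counter(ch for pair in alignments for seq in pair for ch in seq)
    # N: total number of residues
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


q = background_frequencies(toy_alignments)
print(q)  # A occurs 3 times and C occurs 3 times out of N = 6
```

Because every residue is counted exactly once, the values of `q` sum to 1.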
@@ -269,12 +269,12 @@
" \"LAMVPDPWIDD\")\n",
"]\n",
"\n",
-"# Flatten all characters into a single list for p_i calculation\n",
+"# Flatten all characters into a single list for q_i calculation\n",
"all_characters = [char for seq in alignments for string in seq for char in string]\n",
"\n",
-"# Calculate p_i\n",
+"# Calculate q_i\n",
"unique, counts = np.unique(all_characters, return_counts=True)\n",
-"p_i = dict(zip(unique, counts / counts.sum()))\n",
+"q_i = dict(zip(unique, counts / counts.sum()))\n",
"\n",
"# Calculate p_ij\n",
"pair_counts = {}\n",
@@ -298,7 +298,7 @@
"\n",
"p_ij = {pair: count / sum(pair_counts.values()) for pair, count in pair_counts.items()}\n",
"\n",
-"print(\"Background Frequencies (p_i):\", p_i)\n",
+"print(\"Background Frequencies (q_i):\", q_i)\n",
"print(\"Joint Probabilities (p_ij):\", p_ij)\n"
]
},
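Review note: the loop that fills `pair_counts` is collapsed in this view, so the following is only a guess at the general shape of a joint-probability estimate, not the notebook's implementation. It assumes each alignment is a tuple of two equal-length strings and that the pairs (i, j) and (j, i) are pooled under one sorted key; both are assumptions, and `toy_pairs` and `joint_probabilities` are hypothetical names.

```python
from collections import Counter

# Hypothetical toy data: two pairwise alignments, not taken from the notebook.
toy_pairs = [("ACA", "ACC"), ("CA", "CA")]


def joint_probabilities(alignments):
    """Estimate p_ij as the fraction of aligned columns holding the pair {i, j}."""
    counts = Counter()
    for seq_a, seq_b in alignments:
        # Walk the alignment column by column
        for a, b in zip(seq_a, seq_b):
            # Assumption: unordered pairs, so (A, C) and (C, A) share a key
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


p = joint_probabilities(toy_pairs)
print(p)  # 5 columns in total: (A, A) twice, (C, C) twice, (A, C) once
```

If the notebook keeps ordered pairs instead, the `sorted` call would be dropped and the two orientations would be counted separately.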
