On p. 25 of book 1 (the latest online version, dated June 2023), Section 1.5.4.2 states that we often normalize each row of the TF-IDF matrix. According to the book's definition of TF-IDF, i.e., $(\text{TF-IDF})_{ij}$ is the frequency of the $i$-th term in the $j$-th document, normalizing each row corresponds to putting (the occurrences of) all the words on the same scale.
I just wonder whether we actually want to normalize each column of the TF-IDF matrix, instead of each row. That would put all the documents on the same scale, regardless of their lengths.
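To make the row-vs-column distinction concrete, here is a small sketch. It assumes the convention described above (rows are terms, columns are documents), uses raw counts as the term frequency and $\mathrm{idf}_i = \log(N / \mathrm{df}_i)$; these weighting choices and the helper names (`tfidf_matrix`, `normalize_columns`, `normalize_rows`) are hypothetical illustrations, not the book's exact definitions. In the example, document 2 is document 1 with every count doubled, i.e. a "longer" version of the same text.

```python
import math


def tfidf_matrix(counts):
    """counts[i][j] = raw count of term i in document j (terms x documents).

    Assumed weighting: tf = raw count, idf = log(N / df).
    """
    n_docs = len(counts[0])
    result = []
    for row in counts:
        df = sum(1 for c in row if c > 0)  # number of documents containing term i
        idf = math.log(n_docs / df) if df else 0.0
        result.append([c * idf for c in row])
    return result


def normalize_columns(m):
    """Scale each column (document) to unit L2 norm; all-zero columns are left as-is."""
    n_rows, n_cols = len(m), len(m[0])
    out = [row[:] for row in m]
    for j in range(n_cols):
        norm = math.sqrt(sum(m[i][j] ** 2 for i in range(n_rows)))
        if norm > 0:
            for i in range(n_rows):
                out[i][j] = m[i][j] / norm
    return out


def normalize_rows(m):
    """Scale each row (term) to unit L2 norm; all-zero rows are left as-is."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else row[:])
    return out


if __name__ == "__main__":
    # 3 terms x 3 documents; document 2 (column 1) doubles every count of document 1 (column 0).
    counts = [[1, 2, 0],
              [2, 4, 0],
              [0, 0, 3]]
    X = tfidf_matrix(counts)
    print("column-normalized:", normalize_columns(X))
    print("row-normalized:   ", normalize_rows(X))
```

With column normalization, the first two documents end up with identical vectors, so document length no longer matters; with row normalization they stay different, because each term is rescaled across documents instead.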
Also, there is a minor notational inconsistency in the following Sec. 1.5.4.3. Previously, the size of the vocabulary was denoted by $D$ (as in most of the book), while here it switches to the undefined $V$.