The BPETokenizer currently has code like this:
let tuple = item.split(separator: " ").map { String($0) }
That doesn't split some merges correctly, e.g. Gemma has the merge ▁ह ै. In Swift, split() on String splits on grapheme clusters, not code points, so the space fuses with the combining vowel sign that follows it and is never matched as a separator. As-is, this will crash on the Gemma merges.
I think you want to use:
String.UnicodeScalarView.split(separator:)
to perform this split on the code points instead.
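To make the difference concrete, here is a minimal sketch comparing the two splits on the Gemma merge quoted above. The variable names (item, byCharacter, byScalar) are mine for illustration and are not taken from the tokenizer source. The merge string is written with escapes so the code points are unambiguous.

```swift
// The Gemma merge entry: "▁" (U+2581), "ह" (U+0939), a space,
// then the combining vowel sign "ै" (U+0948).
let item = "\u{2581}\u{0939} \u{0948}"

// Character-based split (what the current code does). The space and the
// combining mark after it form one grapheme cluster, so no Character
// equals " " and the string comes back unsplit.
let byCharacter = item.split(separator: " ").map { String($0) }

// Unicode-scalar-based split. Here the space is its own element, so the
// entry is divided into its two merge parts.
let byScalar = item.unicodeScalars
    .split(separator: " ")
    .map { String(String.UnicodeScalarView($0)) }

print(byCharacter.count)  // 1
print(byScalar.count)     // 2
```

Any code that then indexes the second element of the character-based result would go out of bounds, which is presumably the crash on the Gemma merges.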