Fix IndexError caused by invalid token IDs in CFGGuide #1251

Open · wants to merge 2 commits into main
Conversation

@RohitRathore1 commented Nov 7, 2024

Fixes issue #1232.

These changes fix the IndexError caused by invalid token IDs in allowed_tokens_concat by handling eos_token_id appropriately and adjusting token handling in CFGGuide. The updates maintain backward compatibility and ensure that existing functionality continues to work as expected.
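To illustrate the kind of guard the description refers to, here is a minimal, hypothetical sketch of filtering out token IDs that would index past the end of a vocabulary-sized structure while still admitting `eos_token_id`. The function name and signature are illustrative only, not the actual CFGGuide code:

```python
# Hypothetical sketch, not the outlines implementation: drop token IDs that
# would raise IndexError when used to index a vocab-sized array, while
# explicitly keeping eos_token_id.
def filter_valid_token_ids(token_ids, vocab_size, eos_token_id):
    """Return only IDs that are eos_token_id or within [0, vocab_size)."""
    valid = []
    for token_id in token_ids:
        if token_id == eos_token_id or 0 <= token_id < vocab_size:
            valid.append(token_id)
    return valid

print(filter_valid_token_ids([0, 5, 999, 2], vocab_size=10, eos_token_id=2))
# -> [0, 5, 2]
```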

Tested on CPU:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00,  1.76s/it]
Saturn
\{[ ]?"caption"[ ]?:[ ]?"([^"\\\x00-\x1F\x7F-\x9F]|\\["\\])*"[ ]?\}
{"caption":"Command module pilot Buzz Aldrin walks across the lunar surface behind the deployed Lunar folloteneer's Ramp. The bottom of a Life Science Branch leg lock is framed in a footprint on the lunar surface behind the left leg of Aldrin's suit. The videocamera on the fullmomteiner's chest is visible atop the open hatch. Apollo 11, Aug. #42; CC AS11-40-5924,"}

@rlouf rlouf added this to the 0.1.3 milestone Nov 8, 2024

 valid_tokens = list(
-    self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())
+    self.iter_valid_token_ids(state, list(self.tokenizer.vocabulary.values()))
Member

Why convert this to a list?

Author

The reason for converting `self.tokenizer.vocabulary.values()` to a list before passing it to `iter_valid_token_ids` is to ensure we're working with a concrete, indexable collection of token IDs rather than a dict view.
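The distinction matters because `dict.values()` returns a view object, which supports iteration but not indexing. A minimal sketch (the vocabulary contents here are made up for illustration):

```python
# A dict .values() view can be iterated but not indexed; list() produces
# a concrete sequence that supports indexing.
vocabulary = {"hello": 0, "world": 1, "<eos>": 2}  # hypothetical token -> id map

values_view = vocabulary.values()
try:
    values_view[0]  # raises TypeError: views are not subscriptable
except TypeError:
    indexable = list(values_view)  # concrete list: indexing now works
    print(indexable[0])
    # -> 0
```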
