Create Rust-only index construction function #9
Labels
enhancement
New feature or request
help wanted
Extra attention is needed
TGI
Related to the integration with `text-generation-inference`
We need a pure Rust function—call it
build_regex_index
—that only takes a regex string and a vocabulary or tokenizer object and (possibly) returns aHashMap
version of the current Python index.This is a step towards #1.
There are few things that need to be converted in order to do this:
What works best for returning a Python object and future Rust-only use (i.e. Rust at run-time)?
Do we need to allow both?
We eventually want tighter integration with
tokenizers
; should we start there? Also, could we get a Rust-based tokenizer from a Python-createdtransformers
tokenizer object (that usestokenizers
under the hood, of course)?Start by finding a suitable Rust crate for this functionality (e.g.
regex-automata
).reduced_vocabulary
.The text was updated successfully, but these errors were encountered: