- Clone the repo and `cd` into it.
- Clone https://github.com/google-research/t5x inside it under the name `t5x_repo` and install it in editable mode.
- Symlink `t5x_repo/t5x` to `t5x` in the cloned folder of this repo.
- Install the dependencies: JAX for TPU, and seqio (this one from its repo).
- Run `run.sh`.
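Put together, the setup looks roughly like this (a sketch, not a verified script; it assumes seqio's GitHub repo and the layout described above):

```bash
# From inside the clone of this repo: fetch t5x under the name t5x_repo
git clone https://github.com/google-research/t5x t5x_repo
pip install -e ./t5x_repo   # editable install

# Symlink the inner t5x package into the repo root
ln -s t5x_repo/t5x t5x

# Dependencies: JAX built for TPU, and seqio from its repo
pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
pip install git+https://github.com/google/seqio.git

./run.sh
```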
Lists of checkpoints can be found at:
- https://console.cloud.google.com/storage/browser/t5-data
- https://console.cloud.google.com/storage/browser/scenic-bucket
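The same buckets can also be browsed from the command line with `gsutil` (the subpath below is illustrative of the layout, not a guarantee of it):

```bash
gsutil ls gs://t5-data/pretrained_models/
gsutil ls gs://scenic-bucket/
```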
If you hit segmentation faults when writing checkpoints to the buckets, the cause might be tensorstore version `0.1.18`. As a temporary fix, try version `0.1.14` instead; the bug is fixed in newer versions of tensorstore. See also google-research/t5x#436 if JAX cannot see the TPUs.
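To pin the known-good version mentioned above:

```bash
pip install tensorstore==0.1.14
```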
The folder `vocabs` contains useful information for creating SentencePiece vocabularies.
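For reference, a typical SentencePiece training command looks like this (the corpus path, vocabulary size, and model type are illustrative assumptions, not values taken from `vocabs`):

```bash
spm_train --input=corpus.txt \
  --model_prefix=spiece \
  --vocab_size=32000 \
  --model_type=unigram   # or bpe
```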
In order to export models to the Hugging Face format, a few steps are needed:
- Use the script https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_flax.py to convert the model to Flax. Then load it with `T5ForConditionalGeneration.from_pretrained(..., from_flax=True)` and save it with `.save_pretrained(...)` (see the sketch after this list).
- To convert the vocabulary (if custom): for BPE, use the `sentencepiece_extractor.py` script from https://github.com/huggingface/tokenizers/tree/main/bindings/python/scripts; for Unigram, follow the `convert` script in the same folder.
- Upload to a model repo.
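A sketch of the whole flow, with illustrative paths; the conversion script's flag names are taken from the `transformers` source and may differ between versions:

```bash
# Step 1: convert the T5X checkpoint to Flax
python convert_t5x_checkpoint_to_flax.py \
  --t5x_checkpoint_path /path/to/t5x_checkpoint \
  --config_name t5-small \
  --flax_dump_folder_path ./flax_model

# Step 2: load the Flax weights into PyTorch and save in the standard format
python - <<'PY'
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("./flax_model", from_flax=True)
model.save_pretrained("./pytorch_model")
PY
```

The final upload can then be done from the saved folder, e.g. with `git` in the model repo or `model.push_to_hub(...)`.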