We provide a set of command-line tools for preprocessing data.
To clean and tokenize a parallel corpus, use
nlp_process clean_tok_para_corpus --help
To learn a subword tokenizer, use
nlp_process learn_subword --help
To apply the learned subword tokenizer, use
nlp_process apply_subword --help
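The three steps above form a typical preprocessing pipeline: clean and tokenize the raw parallel corpus, learn a subword model on the cleaned text, then apply that model. The sketch below is a dry run that only prints the subcommand for each step; the actual input/output flags are not shown here — consult each subcommand's --help output for the real options.

```shell
#!/bin/sh
# Dry-run sketch of the three-step pipeline. The run() helper is a
# hypothetical wrapper that prints each command instead of executing it,
# so no corpus files or flags (see --help for those) are needed.
run() { echo "+ $*"; }

# Step 1: clean and tokenize the parallel corpus.
run nlp_process clean_tok_para_corpus   # plus corpus/language options

# Step 2: learn a subword tokenizer on the cleaned corpus.
run nlp_process learn_subword           # plus model/vocab-size options

# Step 3: apply the learned subword tokenizer.
run nlp_process apply_subword           # plus model/corpus options
```

Running the script prints the command for each stage prefixed with `+`, mirroring `set -x` style tracing, without requiring any data files to be present.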