Thanks for your report. As expected, codec.txt should be loaded with the load_codec_json function in funcodec/datasets/iterable_dataset.py; the load_jsonl_trans_int function in read_text.py is not used to load codec tokens. Thanks for your modification as well.
Following the provided encoding_decoding.sh script, the encoding stage generates a codec.txt file.
Each line of this file has the form:
utt_id <space> json.dumps(codecs)
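For illustration, here is a minimal sketch of what one line of codec.txt looks like and how it splits back into its two columns. The utterance ID and token values are made up, and the nested-list shape (frames × quantizers) is an assumption about the codec output. Because json.dumps puts a space after every comma, only the first whitespace run can be treated as the column separator.

```python
import json

# Hypothetical utterance ID and codec tokens (shape assumed: frames x quantizers).
utt_id = "utt_0001"
codecs = [[12, 845, 3], [77, 5, 1023]]

# One line of codec.txt: "<utt_id> <json.dumps(codecs)>"
line = f"{utt_id} {json.dumps(codecs)}"
print(line)  # utt_0001 [[12, 845, 3], [77, 5, 1023]]

# The JSON payload contains spaces, so split only on the first whitespace run.
key, value = line.split(maxsplit=1)
assert json.loads(value) == codecs
```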
This format cannot be read directly by read_text.py, so the load_jsonl_trans_int function needs to be rewritten as follows:
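```python
import json
import logging
from pathlib import Path
from typing import Dict, Union

import numpy as np

# read_2column_text is assumed to be defined in the same read_text.py module.


def load_jsonl_trans_int(path: Union[Path, str]) -> Dict[str, np.ndarray]:
    """Load "<utt_id> <json>" lines into integer arrays.

    The JSON value may be a dict with a "trans" key or a bare (possibly
    nested) list, as written to codec.txt by encoding_decoding.sh.
    """
    d = read_2column_text(path)
    retval = {}
    for k, v in d.items():
        try:
            value = json.loads(v)
            if isinstance(value, dict):
                retval[k] = np.array(value["trans"], dtype=int)
            elif isinstance(value, list):
                retval[k] = np.array(value, dtype=int)
            else:
                raise TypeError(f"unexpected JSON type: {type(value).__name__}")
        except (TypeError, ValueError, KeyError):
            # ValueError also covers json.JSONDecodeError for malformed lines.
            logging.error(f'Error happened with path="{path}", id="{k}", value="{v}"')
            raise
    return retval
```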
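To check the patch end to end without a FunCodec checkout, here is a self-contained sketch. read_two_column_text is a hypothetical stand-in for read_2column_text (assumed to map the first field of each line to the rest of the line), load_codec_tokens mirrors the patched loader, and the utterance IDs and token values are made up. It exercises both JSON shapes the patch accepts: a bare nested list, as written to codec.txt, and a dict with a "trans" key.

```python
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Dict, Union

import numpy as np


def read_two_column_text(path: Union[Path, str]) -> Dict[str, str]:
    """Hypothetical stand-in for read_2column_text: maps the first
    whitespace-separated field of each line to the rest of the line."""
    data: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            data[parts[0]] = parts[1] if len(parts) > 1 else ""
    return data


def load_codec_tokens(path: Union[Path, str]) -> Dict[str, np.ndarray]:
    """Same parsing logic as the patched load_jsonl_trans_int."""
    retval = {}
    for k, v in read_two_column_text(path).items():
        try:
            value = json.loads(v)
            if isinstance(value, dict):
                retval[k] = np.array(value["trans"], dtype=int)
            elif isinstance(value, list):
                retval[k] = np.array(value, dtype=int)
            else:
                raise TypeError(f"unexpected JSON type: {type(value).__name__}")
        except (TypeError, ValueError, KeyError):
            logging.error(f'Error happened with path="{path}", id="{k}", value="{v}"')
            raise
    return retval


# Write a small codec.txt-style file with made-up IDs and tokens.
samples = {"utt_a": [[1, 2], [3, 4], [5, 6]], "utt_b": {"trans": [7, 8, 9]}}
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    for utt_id, payload in samples.items():
        f.write(f"{utt_id} {json.dumps(payload)}\n")
    tmp_path = f.name

tokens = load_codec_tokens(tmp_path)
os.remove(tmp_path)
print(tokens["utt_a"].shape)  # (3, 2)
print(tokens["utt_b"].tolist())  # [7, 8, 9]
```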