Data preparation #130

zwx8981 · 2018-09-17T06:53:31Z

Hi, thank you for your great work. I have a question about data preparation. Specifically, if I want to use the CNN-based sequence encoder and decoder as standalone modules that can be inserted into other translation models, how should I prepare the source dictionary file so that it can be loaded successfully by the fairseq.data.Dictionary.load() method? In the source code, I found this docstring in Dictionary.load():

    """Loads the dictionary from a text file with the format:

    ```
    <symbol0> <count0>
    <symbol1> <count1>
    ...
    ```
    """

What does count0 mean?
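A file in the format described by the docstring above can be generated from an already-tokenized corpus with a short script. This is a minimal sketch, not fairseq's own tooling: the helper names `build_dict_lines` and `write_dict_file` are hypothetical. It assumes one sentence per line, with tokens separated by whitespace. As far as I can tell from the source, the loader splits each line on its last space and parses the count as an integer, so each line should be `symbol count` with an integer count.

```python
from collections import Counter
from typing import Iterable, List


def build_dict_lines(corpus: Iterable[str]) -> List[str]:
    """Count whitespace-separated tokens and format them as '<symbol> <count>' lines.

    Hypothetical helper: assumes the corpus is already tokenized (e.g. BPE
    applied), one sentence per line.
    """
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.split())
    # Most frequent symbols first; ties broken alphabetically for a stable file.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{symbol} {count}" for symbol, count in ordered]


def write_dict_file(corpus: Iterable[str], path: str) -> None:
    """Write the '<symbol> <count>' lines to a UTF-8 text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_dict_lines(corpus):
            f.write(line + "\n")


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat down"]
    print("\n".join(build_dict_lines(corpus)))
```

For the two example sentences this prints `sat 2`, `the 2`, `cat 1`, `dog 1` and `down 1`, one per line. Sorting by descending frequency means that the most common symbols receive the lowest indices when the file is read back in order.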


mls1999725 · 2019-10-31T02:18:14Z

I'd like to know this too.

jgehring · 2020-05-04T11:10:52Z

I'm not sure which section of the code you're referring to here, but, generally speaking, the dictionary contains an index-to-symbol mapping as well as frequencies of symbols (in the form of raw counts over the respective source corpus).
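To illustrate the two pieces of information described above, here is a sketch of how such a file could be read back into an index-to-symbol mapping plus a list of counts. It is an illustration only and does not reproduce fairseq's implementation. One assumption needs flagging: like fairseq's Dictionary, it reserves a few special symbols at the lowest indices before reading the file. The exact set and order shown here (`<pad>`, `</s>`, `<unk>`) is assumed, and the real set depends on the fairseq version. In this sketch, a symbol's index comes from its position in the file, while the count only records corpus frequency. The hypothetical `min_count` parameter shows one typical use of the counts: dropping rare symbols so that they map to `<unk>`.

```python
from typing import Dict, List, Tuple

# Assumed reserved symbols; the exact set and order differ between fairseq versions.
SPECIALS = ["<pad>", "</s>", "<unk>"]


def parse_dict_lines(
    lines: List[str], min_count: int = 0
) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Parse '<symbol> <count>' lines into index->symbol, index->count and symbol->index.

    Hypothetical helper for illustration, not fairseq's Dictionary.load().
    """
    symbols = list(SPECIALS)
    counts = [0] * len(SPECIALS)  # placeholder counts for the reserved symbols
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Split on the last space: everything before it is the symbol.
        symbol, field = line.rsplit(" ", 1)
        count = int(field)
        if count < min_count:
            continue  # rare symbol dropped; it will be encoded as <unk>
        symbols.append(symbol)
        counts.append(count)
    indices = {s: i for i, s in enumerate(symbols)}
    return symbols, counts, indices


def encode(sentence: str, indices: Dict[str, int]) -> List[int]:
    """Map whitespace-separated tokens to indices, falling back to <unk>."""
    unk = indices["<unk>"]
    return [indices.get(tok, unk) for tok in sentence.split()]


if __name__ == "__main__":
    symbols, counts, indices = parse_dict_lines(["sat 2", "the 2", "cat 1"])
    print(symbols)
    print(encode("the cat ran", indices))
```

With the three example lines, `symbols` is `['<pad>', '</s>', '<unk>', 'sat', 'the', 'cat']`, and `encode("the cat ran", indices)` returns `[4, 5, 2]`. The unseen word `ran` falls back to `<unk>`.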

jgehring closed this as completed May 4, 2020

jgehring reopened this May 4, 2020
