Handle the full-text content in other language #249

dawnyesky · 2021-01-26T02:40:18Z

The result of retrieving a non-English webpage is not encoded well. It returns strings of hex digits in place of the readable text (e.g. "中新网"). Is there a way to fix it? I tried the CLI version of Mercury Parser and passed the parameter --format markdown, which resulted in correct text. But I have no idea how to add this kind of parameter when calling the mercury-parser-api. Please try the example URLs below to reproduce the problem:

HenryQW · 2021-01-26T16:03:05Z

Not sure if it's due to encoding; unfortunately, I do not have time to investigate this now. See postlight/parser#425.
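
Until the cause is confirmed, a client-side stopgap may help. The sketch below assumes the "strings of hex digits" are HTML hexadecimal character references such as `&#x4E2D;`; the thread does not show the raw output, so that is a guess. The function name `decode_hex_references` and the sample string are hypothetical and are not part of mercury-parser-api.

```python
import re

# Matches hexadecimal character references such as &#x4E2D;
_HEX_REF = re.compile(r"&#[xX]([0-9a-fA-F]{1,6});")


def decode_hex_references(content: str) -> str:
    """Replace non-ASCII hex character references with the characters they name."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Keep ASCII references (e.g. &#x3C; for "<") escaped so the markup
        # stays valid, and skip values that are not valid characters.
        if code_point < 0x80 or code_point > 0x10FFFF:
            return match.group(0)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _HEX_REF.sub(replace, content)


# Hypothetical sample of what the returned content might look like.
raw = "<p>&#x4E2D;&#x65B0;&#x7F51;</p>"
print(decode_hex_references(raw))  # prints <p>中新网</p>
```

Named entities such as `&amp;` are left untouched on purpose, so the decoded string can still be treated as HTML.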
