
Steps for generating trainable data from character corpora extracted from novels #53

Open
tinydust18 opened this issue Nov 30, 2023 · 7 comments

Comments

@tinydust18

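Hello, I'm very interested in this project. Following the novel-extraction steps, I extracted and generated the text corpus, the system prompt, and the character jsonl file for the character Yang Guo (杨过). However, these files differ from the ones loaded by the dataloader in the training code. How should I generate trainable data from the corpus? And how do I obtain the chat_history and embedding in the training data?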

@LC1332
Owner

LC1332 commented Nov 30, 2023

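That's a good question. I happen to be refactoring this part of the code right now. How about sending me your WeChat ID on Zhihu? https://www.zhihu.com/people/cheng-li-47

Rough plan for refactoring the training code: https://o9z6tor1qu.feishu.cn/docx/LxTWdGnP2oQ0oUx8H0wcmyZCnrb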

@tinydust18
Author

OK, thank you very much. I've sent you a private message on Zhihu. I hope we can explore this project together.

@tinydust18
Author

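I've already sent one private message on Zhihu, and I can't send another until you reply, so I can't send my WeChat ID yet. Could you reply to it? Thank you very much.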

@LC1332
Owner

LC1332 commented Dec 1, 2023

What time did you send it? For some reason I don't see it...

@LC1332
Owner

LC1332 commented Dec 1, 2023

Or you could send an email to [email protected]

@tinydust18
Author

OK, I'll send an email. Thanks.

@LC1332
Owner

LC1332 commented Dec 7, 2023

The new data and generation method are now available at https://huggingface.co/datasets/silk-road/ChatHaruhi-Expand-118K
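
For anyone hitting the same mismatch between their own extracted jsonl and the files the dataloader expects, one way to compare the two is to list which top-level keys each file actually contains. The following is only a minimal sketch using the Python standard library; the helper name `summarize_jsonl_fields` and the file names in the usage note are hypothetical, and it assumes nothing about the schema of the published dataset.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl_fields(path):
    """Count how often each top-level key appears across the records of a JSONL file.

    Hypothetical helper for illustration; it is not part of the project's code.
    """
    counts = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                # Only JSON objects have keys; other values are ignored.
                counts.update(record.keys())
    return counts
```

Running it on both files, for example `summarize_jsonl_fields("my_character.jsonl")` and `summarize_jsonl_fields("reference_sample.jsonl")` (both paths are placeholders), and comparing the two results shows which keys are missing from the self-generated file.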
