Replies: 2 comments 5 replies
-
I'm not with the Qwen team, but this post stood out because of some recent quantization I've been doing for a model focused specifically on European language support. Do you have a quality dataset that can be used for this? And what size models are you talking about running inference on? As a rough rule of thumb, you want roughly 500 samples per billion parameters just for the imatrix you would use to quantize the model. For training, it's much, much larger (for example, the multilingual Salamandra models use 33 TB of text data for 35 languages, so about 1 TB of raw text per language, across 2/7/40B models). I figure XinJiang must have some university building such a dataset, if one is not already built. The fact that Claude and ChatGPT already have support suggests some university there has already furnished this sort of data.
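As an illustrative sketch, here is the arithmetic behind that rule of thumb. The 500-samples-per-billion figure is only the rough heuristic stated above, not an official llama.cpp recommendation, and the 2B/7B/40B sizes are the Salamandra family sizes mentioned in the post:

```python
# Rough rule of thumb from this thread: ~500 calibration samples per
# billion parameters for the importance matrix (imatrix) used when
# quantizing a model. Figures are illustrative, not authoritative.
def imatrix_samples(params_b: float, samples_per_b: int = 500) -> int:
    """Estimated imatrix calibration samples for a model of
    `params_b` billion parameters."""
    return round(params_b * samples_per_b)

# The model sizes mentioned above (2B / 7B / 40B):
for size in (2, 7, 40):
    print(f"{size}B model -> ~{imatrix_samples(size)} calibration samples")
```

So a 7B model would want on the order of 3,500 calibration samples for its imatrix, while the training corpus itself is several orders of magnitude larger.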
-
I wonder how the Llama models do with the Uyghur language? Might be worth testing out.
-
Hello, I'm in China, and we use the Uyghur language in XinJiang (we live in the same country). I have no powerful hardware or environment, so I cannot train (or fine-tune) your model, and I have no LLM training/fine-tuning experience at all. Uyghur is spoken by 10 million people in XinJiang today. We need Qwen to support our language, because we live in the same country and Qwen is a powerful model. I can provide all the datasets of my language that I have, and I can also help you improve Qwen if needed.
Claude Sonnet 3.5 and ChatGPT-4 have good support for my language, but we cannot use them, because you know the reason.
At the very least, if you are not interested, I need to fine-tune/train Qwen with my language myself, if you can provide the instructions. (Because I think Qwen is based on Llama, I may find help with Llama.)
But it would be cool if Qwen officially supported my language. I can't wait to use Qwen in my project.
Thank you.