best practices for using llama-train-text-from-scratch #8654
ericleasemorgan asked this question in Q&A (unanswered, 2 comments)
On Dec 31, 2024, at 12:04 AM, lbarasc wrote:
Hi, I want to train on text from scratch like you. Did you find any tricks for doing that?
Do you have any good results? Can we share our experiences? Thank you.
Alas, no. I have not been able to train anything from scratch, but I believe I have plenty of content -- a corpus of more than 3 billion words. I ran the llama.cpp toy training script on the Shakespeare content. It ran for about a month and produced somewhat useful results. I'd like to try something bigger. --Eric Morgan
Can somebody here present me with some best practices for using llama-train-text-from-scratch?
I have a 44 MB plain text file made up of bunches o' etexts written in English. It is my training data. [1] I then submitted the following command on my 60-core Linux computer:
After running for more than a few weeks, the log file says training is about 10% complete:
train_opt_callback: iter=330644 sample=1142529/11276953 sched=0.100000 loss=2.425543 dt=00:00:09 eta=3d 14:15:08 |>
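The "about 10%" figure can be checked against the log line itself. What follows is a minimal sketch, not part of llama.cpp: it assumes `sample=a/b` means samples processed out of the total, and the helper name `parse_progress` is hypothetical.

```python
import re

# The log line quoted above, without the trailing progress-bar characters.
LOG_LINE = (
    "train_opt_callback: iter=330644 sample=1142529/11276953 "
    "sched=0.100000 loss=2.425543 dt=00:00:09 eta=3d 14:15:08"
)


def parse_progress(line):
    """Return (iteration, samples_done, samples_total, loss) from one log line."""
    match = re.search(
        r"iter=(\d+)\s+sample=(\d+)/(\d+)\s+sched=[\d.]+\s+loss=([\d.]+)", line
    )
    if match is None:
        raise ValueError("line does not look like a train_opt_callback record")
    iteration, done, total = (int(g) for g in match.groups()[:3])
    loss = float(match.group(4))
    return iteration, done, total, loss


iteration, done, total, loss = parse_progress(LOG_LINE)
fraction = done / total  # assumed meaning: share of samples seen so far
print(f"iter {iteration}: {fraction:.1%} of samples seen, loss {loss:.3f}")
# prints: iter 330644: 10.1% of samples seen, loss 2.426
```

Running the same function over every line of the log would give a loss-versus-progress curve, which may be a better guide to when to stop than the reported eta.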
I can use llama-cli against a version of the model [2]:
And this is a truncated version of the output:
At the current rate, the modeling process will be complete by this time next year. Obviously, this won't work for me. Soon I will have access to a GPU, and I suspect processing will speed up.
That said, what can I do to best use llama-train-text-from-scratch? For example, maybe I ought to delimit all of the sentences in my training data with
characters? Maybe I could make sure my sample data uses line feeds (ASCII character 10) as line endings? Besides these formatting options, what are some of the ways I can: 1) optimize the model creation process, and 2) optimize the model itself?
Finally, even if I do all of this modeling, I don't expect results similar to other open source models, but I'd like to understand the process so I might model smaller things somewhat successfully.
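On the line-ending question, here is a minimal sketch of normalizing a plain text training file so that every line ends with a single line feed (LF, ASCII 10). It assumes the file fits in memory, which a 44 MB file should. Whether this helps training at all is not established in this thread, and the function names and the output file name are hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF (ASCII 10)."""
    # Replace CRLF first so each pair collapses to one LF, not two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(src_path: str, dst_path: str) -> int:
    """Write a normalized copy of src_path to dst_path.

    Returns the number of carriage-return bytes (ASCII 13) found in the
    input, all of which are removed or converted in the output.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))
    return data.count(b"\r")
```

For example, `normalize_file("alex.txt", "alex.lf.txt")` would leave the original file untouched and report how many carriage returns it contained; a result of 0 means the file already uses LF only.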
[1] training data - https://distantreader.org/tmp/training/alex.txt
[2] model - https://distantreader.org/tmp/training/alex.gguf
--
Eric Morgan
Navari Family Center for Digital Scholarship
University of Notre Dame