
Opt class for positional argument handling #10508

Merged
merged 1 commit into ggerganov:master on Dec 13, 2024

Conversation

@ericcurtin (Contributor) commented Nov 26, 2024

Opt class for positional argument handling
Added support for positional arguments model and prompt. Added
functionality to download via strings like:

llama-run llama3
llama-run ollama://granite-code
llama-run ollama://granite-code:8b
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
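
For illustration, here is a minimal sketch of the positional-argument handling described above, assuming a standalone Opt-style struct; the names and structure are hypothetical, not the PR's actual implementation:

    #include <cstdio>
    #include <string>
    #include <vector>

    // Hypothetical sketch: collect non-flag arguments in order, treating the
    // first as the model and the second (optional) as the prompt.
    struct Opt {
        std::string model;
        std::string prompt;

        int parse(int argc, const char ** argv) {
            std::vector<std::string> positional;
            for (int i = 1; i < argc; ++i) {
                const std::string arg = argv[i];
                if (!arg.empty() && arg[0] == '-') {
                    continue;  // options such as -c or -ngl would be handled here
                }
                positional.push_back(arg);
            }
            if (positional.empty()) {
                fprintf(stderr, "usage: llama-run MODEL [PROMPT]\n");
                return 1;
            }
            model = positional[0];
            if (positional.size() > 1) {
                prompt = positional[1];
            }
            return 0;
        }
    };

With something like this, llama-run llama3 "tell me a joke" would set model to llama3 and prompt to the quoted string.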

@ericcurtin force-pushed the new-style-run branch 8 times, most recently from 1454a52 to 39fa786 on November 26, 2024, 23:58
@ericcurtin force-pushed the new-style-run branch 2 times, most recently from fa3ff4f to be9cf7c on November 27, 2024, 15:06
@ericcurtin (Contributor Author)

@slaren @ggerganov this is ready for review. The next PR after this will download a Hugging Face model if the model string starts with hf:// or huggingface:// (like RamaLama does), using the pre-existing Hugging Face downloader code in llama.cpp.

One thing that could be better is the output from that code. huggingface-cli has a much nicer progress bar, etc. (Python kinda makes it easy). But one step at a time I guess 😊

Review comment on examples/main/main.cpp (outdated, resolved)
@ericcurtin force-pushed the new-style-run branch 4 times, most recently from a4bbad4 to 22d31da on December 9, 2024, 12:43
@ericcurtin (Contributor Author) commented Dec 9, 2024

This is good for re-review @slaren. I can't figure out how to call this kind of code correctly:

    int huggingface_dl(const std::string & model_, const struct llama_model_params & params) {
        // Find the second occurrence of '/' after protocol string
        size_t pos = model_.find('/');
        pos        = model_.find('/', pos + 1);
        if (pos == std::string::npos) {
            return 1;
        }

        const std::string hfr = model_.substr(0, pos);
        const std::string hff = model_.substr(pos + 1);
        common_load_model_from_hf(hfr, hff, "", "", params);

        return 0;
    }

    int resolve_model(std::string & model_, const struct llama_model_params & params) {
        if (starts_with(model_, "hf://") || starts_with(model_, "huggingface://")) {
            remove_proto(model_);
            huggingface_dl(model_, params);
        } else if (starts_with(model_, "https://")) {
            common_load_model_from_url(model_, "", "", params);
        } else if (starts_with(model_, "file://")) {
            remove_proto(model_);
        }

        // Also implement ollama://, if file doesn't exist, assume ollama str

        return 0;
    }

so I left it out for now.
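
For context, a minimal sketch of what the starts_with and remove_proto helpers used above could look like; these exact implementations are an assumption, not necessarily the code in the PR:

    #include <string>

    // True if str begins with prefix.
    static bool starts_with(const std::string & str, const std::string & prefix) {
        return str.rfind(prefix, 0) == 0;
    }

    // Strip the leading "proto://" part, e.g. "hf://repo/file.gguf" -> "repo/file.gguf".
    static void remove_proto(std::string & model_) {
        const std::string::size_type pos = model_.find("://");
        if (pos != std::string::npos) {
            model_ = model_.substr(pos + 3);
        }
    }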

@ericcurtin force-pushed the new-style-run branch 2 times, most recently from 982cb52 to 2cb740a on December 10, 2024, 12:42
@ericcurtin (Contributor Author) commented Dec 10, 2024

@slaren @ggerganov once this is merged, one can start a chatbot via:

$ llama-run smollm
>

Examples don't really get much simpler than this, from a user perspective at least...

@ericcurtin force-pushed the new-style-run branch 6 times, most recently from 8bdb9fd to 3e16ec1 on December 10, 2024, 15:44
@slaren (Collaborator) left a comment

Pretty cool. I would also appreciate some documentation about where the model files are cached/stored; it's not very clear at the moment.

Review comments on common/common.h, examples/run/CMakeLists.txt, examples/run/run.cpp, and README.md (resolved)
@ericcurtin (Contributor Author) commented Dec 11, 2024

Pretty cool. I would also appreciate some documentation about where the model files are cached/stored; it's not very clear at the moment.

It's basically like a "curl -O" or a wget: it just downloads the file into the current directory as modelname.partial, and when the download is complete it's renamed to just modelname (that helps identify whether something is fully downloaded or not). It would be nice to have a full model store like RamaLama has, but maybe that's overkill for now.

I'll try and articulate that as best I can in the usage help, etc.
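
To make the ".partial then rename" pattern above concrete, here is a rough sketch using plain libcurl; the function names and details are illustrative, not the code in this PR:

    #include <curl/curl.h>
    #include <cstdio>
    #include <string>

    // Sketch: download to "<name>.partial" and only rename to "<name>" on success,
    // so a bare filename on disk always means a fully downloaded file.
    static size_t write_cb(void * data, size_t size, size_t nmemb, void * user) {
        return fwrite(data, size, nmemb, static_cast<FILE *>(user));
    }

    static int pull(const std::string & url, const std::string & name) {
        const std::string partial = name + ".partial";
        FILE * file = fopen(partial.c_str(), "wb");
        if (!file) {
            return 1;
        }
        CURL * curl = curl_easy_init();
        if (!curl) {
            fclose(file);
            return 1;
        }
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);
        const CURLcode res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        fclose(file);
        if (res != CURLE_OK) {
            return 1;
        }
        return std::rename(partial.c_str(), name.c_str()) == 0 ? 0 : 1;
    }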

@ericcurtin (Contributor Author) commented Dec 11, 2024

This is practically the same code with minor differences:

https://github.com/ericcurtin/lm-pull

But the one in this PR, integrated with llama.cpp, is much more useful; it actually runs the models 😄

@ericcurtin force-pushed the new-style-run branch 4 times, most recently from 642bd57 to 9d6debe on December 12, 2024, 00:24
@ericcurtin (Contributor Author)

This is ready for re-review @slaren

@ericcurtin force-pushed the new-style-run branch 4 times, most recently from af36f34 to f05377a on December 12, 2024, 11:31
@ericcurtin (Contributor Author)

Will probably show this tool at FOSDEM; I think the simplicity of it will appeal to people.

@ggerganov (Owner) left a comment

The model fetching logic with libcurl is nice and should be promoted to libcommon and used everywhere we specify model filenames.

@ericcurtin force-pushed the new-style-run branch 4 times, most recently from d0eed57 to e5d949e on December 13, 2024, 16:08
@ericcurtin (Contributor Author)

I made some further changes to the progress bar logic to eliminate flickering (essentially just one print call per progress bar update) and added some further progress bar info, so it now looks like this:

$ llama-run smollm:135m
13% |██                  | 12.12 MB/87.48 MB  3.07 MB/s  24s
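
Roughly, the single-print-per-update idea works like this: build the whole line in one string and emit it with a leading carriage return so the terminal line is rewritten in place. A sketch (the exact formatting in the PR differs):

    #include <cstdio>

    // One printf per refresh: "\r" rewrites the same terminal line in place,
    // which avoids the flicker caused by printing the bar in several pieces.
    static void print_progress(double now_mb, double total_mb, double speed_mbs, int eta_s) {
        const int width   = 20;
        const int percent = total_mb > 0 ? static_cast<int>(now_mb * 100 / total_mb) : 0;
        const int filled  = percent * width / 100;
        char bar[width + 1];
        for (int i = 0; i < width; ++i) {
            bar[i] = i < filled ? '#' : ' ';
        }
        bar[width] = '\0';
        printf("\r%3d%% |%s| %.2f MB/%.2f MB  %.2f MB/s  %ds", percent, bar, now_mb, total_mb, speed_mbs, eta_s);
        fflush(stdout);
    }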

@ericcurtin (Contributor Author)

I think this should be good for merge now

@slaren (Collaborator) commented Dec 13, 2024

Some things to improve:

  • The command line parser probably shouldn't ignore parameters that start with - (a sketch of stricter handling follows this list), for example:
$ build/bin/llama-run -ngl 100 llama3
curl_easy_perform() failed: HTTP response code said error
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::parse_error'
  what():  [json.exception.parse_error.101] parse error at line 1, column 1: attempting to parse an empty input; check that your input string or stream contains the expected JSON
fish: Job 1, 'build/bin/llama-run -ngl 100 ll…' terminated by signal SIGABRT (Abort)
  • The exceptions and error responses from the server could be handled more gracefully
  • The file storage will probably need some kind of managed cache rather than just storing a file without extension to the current directory
  • I couldn't build with MSVC; I think that, as it is, curl is not directly available to llama-run, since it is private to common. It might be necessary to re-add find_package(CURL) to the llama-run CMakeLists.txt.
llama.cpp\examples\run\run.cpp(8): fatal error C1083: Cannot open include file: 'curl/curl.h': No such file or directory
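
A sketch of the stricter flag handling suggested in the first point of the list above (hypothetical code, not the parser in this PR): reject anything that starts with '-' and is not a recognised option instead of silently skipping it.

    #include <cstdio>
    #include <cstring>
    #include <string>

    // Hypothetical: unknown options abort parsing with an error instead of being
    // ignored, so "llama-run -ngl 100 llama3" fails fast if -ngl is not supported.
    static int parse_args(int argc, const char ** argv, std::string & model) {
        for (int i = 1; i < argc; ++i) {
            if (strcmp(argv[i], "--help") == 0) {
                printf("usage: llama-run MODEL [PROMPT]\n");
                return 0;
            }
            if (argv[i][0] == '-') {
                fprintf(stderr, "error: unknown option '%s'\n", argv[i]);
                return 1;
            }
            if (model.empty()) {
                model = argv[i];  // first positional argument is the model
            }
        }
        return 0;
    }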

@slaren merged commit c27ac67 into ggerganov:master on Dec 13, 2024
47 checks passed
@ericcurtin deleted the new-style-run branch on December 13, 2024, 21:14
netrunnereve pushed a commit to netrunnereve/llama.cpp that referenced this pull request Dec 16, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024