Skip to content

Commit

Permalink
Opt class for positional argument handling
Browse files Browse the repository at this point in the history
Added support for positional arguments `MODEL` and `PROMPT`. Added model
path resolution, just `file://` for now. Added functionality to download
via strings like:

llama-run llama3
llama-run ollama://granite-code
llama-run ollama://granite-code:8b
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf

Signed-off-by: Eric Curtin <[email protected]>
  • Loading branch information
ericcurtin committed Dec 10, 2024
1 parent 26a8406 commit 3e16ec1
Show file tree
Hide file tree
Showing 5 changed files with 442 additions and 162 deletions.
15 changes: 15 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -433,6 +433,21 @@ To learn more about model quantization, [read this documentation](examples/quant

</details>

## [`llama-run`](examples/run)

#### A comprehensive example for running `llama.cpp` models. Useful for inferencing. Used with RamaLama [^1].

- <details>
<summary>Run a model with a specific prompt</summary>

```bash
llama-run granite-code
>
```

</details>

[^1]: [https://github.com/containers/ramalama](RamaLama)

## [`llama-simple`](examples/simple)

Expand Down
6 changes: 0 additions & 6 deletions common/common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1108,12 +1108,6 @@ struct ggml_threadpool_params ggml_threadpool_params_from_cpu_params(const cpu_p
#define CURL_MAX_RETRY 3
#define CURL_RETRY_DELAY_SECONDS 2


static bool starts_with(const std::string & str, const std::string & prefix) {
// While we wait for C++20's std::string::starts_with...
return str.rfind(prefix, 0) == 0;
}

static bool curl_perform_with_retry(const std::string& url, CURL* curl, int max_attempts, int retry_delay_seconds) {
int remaining_attempts = max_attempts;

Expand Down
11 changes: 8 additions & 3 deletions common/common.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,9 @@ using llama_tokens = std::vector<llama_token>;

// build info
extern int LLAMA_BUILD_NUMBER;
extern char const * LLAMA_COMMIT;
extern char const * LLAMA_COMPILER;
extern char const * LLAMA_BUILD_TARGET;
extern const char * LLAMA_COMMIT;
extern const char * LLAMA_COMPILER;
extern const char * LLAMA_BUILD_TARGET;

struct common_control_vector_load_info;

Expand Down Expand Up @@ -437,6 +437,11 @@ std::vector<std::string> string_split<std::string>(const std::string & input, ch
return parts;
}

static bool starts_with(const std::string & str,
const std::string & prefix) { // While we wait for C++20's std::string::starts_with...
return str.rfind(prefix, 0) == 0;
}

bool string_parse_kv_override(const char * data, std::vector<llama_model_kv_override> & overrides);
void string_process_escapes(std::string & input);

Expand Down
13 changes: 12 additions & 1 deletion examples/run/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,16 @@
set(TARGET llama-run)
add_executable(${TARGET} run.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE llama ${CMAKE_THREAD_LIBS_INIT})

# Use curl to download model url
if (LLAMA_CURL)
find_package(CURL REQUIRED)
add_definitions(-DLLAMA_USE_CURL)
include_directories(${CURL_INCLUDE_DIRS})
find_library(CURL_LIBRARY curl REQUIRED)
set(LLAMA_COMMON_EXTRA_LIBS ${LLAMA_COMMON_EXTRA_LIBS} ${CURL_LIBRARY})
endif ()

target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_include_directories(${TARGET} PUBLIC ../../common)
target_compile_features(${TARGET} PRIVATE cxx_std_17)
Loading

0 comments on commit 3e16ec1

Please sign in to comment.