Merge pull request #478 from TylunasLi/doc
Improve the "Supported Models" documentation
ztxz16 authored Jul 19, 2024
2 parents 2a0e9d0 + 0dc630b commit eca2c84
Showing 7 changed files with 273 additions and 123 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -8,17 +8,18 @@ fastllm is a pure C++ implementation with no third-party dependencies, a multi-platform high-performance large model inference

Deployment discussion QQ group: 831641348

| [Quick Start](#快速开始) | [Obtaining Models](#模型获取) |
| [Quick Start](#快速开始) | [Obtaining Models](docs/models.md) |

## Feature Overview

- 🚀 Pure C++ implementation, easy to port across platforms, can be compiled directly on Android
- 🚀 Fast on ARM, X86, and NVIDIA platforms alike
- 🚀 Supports reading original Hugging Face models and quantizing them directly
- 🚀 Supports deployment as an OpenAI-style API server (see the sketch after this list)
- 🚀 Supports multi-GPU deployment and mixed GPU + CPU deployment
- 🚀 Supports dynamic batching and streaming output
- 🚀 Front-end/back-end separation in the design, making it easy to support new compute devices
- 🚀 Currently supports ChatGLM-series models, Qwen2-series models, various LLAMA models (ALPACA, VICUNA, etc.), BAICHUAN models, MOSS models, MINICPM models, and more
- 🚀 Currently supports ChatGLM-series models, Qwen-series models, various LLAMA models (ALPACA, VICUNA, etc.), BAICHUAN models, MOSS models, MINICPM models, and more
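
The OpenAI-style API server mentioned in the feature list can be exercised with any OpenAI-compatible client. Below is a minimal sketch, assuming a fastllm server is already listening on localhost:8080 (for example, started via the ftllm entry points shown in the Quick Start); the port and the model name "qwen2-7b-instruct" are illustrative assumptions, not fixed values:

```python
# Minimal sketch: query an OpenAI-compatible /v1/chat/completions endpoint.
# Assumes a fastllm server is already running on localhost:8080; the model
# name "qwen2-7b-instruct" is an illustrative placeholder.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "qwen2-7b-instruct",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```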

## Quick Start

@@ -66,7 +67,7 @@ python3 -m ftllm.webui -t 16 -p ~/Qwen2-7B-Instruct/ --port 8080

Current model support status is listed at: [Model List](docs/models.md)

Some architectures cannot yet be read directly from Hugging Face models; refer to the [model conversion documentation](docs/convert_model.md) to convert them to fastllm-format models
Some early HuggingFace models cannot be read directly; refer to [Model Conversion](docs/models.md#模型导出convert-offline) to convert them to fastllm-format models

### Running the demo program (c++)

96 changes: 0 additions & 96 deletions docs/convert_model.md

This file was deleted.

20 changes: 6 additions & 14 deletions docs/faq.md
@@ -27,22 +27,14 @@ cmake .. -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
**Solution:**

Manually edit CMakeLists.txt and specify the GPU's [Compute Capability](https://developer.nvidia.com/cuda-gpus) according to the GPU model. For example:

``` diff
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -52,7 +52,7 @@
#message(${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES})
set(FASTLLM_CUDA_SOURCES src/devices/cuda/cudadevice.cpp src/devices/cuda/cudadevicebatch.cpp src/devices/cuda/fastllm-cuda.cu)
set(FASTLLM_LINKED_LIBS ${FASTLLM_LINKED_LIBS} cublas)
- set(CMAKE_CUDA_ARCHITECTURES "native")
+ set(CMAKE_CUDA_ARCHITECTURES 61 75 86 89)
endif()

if (PY_API)
Specify the GPU's [Compute Capability](https://developer.nvidia.com/cuda-gpus) according to the GPU model. For example:

```shell
cmake .. -DUSE_CUDA=ON -DCUDA_ARCH="61;75;86;89"
```

To support multiple GPU architectures, separate them with ";" (as in the example above).
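
If you are not sure which Compute Capability your GPU has, besides the NVIDIA page linked above you can also query it locally. A minimal sketch, assuming a CUDA-enabled PyTorch installation:

```python
# Sketch: print each visible GPU's compute capability, e.g. (8, 6) -> "86".
# Assumes PyTorch was installed with CUDA support.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> compute capability {major}{minor}")
```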

### identifier "__hdiv" is undefined

**Symptom:**
Empty file removed docs/fastllm_pytools.md
Empty file.
4 changes: 3 additions & 1 deletion docs/llama_cookbook.md
@@ -238,7 +238,7 @@ XVERSE-13B-Chat V1 requires NFKC normalization of the input, which fastllm does not yet support
user_role="[|Human|]:", bot_role="\n[|AI|]:", history_sep="\n", dtype=dtype)
```

## Yi
### Yi

* 01-ai/[Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)

@@ -249,6 +249,8 @@ XVERSE-13B-Chat V1 requires NFKC normalization of the input, which fastllm does not yet support
user_role="<|im_start|>user\n", bot_role="<|im_end|><|im_start|>assistant\n", history_sep="<|im_end|>\n", dtype=dtype)
```

* [SUSTech/SUS-Chat-34B](https://huggingface.co/SUSTech/SUS-Chat-34B)
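
For reference, a minimal loading sketch built from the template fields listed above. It assumes the ftllm Python package and its llm.from_hf helper used elsewhere in this cookbook; treat the exact import path, argument names, and the response call as assumptions to verify against the installed version:

```python
# Sketch: load Yi-6B-Chat (or SUS-Chat-34B) through fastllm's Python tools with
# the chat-template fields shown above. The `ftllm` module path, the
# `llm.from_hf(...)` signature, and `response()` are assumptions based on this cookbook.
from transformers import AutoModelForCausalLM, AutoTokenizer
from ftllm import llm

path = "01-ai/Yi-6B-Chat"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
hf_model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True).float()

model = llm.from_hf(
    hf_model, tokenizer,
    user_role="<|im_start|>user\n",
    bot_role="<|im_end|><|im_start|>assistant\n",
    history_sep="<|im_end|>\n",
    dtype="float16",  # the cookbook passes a `dtype` variable; "float16" is an example value
)
print(model.response("Hello!"))
```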

### WizardCoder

* [WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0)