Snow 1455266 - Upgrade Triton to Resolve CVEs #175

Open: wants to merge 86 commits into base: main
2201642
abc
sfc-gh-zhwang Jul 19, 2023
5b7db76
Add files via upload
fjkattan Jul 22, 2023
fd34ba2
Merge pull request #3 from Snowflake-Labs/fjkattan-patch-1
fjkattan Jul 22, 2023
968920a
commit
sfc-gh-zhwang Aug 19, 2023
1d8b7fd
commit
sfc-gh-zhwang Aug 19, 2023
123933b
commit
sfc-gh-zhwang Aug 20, 2023
ab152e2
commit
sfc-gh-zhwang Aug 21, 2023
5db164b
commit
sfc-gh-zhwang Aug 21, 2023
27b9a36
commit
sfc-gh-zhwang Aug 24, 2023
314cc9d
commit
sfc-gh-zhwang Aug 24, 2023
ec8ba94
commit
sfc-gh-zhwang Aug 24, 2023
63cb0b6
commit
sfc-gh-zhwang Aug 24, 2023
25fbeb9
commit
sfc-gh-zhwang Aug 24, 2023
35b9388
commit
sfc-gh-zhwang Aug 24, 2023
d259efc
commit
sfc-gh-zhwang Aug 24, 2023
e0562b1
commit
sfc-gh-zhwang Aug 25, 2023
2d60bbe
commit
sfc-gh-zhwang Aug 28, 2023
d9125f2
Merge branch 'corvo' into zhwang/llama2
sfc-gh-zhwang Aug 28, 2023
31babb0
commit
sfc-gh-zhwang Aug 28, 2023
9c8fe16
Merge pull request #6 from Snowflake-Labs/zhwang/llama2
sfc-gh-zhwang Aug 28, 2023
08625c3
commit
sfc-gh-zhwang Aug 30, 2023
ffd06a7
commit
sfc-gh-zhwang Aug 30, 2023
58b54ec
commit
sfc-gh-zhwang Aug 30, 2023
a689bf2
Merge pull request #7 from Snowflake-Labs/zhwang/llama-input-length
sfc-gh-zhwang Aug 30, 2023
a5ca901
Merge pull request #9 from triton-inference-server/main
sfc-gh-zhwang Sep 6, 2023
d717478
Merge pull request #10 from Snowflake-Labs/main
sfc-gh-zhwang Sep 6, 2023
f0da91f
update FasterTransformer commit
sfc-gh-zhwang Sep 6, 2023
c79ff8c
Merge pull request #11 from Snowflake-Labs/zhwang/nit
sfc-gh-zhwang Sep 6, 2023
c0d26f6
commit
sfc-gh-zhwang Sep 11, 2023
52ec312
commit
sfc-gh-zhwang Sep 12, 2023
a19846a
Merge pull request #12 from Snowflake-Labs/zhwang/llama-gqa
sfc-gh-zhwang Sep 12, 2023
3147357
commit
sfc-gh-zhwang Sep 24, 2023
8de4960
commit
sfc-gh-zhwang Sep 24, 2023
bde5c23
commit
sfc-gh-zhwang Sep 24, 2023
7768d46
commit
sfc-gh-zhwang Sep 24, 2023
065a083
commit
sfc-gh-zhwang Sep 24, 2023
4977234
commit
sfc-gh-zhwang Sep 24, 2023
1931ae8
commit
sfc-gh-zhwang Sep 24, 2023
7473999
commit
sfc-gh-zhwang Sep 25, 2023
636c79f
commit
sfc-gh-zhwang Sep 26, 2023
d6c01f6
commit
sfc-gh-zhwang Sep 26, 2023
3b2b9bd
Merge pull request #15 from Snowflake-Labs/zhwang/bart-pr
sfc-gh-zhwang Sep 26, 2023
ed99f64
commit
sfc-gh-zhwang Sep 26, 2023
4d0a407
commit
sfc-gh-zhwang Sep 26, 2023
dbddbf4
Merge pull request #16 from Snowflake-Labs/zhwang/a776ecacfb8ef054366…
sfc-gh-zhwang Sep 26, 2023
55afffb
commit
sfc-gh-zhwang Sep 28, 2023
f72a5de
Merge pull request #17 from Snowflake-Labs/zhwang/GIT_SHALLOW
sfc-gh-zhwang Sep 28, 2023
9e66f9a
commit
sfc-gh-zhwang Sep 29, 2023
a190564
commit
sfc-gh-zhwang Sep 29, 2023
a225104
Merge pull request #18 from Snowflake-Labs/zhwang/code-llama2
sfc-gh-zhwang Sep 29, 2023
d670806
commit
sfc-gh-zhwang Sep 29, 2023
59c0e08
Merge pull request #19 from Snowflake-Labs/zhwang/nit2
sfc-gh-zhwang Sep 29, 2023
807c774
Merge branch 'triton-inference-server:main' into corvo
sfc-gh-hykim Oct 2, 2023
33729f0
commit
sfc-gh-zhwang Oct 5, 2023
94416df
Merge pull request #25 from Snowflake-Labs/zhwang/mbart-multilanguage
sfc-gh-zhwang Oct 5, 2023
a33ad4e
commit
sfc-gh-zhwang Oct 8, 2023
03bebc8
commit
sfc-gh-zhwang Oct 8, 2023
70dd1f8
Merge pull request #26 from Snowflake-Labs/zhwang/fix-code-llama-long…
sfc-gh-zhwang Oct 8, 2023
9a78831
commit
sfc-gh-zhwang Oct 12, 2023
952745a
Merge pull request #27 from Snowflake-Labs/zhwang/d1f088243f98ea967b4…
sfc-gh-zhwang Oct 12, 2023
48051df
commit
sfc-gh-zhwang Oct 13, 2023
5385caf
Merge pull request #28 from Snowflake-Labs/zhwang/5ed1f245e7f06d9cb72…
sfc-gh-zhwang Oct 13, 2023
75508a7
commit
sfc-gh-zhwang Oct 16, 2023
5a48748
Merge pull request #29 from Snowflake-Labs/zhwang/336e487a7d42932e819…
sfc-gh-zhwang Oct 16, 2023
fb2e9d9
add langid
sfc-gh-ybsat Oct 20, 2023
0bf5e78
Merge pull request #30 from Snowflake-Labs/yahia/add-langid-2
sfc-gh-ybsat Oct 20, 2023
dcfde00
commit (#31)
sfc-gh-zhwang Oct 24, 2023
d2e15ad
Zhwang/codeowner (#32)
sfc-gh-zhwang Oct 26, 2023
801c149
commit (#33)
sfc-gh-zhwang Oct 30, 2023
d9ab913
commit (#34)
sfc-gh-zhwang Oct 31, 2023
df3d9f5
update
sfc-gh-ybsat Nov 11, 2023
883f694
Merge pull request #37 from Snowflake-Labs/yahia/buii;d
sfc-gh-ybsat Nov 11, 2023
85139a9
update
sfc-gh-ybsat Nov 14, 2023
b9ebd41
Merge pull request #38 from Snowflake-Labs/yahia/update-data
sfc-gh-ybsat Nov 14, 2023
106e658
update tag
sfc-gh-ybsat Nov 18, 2023
4b5880f
Merge pull request #39 from Snowflake-Labs/yahia/llama2-32k
sfc-gh-ybsat Nov 18, 2023
ede71f7
m2m
sfc-gh-ybsat Jan 17, 2024
341ad6c
tag
sfc-gh-ybsat Jan 17, 2024
4b78f25
Merge pull request #40 from Snowflake-Labs/yahia/m2m-support
sfc-gh-ybsat Jan 17, 2024
eb0f589
update
sfc-gh-ybsat Jan 17, 2024
ecf685e
Merge pull request #41 from Snowflake-Labs/yahia/fix-m2m-build
sfc-gh-ybsat Jan 17, 2024
d6baf7c
d
sfc-gh-ybsat Feb 8, 2024
8456494
Merge pull request #42 from Snowflake-Labs/yahia/add-lingua
sfc-gh-ybsat Feb 8, 2024
312992c
going to latest triton version to resolve cves
sfc-gh-dbove Oct 2, 2024
84642ae
build failing w latest triton image... lets try just removing the cve…
sfc-gh-dbove Oct 2, 2024
ff4dc04
had to change to purge for grype scan to not list the vulns
sfc-gh-dbove Oct 3, 2024
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
+* @sfc-gh-zhwang @sfc-gh-hykim
5 changes: 2 additions & 3 deletions CMakeLists.txt
@@ -113,9 +113,8 @@ if (EXISTS ${FT_DIR})
 else()
   FetchContent_Declare(
     repo-ft
-    GIT_REPOSITORY https://github.com/NVIDIA/FasterTransformer.git
-    GIT_TAG main
-    GIT_SHALLOW ON
+    GIT_REPOSITORY https://github.com/neevaco/FasterTransformer.git
+    GIT_TAG b6b21406449ab19f00d1d5f97338065037b5f8e3
   )
 endif()
 FetchContent_MakeAvailable(repo-common repo-core repo-backend repo-ft)
1 change: 1 addition & 0 deletions LEGAL.md
@@ -0,0 +1 @@
+This application is not part of the Snowflake Service and is governed by the terms in LICENSE, unless expressly agreed to in writing. You use this application at your own risk, and Snowflake has no obligation to support your use of this application.
9 changes: 8 additions & 1 deletion docker/Dockerfile
@@ -42,7 +42,9 @@ RUN apt-get update && \
 RUN pip3 install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.0.1+cu118 && \
     pip3 install --no-cache-dir --extra-index-url https://pypi.ngc.nvidia.com regex fire tritonclient[all] && \
     pip3 install --no-cache-dir accelerate transformers huggingface_hub tokenizers SentencePiece sacrebleu datasets tqdm omegaconf rouge_score && \
-    pip3 install --no-cache-dir cmake==3.24.3
+    pip3 install --no-cache-dir cmake==3.24.3 && \
+    pip3 install --no-cache-dir langid==1.1.6 && \
+    pip3 install --no-cache-dir lingua-language-detector==2.0.2

 # backend build
 ADD . /workspace/build/fastertransformer_backend
@@ -66,6 +68,11 @@ RUN CUDAFLAGS="-include stdio.h" cmake \
     rm /workspace/build/fastertransformer_backend/build/bin/*_example -rf && \
     rm /workspace/build/fastertransformer_backend/build/lib/lib*Backend.so -rf

+# Removing git because of CVEs, no longer needed after build
+RUN apt-get purge git git-man -y && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
 ENV NCCL_LAUNCH_MODE=GROUP
 ENV WORKSPACE /workspace
 WORKDIR /workspace
61 changes: 61 additions & 0 deletions src/libfastertransformer.cc
@@ -49,10 +49,14 @@

 // FT's libraries have dependency with triton's lib
 #include "src/fastertransformer/triton_backend/bert/BertTritonModel.h"
+#include "src/fastertransformer/triton_backend/bart/BartTritonModel.h"
+#include "src/fastertransformer/triton_backend/m2m/M2MTritonModel.h"
+#include "src/fastertransformer/triton_backend/deberta/DebertaTritonModel.h"
 #include "src/fastertransformer/triton_backend/gptj/GptJTritonModel.h"
 #include "src/fastertransformer/triton_backend/gptj/GptJTritonModelInstance.h"
 #include "src/fastertransformer/triton_backend/gptneox/GptNeoXTritonModel.h"
 #include "src/fastertransformer/triton_backend/gptneox/GptNeoXTritonModelInstance.h"
+#include "src/fastertransformer/triton_backend/llama/LlamaTritonModel.h"
 #include "src/fastertransformer/triton_backend/multi_gpu_gpt/ParallelGptTritonModel.h"
 #include "src/fastertransformer/triton_backend/multi_gpu_gpt/ParallelGptTritonModelInstance.h"
 #include "src/fastertransformer/triton_backend/t5/T5TritonModel.h"
@@ -327,6 +331,63 @@ std::shared_ptr<AbstractTransformerModel> ModelState::ModelFactory(
     } else if (data_type == "bf16") {
       ft_model = std::make_shared<BertTritonModel<__nv_bfloat16>>(
           tp, pp, custom_ar, model_dir, int8_mode, is_sparse, remove_padding);
+#endif
+    }
+  } else if (model_type == "llama") {
+    const int int8_mode = param_get_int(param, "int8_mode");
+
+    if (data_type == "fp16") {
+      ft_model = std::make_shared<LlamaTritonModel<half>>(
+          tp, pp, custom_ar, model_dir, int8_mode);
+    } else if (data_type == "fp32") {
+      ft_model = std::make_shared<LlamaTritonModel<float>>(
+          tp, pp, custom_ar, model_dir, int8_mode);
+#ifdef ENABLE_BF16
+    } else if (data_type == "bf16") {
+      ft_model = std::make_shared<LlamaTritonModel<__nv_bfloat16>>(
+          tp, pp, custom_ar, model_dir, int8_mode);
+#endif
+    }
+  } else if (model_type == "bart") {
+    if (data_type == "fp16") {
+      ft_model = std::make_shared<BartTritonModel<half>>(
+          tp, pp, custom_ar, model_dir, 0);
+    } else if (data_type == "fp32") {
+      ft_model = std::make_shared<BartTritonModel<float>>(
+          tp, pp, custom_ar, model_dir, 0);
+#ifdef ENABLE_BF16
+    } else if (data_type == "bf16") {
+      ft_model = std::make_shared<BartTritonModel<__nv_bfloat16>>(
+          tp, pp, custom_ar, model_dir, 0);
+#endif
+    }
+  } else if (model_type == "m2m") {
+    if (data_type == "fp16") {
+      ft_model = std::make_shared<M2MTritonModel<half>>(
+          tp, pp, custom_ar, model_dir, 0);
+    } else if (data_type == "fp32") {
+      ft_model = std::make_shared<M2MTritonModel<float>>(
+          tp, pp, custom_ar, model_dir, 0);
+#ifdef ENABLE_BF16
+    } else if (data_type == "bf16") {
+      ft_model = std::make_shared<M2MTritonModel<__nv_bfloat16>>(
+          tp, pp, custom_ar, model_dir, 0);
+#endif
+    }
+  } else if (model_type == "deberta") {
+    const int is_sparse = param_get_bool(param,"is_sparse", false);
+    const int remove_padding = param_get_bool(param,"is_remove_padding", false);
+
+    if (data_type == "fp16") {
+      ft_model = std::make_shared<DebertaTritonModel<half>>(
+          tp, pp, custom_ar, model_dir, is_sparse, remove_padding);
+    } else if (data_type == "fp32") {
+      ft_model = std::make_shared<DebertaTritonModel<float>>(
+          tp, pp, custom_ar, model_dir, is_sparse, remove_padding);
+#ifdef ENABLE_BF16
+    } else if (data_type == "bf16") {
+      ft_model = std::make_shared<DebertaTritonModel<__nv_bfloat16>>(
+          tp, pp, custom_ar, model_dir, is_sparse, remove_padding);
 #endif
     }
   } else {
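
Review note: the `ModelFactory` branches above repeat the same fp16/fp32/bf16 dispatch for each new model type (llama, bart, m2m, deberta). The sketch below is only an illustration of that pattern, not code from this PR or from FasterTransformer. `AbstractModel`, `ToyModel`, `Half`, `TypeName`, `make_for_dtype`, and `make_model` are all hypothetical stand-ins, and the bf16 branch is left out because it needs CUDA headers. It assumes, as the diff suggests, that llama reads `int8_mode` from the config while bart and m2m always pass `0`.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for AbstractTransformerModel.
struct AbstractModel {
    virtual ~AbstractModel() = default;
    virtual std::string describe() const = 0;
};

// Placeholder for CUDA's `half`, so the sketch compiles without CUDA.
struct Half {};

// Maps an element type back to the data_type string used in the config.
template <typename T> struct TypeName;
template <> struct TypeName<float> { static const char* name() { return "fp32"; } };
template <> struct TypeName<Half>  { static const char* name() { return "fp16"; } };

// Hypothetical stand-in for LlamaTritonModel<T>, BartTritonModel<T>, etc.
template <typename T>
struct ToyModel : AbstractModel {
    ToyModel(std::string model_type, int int8_mode)
        : model_type_(std::move(model_type)), int8_mode_(int8_mode) {}

    std::string describe() const override {
        return model_type_ + "<" + TypeName<T>::name() + ">, int8_mode=" +
               std::to_string(int8_mode_);
    }

    std::string model_type_;
    int int8_mode_;
};

// Chooses the template instantiation from the runtime data_type string.
// Returns nullptr for an unrecognised data_type.
template <template <typename> class Model>
std::shared_ptr<AbstractModel> make_for_dtype(const std::string& model_type,
                                              const std::string& data_type,
                                              int int8_mode) {
    if (data_type == "fp16") {
        return std::make_shared<Model<Half>>(model_type, int8_mode);
    }
    if (data_type == "fp32") {
        return std::make_shared<Model<float>>(model_type, int8_mode);
    }
    return nullptr;
}

// Outer dispatch on model_type; only this level differs per model.
std::shared_ptr<AbstractModel> make_model(const std::string& model_type,
                                          const std::string& data_type,
                                          int int8_mode = 0) {
    if (model_type == "llama") {
        return make_for_dtype<ToyModel>(model_type, data_type, int8_mode);
    }
    if (model_type == "bart" || model_type == "m2m") {
        // Assumption taken from the diff: these models always pass 0.
        return make_for_dtype<ToyModel>(model_type, data_type, 0);
    }
    return nullptr;  // unknown model_type
}
```

With these stand-ins, `make_model("llama", "fp16", 1)->describe()` yields `llama<fp16>, int8_mode=1`, and an unsupported `data_type` or `model_type` yields a null pointer. Factoring the data-type dispatch into one helper would keep each new model to a single branch, though whether that fits the real constructors (which take differing argument lists) is a call for the maintainers.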