hamelsmu / llama-inference Public

Pull requests: hamelsmu/llama-inference

13 Open 1 Closed

Pull requests list

CTranslate2: support multiple GPUs (if run with mpirun) and Flash Attention 2.

#17 opened Apr 11, 2024 by ivanbaldo

OpenAI compatible servers benchmark based on the anyscale and exllama benchmarks.

#16 opened Apr 8, 2024 by ivanbaldo

Update exllama/bench.py from OpenAI 0.27.8 to 1.16.2.

#15 opened Apr 4, 2024 by ivanbaldo

Upstream renamed from mlc_chat to mlc_llm.

#14 opened Apr 2, 2024 by ivanbaldo

Add new candle-vllm Dockerfile with instructions to benchmark it.

#13 opened Feb 8, 2024 by ivanbaldo

Add PowerInfer benchmark.

#12 opened Jan 31, 2024 by ivanbaldo

Add mlc/Dockerfile with instructions inside it.

#11 opened Jan 23, 2024 by ivanbaldo

Add ctranslate/Dockerfile with instructions to use it.

#10 opened Jan 19, 2024 by ivanbaldo

hf/bench.py: need to specify bfloat16 otherwise it consumes twice as much memory in A10.

#9 opened Jan 17, 2024 by ivanbaldo

Add a Dockerfile for the /hf benchmarks with instructions to build and run them.

#7 opened Dec 11, 2023 by ivanbaldo

Fix hf/bench-gptq.py.

#6 opened Dec 11, 2023 by ivanbaldo

Add plot in summary notebook

#5 opened Nov 28, 2023 by emattia

Add support for all Hugging Face Chat, Text models + OpenAI, Claude2, Cohere, Palm, Replicate models

#3 opened Aug 14, 2023 by ishaan-jaff
