Issues: sgl-project/sglang
#2032 [Feature] Support Qwen2-VL based embedding model [good first issue] - opened Nov 14, 2024 by VoVAllen (2 tasks done)
#2025 [BUG] Jump forward w/ outlines backend slightly changes the decoding results [grammar-backend] - opened Nov 13, 2024 by merrymercy
#2017 [BUG] xgrammar does not follow the constraint [grammar-backend] - opened Nov 12, 2024 by merrymercy
#2007 [Feature] Regex stop condition [good first issue] - opened Nov 11, 2024 by SinanAkkoyun (2 tasks done)
#1991 [Feature] Are there plans to support AWQ and torch compile? - opened Nov 11, 2024 by sitabulaixizawaluduo (2 tasks done)
#1953 [Bug] amdgpu, tp-size=2, Detected errors during sampling! NaN in the logits. [amd] - opened Nov 8, 2024 by linqingxu (5 tasks done)
#1936 [Feature] Add LoRA Support for Chat Completion in SGLang [good first issue] - opened Nov 6, 2024 by mssongit (2 tasks done)
#1932 [Feature] Save cache from requests and load [good first issue] - opened Nov 6, 2024 by SinanAkkoyun (2 tasks done)
#1925 [Bug] Torch 2.5 issue with Tensor Parallel Size > 1 - opened Nov 5, 2024 by CortexEdgeUser (5 tasks done)
#1923 [Bug] Launching a server with --enable-torch-compile produces a torch dynamo error - opened Nov 5, 2024 by msublee (5 tasks done)
#1921 [Bug] Make multi-lora serving compatible with cuda graph and radix cache [help wanted] - opened Nov 5, 2024 by LIUKAI0815 (4 of 5 tasks)
#1912 [Feature] Support Qwen2_5...etc tools calling by OpenAI API [good first issue] - opened Nov 4, 2024 by CedricHwong (1 of 2 tasks)
#1900 Expose max_total_num_tokens for Token Limit Calculation in Request Handling [good first issue] - opened Nov 3, 2024 by hahmad2008 (1 of 5 tasks)
#1872 [Bug] Offline engine performance is not better than local server when running batch - opened Nov 1, 2024 by jischein (5 tasks done)
#1857 TP8 scheduling overhead is very high for small model, Llama 3 8B on AMD [amd] - opened Oct 31, 2024 by hliuca (5 tasks done)
#1828 Questions Regarding sglang vs vllm and Memory Management - opened Oct 28, 2024 by hahmad2008 (1 of 5 tasks)