Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to set end_id and apply chat template for GAP Triton In-Process #183

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

tedzhouhk
Copy link
Collaborator

@tedzhouhk tedzhouhk commented Nov 15, 2024

Add two functions for GAP Triton In-Process:

  1. Pass EOS token as end_id so that output can stop at EOS.
  2. Apply chat template so that OSL is accurate.
  3. A small bug fix for empty first/last response in triton engine.

Comment on lines 86 to 87
if config.set_end_id:
payload["end_id"] = [config.tokenizer._tokenizer.eos_token_id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned about adding a new CLI option for this specific use cases. Since we try to avoid adding too much options to the tool, and this can be achieved through the --extra-inputs <name>:<value> as well.
cc @dyastremsky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if this can be done via --extra-inputs, let's skip this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is moved to --extra-inputs. however, code change is still necessary unless use directly provide eos token id (instead of fetching from tokenizer). Please let me know if current approach looks good. Thanks.

)

input_group.add_argument(
"--triton-converter-apply-chat-template",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_chat_template I think could be a useful cli option to add: https://huggingface.co/docs/transformers/main/en/chat_templating

But I would go for adding it in a more generic way to support chat_template for any use cases, not just for triton in-process.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm wrong. Is chat template only necessary to benchmark raw engines? Benchmarking through api endpoints does not need to add chat template in GAP, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I guess what I wanted to say was we want to make it more generic so that the option is not limited to only triton in-process although the current use case is just triton in-process. This way we have a stronger reason to add this to our cli options.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Does GAP have any other route to benchmark raw engines that might need chat templates? I am not familiar with the codebase and it would be great if someone from your team has some BW to help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link from Hyunjae shows how chat templates are used with some APIs. The only endpoint that I know supports templates right now is the chat endpoint.

The team is bandwidth-constrained at the moment, though that could be a good addition.

Comment on lines +285 to +289
elif isinstance(r["output_ids"], int):
token_ids.append(r["output_ids"])
else:
# for the empty first/last responses
token_ids.append(0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewkotila added a bug fix for empty first/last response (it will be and empty string and cause error).

Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start to this PR. I like the idea of making the chat template generic for use with any/all endpoints.

We should also add unit testing to support these changes.

@@ -571,6 +571,13 @@ def _add_image_input_args(parser):
"If format is not selected, format of generated image is selected at random",
)

input_group.add_argument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Hyunjae's point below. Also, the image input arg function does not seem like the right place for this.

)

input_group.add_argument(
"--triton-converter-apply-chat-template",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link from Hyunjae shows how chat templates are used with some APIs. The only endpoint that I know supports templates right now is the chat endpoint.

The team is bandwidth-constrained at the moment, though that could be a good addition.

set_end_id: bool = False

# whether to apply chat template in triton converter
apply_chat_template: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment to make this generic. You'd also want apply_chat_template to be a string if we're making it generic based on how other endpoints use chat templating.

Newline after this.

@@ -142,3 +142,9 @@ class InputsConfig:

# Seed used to generate random values
random_seed: int = DEFAULT_RANDOM_SEED

# whether to set end_id in triton converter
set_end_id: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary? We don't want endpoint-specific fields in inputs_config.py.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, i'll remove this

@@ -68,6 +68,9 @@ def __call__(self, text, **kwargs) -> "BatchEncoding":
def encode(self, text, **kwargs) -> List[int]:
self._encode_args.update(kwargs)
return self._tokenizer.encode(text, **self._encode_args)

def apply_chat_template(self, text) -> List[int]:
return self._tokenizer.encode(self._tokenizer.apply_chat_template([{"role": "user", "content": text}], tokenize=False), add_special_tokens=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have a TRT-LLM-specific chat template in the tokenizer class. We should have the user provide the template, then the tokenizer applies it if has one.

If we're making this endpoint-specific, then it should exist solely in the converter as an extra arg. As long as it doesn't affect metrics, which I think it shouldn't for TRT-LLM in-process. (CC: @nv-hwoo)

@@ -82,4 +85,7 @@ def _add_request_params(self, payload: Dict, config: InputsConfig) -> None:
payload["min_length"] = [num_tokens]

for key, value in config.extra_inputs.items():
payload[key] = [value]
if key == "triton_converter_set_end_id" and value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably shorten this key, if you wanted. e.g. "set_end_id"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants