[HPU] Add mark_step configurable for the decoder layer. #525
Conversation
CC @jikunshang
for i in range(self.start_layer, self.end_layer):
    layer = self.layers[i]
    hidden_states, residual = layer(positions, hidden_states,
                                    kv_caches[i - self.start_layer],
                                    attn_metadata, residual)
    if is_hpu and i % self.config_hidden_layers == 0:
I noticed that qwen.py (and some other model files) also added mark_step previously; we can remove those now.
OK, I will take a look. It seems bigcode also uses this configuration parameter, but it is not a DecoderLayer that needs the mark_step; it is something else (88 x GPTBigCodeBlock), so we will also need a different suffix configuration for different models.
Will check further and update.
@jikunshang Please check the new patch. For bigcode, I need to run it with VLLM_CONFIG_HIDDEN_LAYERS_SUFFIX="BigCodeBlock". By the way, which model were the original code changes made for?
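To illustrate the suffix discussion above, here is a minimal sketch of how layer classes could be selected by class-name suffix. The function name `matches_hidden_layer_suffix` and the stand-in layer classes are hypothetical; only the env variable names come from the PR.

```python
import os

# Hypothetical sketch: decide which module classes get a mark_step hook by
# matching the class-name suffix from VLLM_CONFIG_HIDDEN_LAYERS_SUFFIX.
# "DecoderLayer" is assumed as the default; "BigCodeBlock" would match
# GPTBigCodeBlock in the bigcode model.
def matches_hidden_layer_suffix(module, suffix=None):
    suffix = suffix or os.getenv('VLLM_CONFIG_HIDDEN_LAYERS_SUFFIX',
                                 'DecoderLayer')
    return type(module).__name__.endswith(suffix)

class LlamaDecoderLayer:  # stand-in for a real model layer class
    pass

class GPTBigCodeBlock:  # stand-in for the bigcode block class
    pass

print(matches_hidden_layer_suffix(LlamaDecoderLayer()))                # True
print(matches_hidden_layer_suffix(GPTBigCodeBlock(), 'BigCodeBlock'))  # True
print(matches_hidden_layer_suffix(GPTBigCodeBlock()))                  # False
```

This is why bigcode needs a different suffix: its repeated block is named `GPTBigCodeBlock`, which does not end in `DecoderLayer`.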
vllm/worker/hpu_model_runner.py
Outdated
@@ -756,7 +764,13 @@ def load_model(self) -> None:
    elif not is_fake_hpu():
        self.model = self.model.to("hpu")
    htcore.mark_step()
    modify_decoder_layer(self.model)

    hidden_layer_markstep = int(
Maybe "hidden_layer_markstep_interval" would be a better name?
Updated!
vllm/worker/hpu_model_runner.py
Outdated
hidden_layer_markstep_interval = int(
    os.getenv('VLLM_CONFIG_HIDDEN_LAYERS', '1'))
hidden_layer_suffix = os.getenv('VLLM_CONFIG_HIDDEN_LAYERS_SUFFIX',
This is a new env variable; please add a description in README_GAUDI.md and gaudi-installation.rst.
We are seeing a 10% performance regression in Llama-based models due to vllm-project#10239. The mark_step() function needs to be configured differently for each model to achieve the best performance. For some models, calling mark_step() after every decoder layer is optimal, but for others it is better to run it only every n-th layer. We are adding a counter so the hook is registered only for every n-th layer, which can be configured with VLLM_CONFIG_HIDDEN_LAYERS.
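The per-layer counter described above can be sketched as follows. This is a simplified illustration, not the PR's actual implementation: `mark_step` here is a counting stand-in for `htcore.mark_step()`, and the loop stands in for the model's decoder-layer loop.

```python
import os

# Hedged sketch of the interval idea: instead of calling mark_step() after
# every decoder layer, call it only every n-th layer, with n taken from
# VLLM_CONFIG_HIDDEN_LAYERS (default 1, i.e. the original per-layer behavior).
def mark_step():  # stand-in for habana_frameworks.torch.core.mark_step
    mark_step.calls += 1
mark_step.calls = 0

interval = int(os.getenv('VLLM_CONFIG_HIDDEN_LAYERS', '1'))
num_layers = 32  # e.g. a 32-layer Llama-style model

for i in range(num_layers):
    # ... run decoder layer i ...
    if i % interval == 0:
        mark_step()
```

With the default interval of 1 this fires on every layer (32 calls here); setting VLLM_CONFIG_HIDDEN_LAYERS=4 would reduce that to 8 graph breaks per forward pass.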