Add host traces to high-level profiling #577
Conversation
With regard to the cfg.json file: what do you think about putting it into vllm-hpu-extension, so that we have a default one and everyone doesn't have to create their own?
@@ -93,21 +97,82 @@ def __init__(

```python
torch_profiler_trace_dir = envs.VLLM_TORCH_PROFILER_DIR
logger.info("Profiling enabled. Traces will be saved to: %s",
            torch_profiler_trace_dir)

if os.getenv('VLLM_PROFILER_ENABLED') == 'full':
    fn = self.full_trace_handler
```
Great idea to override the torch profiler handler to achieve this behavior!
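
For reference, this works because torch.profiler accepts any callable as `on_trace_ready`, the same hook that `tensorboard_trace_handler` plugs into. A minimal standalone sketch (the workload and file name are placeholders, not from this PR):

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# Custom on_trace_ready callback: called with the profiler instance
# whenever the schedule reaches a RECORD_AND_SAVE step, exactly where
# tensorboard_trace_handler would otherwise run.
def full_trace_handler(prof):
    prof.export_chrome_trace("full_trace.json")  # placeholder path

with profile(activities=[ProfilerActivity.CPU],
             schedule=schedule(wait=0, warmup=0, active=1),
             on_trace_ready=full_trace_handler) as prof:
    torch.matmul(torch.ones(64, 64), torch.ones(64, 64))  # placeholder workload
    prof.step()  # advances the schedule and triggers the handler
```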
```python
else:
    self.profiler = None

def full_trace_handler(self, dir_name, use_gzip=False):
```
I see that this function does not use any of the HPUWorker class's attributes (self), so it can be extracted from the HPUWorker logic and treated as a separate function. Since we already have quite complex logic in HPUModelRunner and HPUWorker, could you move this code to vllm-hpu-extension/.../profiler.py?
See line 133:

```python
events = self.model_runner.profiler.profiling_trace_events
```

The handler does read HPUWorker state through self, so it cannot be extracted as-is.
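
One hypothetical middle ground (a sketch, not code from this PR): inject the high-level profiler and trace directory explicitly, e.g. via `functools.partial`, so the handler no longer touches self and could live in vllm-hpu-extension's profiler.py:

```python
import os
from functools import partial

# Hypothetical module-level version of the handler: every HPUWorker
# dependency is injected as an argument instead of read through self.
def full_trace_handler(prof, high_level_profiler, dir_name, use_gzip=False):
    os.makedirs(dir_name, exist_ok=True)
    suffix = ".json.gz" if use_gzip else ".json"
    prof.export_chrome_trace(os.path.join(dir_name, "full_trace" + suffix))
    # Host-side events would be merged into the exported trace here:
    events = high_level_profiler.profiling_trace_events

# In HPUWorker.__init__ the worker-specific state is bound once:
# fn = partial(full_trace_handler,
#              high_level_profiler=self.model_runner.profiler,
#              dir_name=torch_profiler_trace_dir)
```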
```python
high_level_profiler = self.model_runner.profiler
with high_level_profiler.record_event('internal', 'start_profiler'):
    # Clean up the queue
    while True:
```
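
The loop body is truncated above; a drain loop over a standard `queue.Queue` typically looks like the sketch below (an assumption about `profiling_trace_events`, not the PR's exact code):

```python
import queue

events = queue.Queue()  # stand-in for profiler.profiling_trace_events

# Discard any stale events left over from a previous run so the new
# trace starts from a clean queue.
while True:
    try:
        events.get_nowait()
    except queue.Empty:
        break
```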
Looks a bit hacky. Can't we simply run:

```python
high_level_profiler.start('internal', 'start_profiler')
high_level_profiler.stop('internal', 'start_profiler')
```
Run the vLLM server with:

```
HABANA_PROF_CONFIG=$PWD/cfg.json VLLM_TORCH_PROFILER_DIR=$PWD HABANA_PROFILE=1 VLLM_PROFILER_ENABLED=full
```

cfg.json:

Start profiling with:

Stop profiling with:
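
The exact start/stop commands are not shown above. Assuming the standard vLLM torch-profiler HTTP endpoints (exposed when VLLM_TORCH_PROFILER_DIR is set), they would typically be triggered like this (server address and port are placeholders):

```python
# Assumption: vLLM's standard /start_profile and /stop_profile endpoints;
# the original commands were elided from the page.
import requests

requests.post("http://localhost:8000/start_profile")
# ... send the requests you want profiled ...
requests.post("http://localhost:8000/stop_profile")
```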