Add host traces to high-level profiling #577
Conversation
With regard to the cfg.json file: what do you think about putting it into vllm-hpu-extension, so that we have a default one and everyone doesn't have to create their own?
@@ -93,21 +97,82 @@ def __init__(

```python
torch_profiler_trace_dir = envs.VLLM_TORCH_PROFILER_DIR
logger.info("Profiling enabled. Traces will be saved to: %s",
            torch_profiler_trace_dir)

if os.getenv('VLLM_PROFILER_ENABLED') == 'full':
    fn = self.full_trace_handler
```
Great idea to override the torch profiler handler to achieve this behavior!
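
For reference, this works because torch.profiler accepts any callable as `on_trace_ready`, the same hook that `tensorboard_trace_handler` plugs into. A minimal standalone sketch (the workload and file name are placeholders, not from this PR):

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# Custom on_trace_ready callback: called with the profiler instance
# whenever the schedule reaches a RECORD_AND_SAVE step, exactly where
# tensorboard_trace_handler would otherwise run.
def full_trace_handler(prof):
    prof.export_chrome_trace("full_trace.json")  # placeholder path

with profile(activities=[ProfilerActivity.CPU],
             schedule=schedule(wait=0, warmup=0, active=1),
             on_trace_ready=full_trace_handler) as prof:
    torch.matmul(torch.ones(64, 64), torch.ones(64, 64))  # placeholder workload
    prof.step()  # advances the schedule and triggers the handler
```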
```python
else:
    self.profiler = None

def full_trace_handler(self, dir_name, use_gzip=False):
```
I see that this function does not use any of the HPUWorker class's attributes (self), so it can be extracted from the HPUWorker logic and treated as a separate function. Since we already have quite complex logic in HPUModelRunner and HPUWorker, could you move this code to vllm-hpu-extension/.../profiler.py?
See line 133:

```python
events = self.model_runner.profiler.profiling_trace_events
```

The handler does read HPUWorker state through self, so it cannot be extracted as-is.
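
One hypothetical middle ground (a sketch, not code from this PR): inject the high-level profiler and trace directory explicitly, e.g. via `functools.partial`, so the handler no longer touches self and could live in vllm-hpu-extension's profiler.py:

```python
import os
from functools import partial

# Hypothetical module-level version of the handler: every HPUWorker
# dependency is injected as an argument instead of read through self.
def full_trace_handler(prof, high_level_profiler, dir_name, use_gzip=False):
    os.makedirs(dir_name, exist_ok=True)
    suffix = ".json.gz" if use_gzip else ".json"
    prof.export_chrome_trace(os.path.join(dir_name, "full_trace" + suffix))
    # Host-side events would be merged into the exported trace here:
    events = high_level_profiler.profiling_trace_events

# In HPUWorker.__init__ the worker-specific state is bound once:
# fn = partial(full_trace_handler,
#              high_level_profiler=self.model_runner.profiler,
#              dir_name=torch_profiler_trace_dir)
```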
```python
high_level_profiler = self.model_runner.profiler
with high_level_profiler.record_event('internal', 'start_profiler'):
    # Clean up the queue
    while True:
```
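
The loop body is truncated above; a drain loop over a standard `queue.Queue` typically looks like the sketch below (an assumption about `profiling_trace_events`, not the PR's exact code):

```python
import queue

events = queue.Queue()  # stand-in for profiler.profiling_trace_events

# Discard any stale events left over from a previous run so the new
# trace starts from a clean queue.
while True:
    try:
        events.get_nowait()
    except queue.Empty:
        break
```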
Looks a bit hacky. Can't we simply run:

```python
high_level_profiler.start('internal', 'start_profiler')
high_level_profiler.stop('internal', 'start_profiler')
```
Run the vLLM server with:

```
HABANA_PROF_CONFIG=$PWD/cfg.json VLLM_TORCH_PROFILER_DIR=$PWD HABANA_PROFILE=1 VLLM_PROFILER_ENABLED=full
```

cfg.json:

Start profiling with:

Stop profiling with:
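
The exact start/stop commands are not shown above. Assuming the standard vLLM torch-profiler HTTP endpoints (exposed when VLLM_TORCH_PROFILER_DIR is set), they would typically be triggered like this (server address and port are placeholders):

```python
# Assumption: vLLM's standard /start_profile and /stop_profile endpoints;
# the original commands were elided from the page.
import requests

requests.post("http://localhost:8000/start_profile")
# ... send the requests you want profiled ...
requests.post("http://localhost:8000/stop_profile")
```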