Add host traces to high-level profilings #577

Merged 5 commits on Dec 6, 2024

Conversation

szutenberg
@szutenberg szutenberg commented Dec 3, 2024

Run the vLLM server with:
HABANA_PROF_CONFIG=$PWD/cfg.json VLLM_TORCH_PROFILER_DIR=$PWD HABANA_PROFILE=1 VLLM_PROFILER_ENABLED=full

cfg.json:

{
    "Plugins": [
        {
            "lib": "libhost_profiler.so",
            "name": "HostProfiler",
            "values": {
                "api_group": {
                    "HCCL": {
                        "value": false
                    },
                    "HLTHUNK": {
                        "value": false
                    },
                    "SYNAPSE": {
                        "value": false
                    }
                },
                "start_disabled": {
                    "value": true
                }
            },
            "enable": true
        },
        {
            "lib": "libhw_trace.so",
            "name": "HwTrace",
            "values": {
                "generalOptions": {
                    "traceBufferSize": {
                        "value": "0x8000000"
                    },
                    "profilePhase": {
                        "value": "profileApi"
                    }
                },
                "archProfileUnits": {
                    "gaudi3": {
                        "CS": {
                            "enable": {
                                "value": false
                            },
                            "Gaudi3CSAdvancedProfiling": {
                                "value": 0
                            }
                        }
                    }
                }
            },
            "enable": false
        }
    ]
}
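
For anyone who prefers to generate the file rather than paste it, here is a minimal Python sketch that writes the same cfg.json and collects the four environment variables from the description. The helper names (`build_profiler_config`, `write_config_and_env`) are made up for illustration and are not part of this PR; how the server itself is launched is left to the reader.

```python
import json
from pathlib import Path


def build_profiler_config() -> dict:
    """Return the same plugin configuration as the cfg.json in this PR description."""
    host_profiler = {
        "lib": "libhost_profiler.so",
        "name": "HostProfiler",
        "values": {
            # All three API groups are switched off, as in the description.
            "api_group": {
                group: {"value": False} for group in ("HCCL", "HLTHUNK", "SYNAPSE")
            },
            "start_disabled": {"value": True},
        },
        "enable": True,
    }
    hw_trace = {
        "lib": "libhw_trace.so",
        "name": "HwTrace",
        "values": {
            "generalOptions": {
                "traceBufferSize": {"value": "0x8000000"},
                "profilePhase": {"value": "profileApi"},
            },
            "archProfileUnits": {
                "gaudi3": {
                    "CS": {
                        "enable": {"value": False},
                        "Gaudi3CSAdvancedProfiling": {"value": 0},
                    }
                }
            },
        },
        "enable": False,
    }
    return {"Plugins": [host_profiler, hw_trace]}


def write_config_and_env(directory: Path) -> dict:
    """Write cfg.json into `directory` and return the env vars from the description."""
    cfg_path = directory / "cfg.json"
    cfg_path.write_text(json.dumps(build_profiler_config(), indent=4))
    return {
        "HABANA_PROF_CONFIG": str(cfg_path),
        "VLLM_TORCH_PROFILER_DIR": str(directory),
        "HABANA_PROFILE": "1",
        "VLLM_PROFILER_ENABLED": "full",
    }
```

The returned dictionary can be merged into `os.environ` (or passed as `env=` to `subprocess`) before starting the server.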

Start profiling with:

curl -X POST http://localhost:8080/start_profile

Stop profiling with:

curl -X POST http://localhost:8080/stop_profile
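
The same two requests can be sent from Python, which is handy when profiling a specific window of a benchmark script. This is a sketch using only the standard library; the host and port are taken from the curl commands above, and the helper names (`profile_request`, `send_profile_command`) are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl commands


def profile_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /start_profile or /stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    # No request body, mirroring `curl -X POST <url>`.
    return urllib.request.Request(f"{base_url}/{action}_profile", method="POST")


def send_profile_command(action: str, base_url: str = BASE_URL,
                         timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(profile_request(action, base_url),
                                timeout=timeout) as response:
        return response.status
```

A typical use would be to call `send_profile_command("start")`, run the workload of interest, and then call `send_profile_command("stop")`.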

@szutenberg szutenberg marked this pull request as ready for review December 4, 2024 13:41
@szutenberg szutenberg requested a review from mswiniarsk December 4, 2024 13:41
@mswiniarsk mswiniarsk left a comment

Regarding the cfg.json file: what do you think about putting it into vllm-hpu-extension, so that there is a default one and nobody has to create their own?
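
One way such a default could work, sketched in Python: honor HABANA_PROF_CONFIG when the user sets it, otherwise fall back to a file shipped with the extension. The function name and the default path are assumptions for illustration only; nothing in this PR defines them.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_profiler_config(default_path: Path,
                            env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the profiler config: the user's HABANA_PROF_CONFIG wins, else the default.

    `default_path` stands for a hypothetical cfg.json bundled with
    vllm-hpu-extension; its real location is not specified in this PR.
    """
    env = os.environ if env is None else env
    user_path = env.get("HABANA_PROF_CONFIG")
    if user_path:
        return Path(user_path)
    return default_path
```

With this shape, existing setups that already export HABANA_PROF_CONFIG keep working unchanged, and everyone else gets the bundled file.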

@@ -93,21 +97,82 @@ def __init__(
torch_profiler_trace_dir = envs.VLLM_TORCH_PROFILER_DIR
logger.info("Profiling enabled. Traces will be saved to: %s",
            torch_profiler_trace_dir)

if os.getenv('VLLM_PROFILER_ENABLED') == 'full':
    fn = self.full_trace_handler

Great idea with overriding the torch profiler handler to achieve this behavior!
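
For readers unfamiliar with the mechanism: `torch.profiler.profile` accepts an `on_trace_ready` callable that receives the profiler object, and the stock `tensorboard_trace_handler` is just one such callable. The sketch below shows the general pattern of a custom handler that exports the Chrome trace and then appends extra events to it. It is not the `full_trace_handler` from this PR (whose body is not shown here); the factory name, the file naming, and the assumption that the exported file is a JSON object with a `traceEvents` list are all illustrative.

```python
import json
import os
import time
from typing import Callable, Iterable


def make_trace_handler(dir_name: str,
                       extra_events: Callable[[], Iterable[dict]]):
    """Return a callable usable as torch.profiler.profile(on_trace_ready=...)."""

    def handler(prof) -> str:
        os.makedirs(dir_name, exist_ok=True)
        path = os.path.join(dir_name, f"trace_{time.time_ns()}.json")
        # export_chrome_trace is the profiler method that writes a Chrome-format trace.
        prof.export_chrome_trace(path)
        with open(path) as f:
            trace = json.load(f)
        # Assumption: the file is a JSON object holding a "traceEvents" list.
        trace.setdefault("traceEvents", []).extend(extra_events())
        with open(path, "w") as f:
            json.dump(trace, f)
        return path  # the profiler ignores the return value; useful for callers

    return handler
```

Because the handler only relies on `export_chrome_trace`, it can be exercised with a small fake profiler object without a device or a real profiling run.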

    else:
        self.profiler = None

def full_trace_handler(self, dir_name, use_gzip=False):

I see that this function does not use any attributes of the HPUWorker instance (self), so it can be extracted from the HPUWorker logic and treated as a separate function. Since the logic in HPUModelRunner and HPUWorker is already quite complex, could you move this code to vllm-hpu-extension/.../profiler.py?

Author

See line 133:

events = self.model_runner.profiler.profiling_trace_events

high_level_profiler = self.model_runner.profiler
with high_level_profiler.record_event('internal', 'start_profiler'):
    # Clean up the queue
    while True:

Looks a bit hacky. Can't we simply run the following?
high_level_profiler.start('internal', 'start_profiler')
high_level_profiler.stop('internal', 'start_profiler')
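
To make the two options concrete, here is a self-contained sketch of how a `record_event` context manager can be layered on explicit start/stop calls, together with the "clean up the queue" loop from the diff. `MiniProfiler` is a hypothetical stand-in, not the vLLM high-level profiler, and its method signatures simply follow the calls quoted in this thread.

```python
import queue
import time
from contextlib import contextmanager


class MiniProfiler:
    """Hypothetical stand-in for the high-level profiler discussed above."""

    def __init__(self) -> None:
        self.events: queue.Queue = queue.Queue()

    def _emit(self, category: str, name: str, phase: str) -> None:
        # Chrome trace timestamps are in microseconds.
        self.events.put({"cat": category, "name": name, "ph": phase,
                         "ts": time.perf_counter_ns() / 1000})

    def start(self, category: str, name: str) -> None:
        self._emit(category, name, "B")  # "B" marks the beginning of a duration event

    def stop(self, category: str, name: str) -> None:
        self._emit(category, name, "E")  # "E" marks its end

    @contextmanager
    def record_event(self, category: str, name: str):
        self.start(category, name)
        try:
            yield
        finally:
            # The end marker is emitted even if the body raises.
            self.stop(category, name)


def drain(q: queue.Queue) -> list:
    """Remove and return everything currently in the queue without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            break
    return items
```

In this sketch the two styles record the same pair of events; the difference is that the context manager wraps whatever runs inside the `with` body and still closes the event on an exception, whereas back-to-back start/stop calls produce an empty marker.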

@szutenberg szutenberg requested a review from mswiniarsk December 5, 2024 09:54
@mswiniarsk mswiniarsk merged commit a805205 into habana_main Dec 6, 2024
9 checks passed
@mswiniarsk mswiniarsk changed the title [SW-206682] Add host traces to high-level profilings Add host traces to high-level profilings Dec 9, 2024
@szutenberg szutenberg deleted the dev/mszutenberg/improve_profiler branch January 3, 2025 07:34
2 participants