[Core] : Ray Thread Actor may cause cuda memory leakage. #49360
Comments
import torch
import ray


def log_gpu_memory_usage(head: str):
    """Print and return the allocated and reserved CUDA memory in GB."""
    memory_allocated = torch.cuda.memory_allocated() / 1024 ** 3
    memory_reserved = torch.cuda.memory_reserved() / 1024 ** 3
    message = f'{head}, memory allocated (GB): {memory_allocated}, memory reserved (GB): {memory_reserved}'
    print(message)
    return memory_allocated, memory_reserved


MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT: int = 100000


def start_record_memory_history() -> None:
    print("Starting snapshot record_memory_history")
    torch.cuda.memory._record_memory_history(
        max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
    )


def stop_record_memory_history() -> None:
    print("Stopping snapshot record_memory_history")
    torch.cuda.memory._record_memory_history(enabled=None)


def export_memory_snapshot(file_prefix: str) -> None:
    # file_prefix is the snapshot file name without the ".pickle" extension.
    try:
        print(f"Saving snapshot to local file: {file_prefix}.pickle")
        torch.cuda.memory._dump_snapshot(f"{file_prefix}.pickle")
    except Exception as e:
        print(f"Failed to capture memory snapshot {e}")
        return


@ray.remote(num_gpus=1)
class Actor:
    def __init__(self, name: str):
        self.name = name

    def compute(self):
        start_record_memory_history()
        tensor_size = (1024, 1024)
        mat_a = torch.rand(tensor_size, device='cuda')
        mat_b = torch.rand(tensor_size, device='cuda')
        # torch.mm uses cuBLAS for the matrix multiplication, which allocates
        # a workspace on the GPU.
        mat_c = torch.mm(mat_a, mat_b)
        metrics = {}
        memory_allocated, memory_reserved = log_gpu_memory_usage(head=f"{self.name} before empty cache")
        metrics["onload/memory_allocated"] = memory_allocated
        metrics["onload/memory_reserved"] = memory_reserved
        del mat_a, mat_b, mat_c
        torch.cuda.empty_cache()
        # The tensors are deleted, but the workspace allocated by cuBLAS is not released.
        memory_allocated, memory_reserved = log_gpu_memory_usage(head=f"{self.name} after empty cache")
        metrics["offload/memory_allocated"] = memory_allocated
        metrics["offload/memory_reserved"] = memory_reserved
        export_memory_snapshot(self.name)
        stop_record_memory_history()
        return metrics


def test(num_threads: int):
    ray.init()
    # The size of the actor's thread pool is num_threads.
    actor_handler = Actor.options(max_concurrency=num_threads).remote(f"num_threads_{num_threads}")
    # Fill the thread pool with num_threads tasks.
    futures = [actor_handler.compute.remote() for _ in range(num_threads)]
    metrics = ray.get(futures)
    print(f"num_thread:{num_threads} metrics: {metrics[-1]}")
    ray.shutdown()


if __name__ == '__main__':
    num_threads = [2**i for i in range(10)]
    for x in num_threads:
        test(x)
I believe the leaked memory comes from the cuBLAS workspace. When torch.mm is called, PyTorch uses cuBLAS to perform the matrix multiplication, and cuBLAS allocates a workspace in GPU memory, as described in torch getWorkSpace. PyTorch also allocates a cuBLAS handle for each thread (source code), so the total workspace size grows in proportion to the number of threads, which in Ray is the actor's max_concurrency. To validate this hypothesis, run the code above and load the generated .pickle snapshot files into https://pytorch.org/memory_viz to inspect the final memory usage. You can refer to this link for more information.
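To make the proportionality concrete, here is a rough back-of-the-envelope sketch. It assumes the default workspace configuration given in PyTorch's CUDA notes, `CUBLAS_WORKSPACE_CONFIG=:4096:2:16:8` (2 x 4096 KiB plus 8 x 16 KiB per handle and stream combination), and assumes each actor thread ends up with exactly one such workspace. The actual default can differ by GPU architecture and PyTorch version, and the helper names `cublas_workspace_kib` and `estimated_workspace_gib` are hypothetical, not part of PyTorch or Ray.

```python
# Assumed default taken from PyTorch's CUDA notes; real values may differ.
DEFAULT_CUBLAS_WORKSPACE_CONFIG = ":4096:2:16:8"


def cublas_workspace_kib(config: str) -> int:
    """Sum SIZE * COUNT over the ':SIZE:COUNT' pairs of a config string (SIZE in KiB)."""
    fields = [int(field) for field in config.strip(":").split(":")]
    if len(fields) % 2 != 0:
        raise ValueError(f"expected ':SIZE:COUNT' pairs, got {config!r}")
    return sum(size * count for size, count in zip(fields[0::2], fields[1::2]))


def estimated_workspace_gib(num_threads: int,
                            config: str = DEFAULT_CUBLAS_WORKSPACE_CONFIG) -> float:
    """Estimate total workspace memory, assuming one workspace per thread."""
    return num_threads * cublas_workspace_kib(config) / 1024 ** 2


if __name__ == "__main__":
    # Same thread counts as the reproduction script: 1, 2, 4, ..., 512.
    for n in [2 ** i for i in range(10)]:
        print(f"max_concurrency={n}: ~{estimated_workspace_gib(n):.4f} GiB of cuBLAS workspace")
```

This is only an estimate under the stated assumptions. The memory_viz snapshots remain the way to confirm what is actually held on a given machine.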
Okay, I understand. Thank you for your response.
Thanks for the great answer @kf-zhang.
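Following the cuBLAS workspace explanation above, one possible mitigation is to release the cached workspaces explicitly once a batch of work is done. The thread does not confirm this approach, so treat the following as a minimal sketch. It relies on `torch._C._cuda_clearCublasWorkspaces()`, a private call mentioned in PyTorch's CUDA notes whose behavior may change between versions. The helper name `release_cublas_workspaces` is hypothetical, and it should only be called when no other thread in the actor is running cuBLAS kernels.

```python
def release_cublas_workspaces() -> bool:
    """Try to free cached cuBLAS workspaces; return True if the call was made."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, so there is nothing to release.

    # Private API: look it up defensively because it may be absent in some builds.
    clear = getattr(torch._C, "_cuda_clearCublasWorkspaces", None)
    if clear is None or not torch.cuda.is_available():
        return False

    torch.cuda.synchronize()   # Wait for in-flight kernels before dropping workspaces.
    clear()                    # Drop PyTorch's cached cuBLAS workspace buffers.
    torch.cuda.empty_cache()   # Return the freed blocks to the CUDA driver.
    return True


if __name__ == "__main__":
    print(f"cuBLAS workspaces released: {release_cublas_workspaces()}")
```

In the reproduction script, a call like this could replace the bare `torch.cuda.empty_cache()` in `Actor.compute`. Whether it is safe while other threads of the same actor are still computing is an open question, so lowering max_concurrency may be the simpler option.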
What happened + What you expected to happen
When we used a Ray actor to run computations on PyTorch tensors, we noticed that CUDA memory usage kept increasing with each computation step.
After investigation, we found that this was caused by setting max_concurrency > 1.
In the reproduction script provided, the leaked memory reached 3 GB after 100 executions.
Versions / Dependencies
Version: 2.40.0
python
Reproduction script
last result:
Issue Severity
Medium: It is a significant difficulty but I can work around it.