PicklingError #740

ghisloine · 2023-08-09T14:23:43Z

Thank you for the question @peterkim95 !

To load a general checkpoint, PyTorch provides a built-in method, load_state_dict, that you can call as follows:

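```python
import torch

checkpoint = torch.load('tmp/checkpoint-1000')
# `model` must already be an instance of the same class that produced the checkpoint
model.load_state_dict(checkpoint['model_state_dict'])
```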
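For reference, here is a minimal, self-contained round trip of such a "general checkpoint" in plain PyTorch. TinyModel and the in-memory buffer are hypothetical stand-ins for the real model class and the checkpoint file; map_location="cpu" is the standard torch.load argument that lets a checkpoint written on a GPU machine load on a CPU-only one.

```python
import io

import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model class."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


model = TinyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Save a general checkpoint: a plain dict of state_dicts plus metadata.
buffer = io.BytesIO()  # stands in for a checkpoint file on disk
torch.save(
    {
        "epoch": 1,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    },
    buffer,
)
buffer.seek(0)

# Remap every tensor to CPU while loading.
checkpoint = torch.load(buffer, map_location="cpu")

# The class must be defined (or importable) to rebuild the model.
restored = TinyModel()
restored.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.1)
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
restored.eval()
```

The state_dict only stores tensors, which is why the model class has to be re-created in code before the weights can be loaded.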

In Transformers4Rec, we have additionally simplified model saving in the Transformers4Rec Trainer class (here) with a built-in method, _save_model_and_checkpoint, which saves the model class as well as the checkpoints. This way, you don't have to re-define the model class.

An example of usage would be:

```python
import os

import cloudpickle
import torch
import transformers4rec.torch as tr

# Train and save the model
recsys_trainer = tr.Trainer(model, ...)
...
recsys_trainer._save_model_and_checkpoint(save_model_class=True)

# Load the model class and its checkpoint
checkpoint_path = 'tmp/checkpoint-1000'
with open(os.path.join(checkpoint_path, "model_class.pkl"), "rb") as f:
    model = cloudpickle.load(f)

# Restore the model weights
model.load_state_dict(torch.load(os.path.join(checkpoint_path, "pytorch_model.bin")))
```

Let us know if those examples help you with your use-case :)

Originally posted by @sararb in #348 (comment)

When I try this code, I get this error: PicklingError: Cannot pickle a prepared model with automatic mixed precision, please unwrap the model with Accelerator.unwrap_model(model) before pickling it.

If I call recsys_trainer.accelerator.unwrap_model and save again, the model is saved, but then I get: model.forward() missing 1 required positional argument: 'inputs'

My main goal is to save the model and use it like recsys_trainer.predict() on another platform, without a Triton server and without a GPU.
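The message comes from the mixed-precision hook that the Hugging Face accelerate library attaches to a prepared model's forward: the hook object deliberately refuses to be pickled, so cloudpickle fails on the whole model. The stdlib-only sketch below mimics that mechanism; FP32OutputWrapper, TinyModel, prepare_with_amp and unwrap are hypothetical illustrative names, not the accelerate API. Shadowing forward with an unpicklable instance attribute breaks pickling, and removing the attribute restores it. In recent accelerate versions, Accelerator.unwrap_model also takes a keep_fp32_wrapper argument that defaults to True, so passing keep_fp32_wrapper=False may be what actually strips the hook; check the signature in your installed version.

```python
import pickle


class FP32OutputWrapper:
    """Hypothetical stand-in for the mixed-precision hook placed on forward."""

    def __init__(self, forward):
        self.forward = forward

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def __getstate__(self):
        # Refuse pickling, like the real hook does.
        raise pickle.PicklingError(
            "Cannot pickle a prepared model with automatic mixed precision"
        )


class TinyModel:
    """Hypothetical model whose forward is defined on the class."""

    def forward(self, inputs):
        return [x * 2 for x in inputs]


def prepare_with_amp(model):
    # Shadow the class-level forward with an instance attribute.
    model.forward = FP32OutputWrapper(model.forward)
    return model


def unwrap(model):
    # Drop the instance attribute so the class-level forward is used again.
    model.__dict__.pop("forward", None)
    return model
```

Because the hook lives in the instance __dict__, pickle tries to serialize it along with the weights; once it is removed, only ordinary attributes remain.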


suyee97 · 2023-09-13T05:57:54Z

Same issue here; I got the error:
_pickle.PicklingError: Cannot pickle a prepared model with automatic mixed precision, please unwrap the model with Accelerator.unwrap_model(model) before pickling it.

rnyak · 2023-09-13T14:37:43Z

@ghisloine

How did you install Transformers4Rec? We recommend using the merlin-pytorch:23.06 Docker image.
Are you able to run these notebooks: 1, 2, 3? Please run these examples on your end first. As you can see, in the 3rd notebook we load the model back via cloudpickle.load(). Does that work for you?
Are you training the model with fp16=True? If automatic mixed precision does not work, can you please disable it and test again?

ghisloine · 2023-09-13T18:38:01Z

Actually, I never used the Docker image. I am copying blocks of code from the notebooks directly into Colab. I also have a problem loading models with cloudpickle across devices, e.g. training on CPU and loading on GPU or vice versa. On a CPU machine, for example, it says something about cudf not being found.

suyee97 · 2023-09-14T03:40:04Z

Setting save_model_class=False worked for me. You just need to load the model class manually.
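With save_model_class=False only the weights are written, so "loading the model class manually" means re-running the code that built the model and then loading the saved state_dict into it. The sketch below shows that pattern on CPU in plain PyTorch; build_model is a hypothetical factory standing in for the Transformers4Rec model definition from the notebooks, and the in-memory buffer stands in for the pytorch_model.bin file in the checkpoint directory.

```python
import io

import torch
from torch import nn


def build_model():
    """Hypothetical factory: re-runs the same code that defined the model for training."""
    torch.manual_seed(0)
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# Training side: only the weights are saved (no pickled model class).
trained = build_model()
with torch.no_grad():
    trained[0].weight.add_(1.0)  # pretend training changed the weights
weights_file = io.BytesIO()  # stands in for pytorch_model.bin
torch.save(trained.state_dict(), weights_file)
weights_file.seek(0)

# Inference side (CPU only): rebuild the architecture in code, then load weights.
model = build_model()
model.load_state_dict(torch.load(weights_file, map_location="cpu"))
model.eval()
with torch.no_grad():
    scores = model(torch.ones(3, 4))
```

Since nothing in the saved file references accelerate's wrapper, this path avoids the PicklingError entirely; the trade-off is that the model-building code must be available wherever the weights are loaded.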

dcy0577 · 2023-09-15T21:04:46Z

I got the same error as well. @suyee97, could you please share where and how to load the model class?

And if I use the code from notebook 02 to save the model:

```python
model_path = os.environ.get("OUTPUT_DIR", f"{INPUT_DATA_DIR}/saved_model")
model.save(model_path)
```

I get the same error.

rnyak · 2023-09-17T17:16:05Z

> Actually, I never used the Docker image. I am copying blocks of code from the notebooks directly into Colab. I also have a problem loading models with cloudpickle across devices, e.g. training on CPU and loading on GPU or vice versa. On a CPU machine, for example, it says something about cudf not being found.

@ghisloine we designed the examples to run on GPU, so it is expected that you get a "no cudf found" warning/error on a CPU machine. If you use Colab, use a GPU runtime there. Follow the instructions in this blog post to install the required libraries. You need to install:

- cudf and dask-cudf (version 23.02)
- the 23.06 releases of the following Merlin libraries:
  - nvtabular
  - merlin systems
  - merlin models
  - merlin Transformers4rec
  - merlin dataloader
- PyTorch (GPU build)

Otherwise, you cannot run the examples on GPU.

Are you able to run the examples on CPU currently? If you want to run on CPU, you don't need to install cudf and dask-cudf; instead, change the dataloader in the examples to pyarrow and set no_cuda to True, as in:

https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/main/tests/unit/torch/test_trainer.py#L74-L76
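As a rough sketch of those changes, assuming the T4RecTrainingArguments class imported in the example notebooks (it extends the Hugging Face TrainingArguments, which is where no_cuda and fp16 come from): the lines in the linked test may differ, the accepted data_loader_engine values depend on your Transformers4Rec version, and the remaining arguments from the notebooks are omitted here. Keeping fp16=False is also the quickest way to test the earlier suggestion, since the PicklingError only appears when the model is prepared with automatic mixed precision.

```python
from transformers4rec.config.trainer import T4RecTrainingArguments

# CPU-only training arguments (other notebook arguments omitted).
training_args = T4RecTrainingArguments(
    output_dir="./tmp",
    data_loader_engine="pyarrow",  # CPU dataloader, as suggested above
    no_cuda=True,                  # do not use a GPU
    fp16=False,                    # automatic mixed precision requires a GPU
)
```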

ghisloine · 2023-09-20T12:48:54Z

Following your first suggestion, I am trying to use the merlin-pytorch:23.06 image, but I get an error like ImportError: libcuda.so.1: cannot open shared object file: No such file or directory. I am a bit confused, because when I check pip list, all the packages are there.
