Slow NER training with GPU - SpaCy v3.2 #9876
-
What makes you say your training is slow? Is it faster in another context? You say that memory usage is low and therefore it's slow, but I don't understand the connection. Also, how big is your training set, your GPU, your RAM...? It's also not clear whether SLURM is relevant here.
-
Thank you for your reply. My training dataset contains 574 files (short texts). I was able to retrieve accounting info for the submitted job (on a V100 GPU): it ran for ~11 minutes over 7 epochs (~24 minutes on CPU), i.e. a speedup of only about 2.2x. Is this normal? I think that if the speedup ratio is less than 4x, we can call it slow.
-
Hi,
I am using a GPU to train an NER model from scratch in spaCy v3.2 (with the --gpu-id option) and the SLURM job scheduler:
sbatch -p gpu --gres=gpu:v100:1 my_script.sh
Here is the "my_script.sh" submission script:
#!/bin/bash
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./train.spacy --gpu-id 0
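In case it helps with diagnosis, here is a sanity check I could add to my_script.sh just before the training command (only a sketch, not something in my current script) to confirm the job actually sees the GPU:
# Hypothetical checks, not part of my current script:
# report the GPU allocated to the job, its utilization and memory use
nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv
# fail early if spaCy cannot find a GPU (requires cupy to be installed)
python -c "import spacy; spacy.require_gpu(); print('spaCy can use the GPU')"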
When I run nvidia-smi, I can clearly see GPU utilization at 7% with memory usage at 0%, which looks slow to me. That's why I think adjustments need to be made on my side, at the spaCy level, to optimize its use.
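For reference, my understanding is that the batch size used during training is controlled by the [training.batcher] section of config.cfg; in the default layout generated by spacy init config it looks roughly like the excerpt below (my own config may differ), and raising the size values is one spaCy-level adjustment I could try:
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
# compounding schedule: the batch size grows from `start` towards `stop`
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001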
Could you please tell me where this slow GPU training comes from?
Thanks in advance,
FA