Commit

Fixed issue with horovod and early stopping (#986)
* Fixed issue with horovod and early stopping

* applied black

Co-authored-by: Arnaud Wéry <[email protected]>
awery and Arnaud Wéry authored Dec 5, 2022
1 parent a62b886 commit de084e9
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions opennmt/training.py
@@ -346,6 +346,15 @@ def is_master(self):
     def num_replicas(self):
         return self._hvd.size()

+    def _evaluate(self, evaluator, step, moving_average=None):
+        should_stop = super()._evaluate(evaluator, step, moving_average)
+        # Evaluation is only performed on master, but we want all workers
+        # to be aware of the early stopping decision.
+        should_stop = self._hvd.broadcast_object(
+            should_stop, root_rank=0, name="should_stop"
+        )
+        return should_stop
+
     def _finalize_dataset(self, dataset):
         if callable(dataset):
             dataset = dataset(
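
The effect of the change can be illustrated with a toy simulation that does not need Horovod. This is only a sketch under stated assumptions: non-master workers are assumed to report "do not stop" from their local evaluation step, and `local_decision`, `broadcast_from_root`, and `simulate` are hypothetical names that exist in neither OpenNMT-tf nor Horovod. `broadcast_from_root` merely stands in for the `broadcast_object` call added in the diff.

```python
# Toy model of the early stopping fix. No Horovod involved: the "workers"
# are plain list indices and the broadcast is a list copy.


def local_decision(rank, step, stop_at_step):
    """Mimic a per-worker evaluation step.

    Assumption: only rank 0 (master) evaluates, so only rank 0 can ever
    decide to stop. Every other rank always answers False.
    """
    if rank != 0:
        return False
    return step >= stop_at_step


def broadcast_from_root(values, root_rank=0):
    """Stand-in for a broadcast collective: all ranks get the root's value."""
    return [values[root_rank] for _ in values]


def simulate(num_workers, stop_at_step, max_steps, use_broadcast):
    """Return, for each worker, the step at which it stopped training."""
    stopped_at = [max_steps] * num_workers
    active = [True] * num_workers
    for step in range(1, max_steps + 1):
        decisions = [
            local_decision(rank, step, stop_at_step)
            for rank in range(num_workers)
        ]
        if use_broadcast:
            # Every worker adopts the master's decision.
            decisions = broadcast_from_root(decisions)
        for rank in range(num_workers):
            if active[rank] and decisions[rank]:
                active[rank] = False
                stopped_at[rank] = step
        if not any(active):
            break
    return stopped_at


if __name__ == "__main__":
    print(simulate(3, stop_at_step=4, max_steps=10, use_broadcast=False))
    print(simulate(3, stop_at_step=4, max_steps=10, use_broadcast=True))
```

Without the broadcast, only worker 0 stops at step 4 while the others run to `max_steps`. With the broadcast, all workers stop together. In a real Horovod job the diverged workers would likely not run to completion but instead wait on a collective operation that the stopped master never joins, which is presumably the issue this commit addresses.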