fix(train.py): mfu estimation to respect CPU-GPU sync point #527

JasonLiJT · 2024-06-23T16:13:45Z

Previously, the mfu timing measurement was taken before the CPU-GPU sync point on every iteration. The resulting `running_mfu`:

- would converge correctly when `log_interval = 1`.
- could converge to > 100% when `log_interval > 1`.
  - This could create the illusion that raising `log_interval` speeds up training (it usually does not).

See diagrams below.

`log_interval` = 1

`log_interval` = 2

Note that with `log_interval` = 2, `t3 - t2` is discarded. Only `t2 - t1`, `t4 - t3`, etc. contribute to `running_mfu`.
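To make the effect concrete, here is a toy simulation of the timing logic. It is only a sketch under stated assumptions, not the code in `train.py`: the function `simulate`, the `sync_aware` flag, and the costs `launch_cost` and `gpu_cost` are hypothetical names and numbers chosen for illustration. It models a CPU that enqueues each iteration quickly, a GPU that runs it slowly, and a sync (like `loss.item()`) that happens only on logging iterations. The `sync_aware=True` branch shows one possible way to respect the sync point (timestamp after the sync, averaged over the iterations since the last log); it is not necessarily what this PR's diff does.

```python
def simulate(log_interval, sync_aware, n_iters=9, launch_cost=1.0, gpu_cost=10.0):
    """Return the per-iteration dt values that would feed the mfu estimate.

    Hypothetical toy model: the CPU needs `launch_cost` to enqueue one
    iteration, the GPU needs `gpu_cost` to run it, and the CPU only waits
    for the GPU on logging iterations.
    """
    cpu = 0.0        # CPU wall clock
    gpu_done = 0.0   # time at which the GPU finishes all queued work
    t_prev = 0.0     # previous timestamp
    iters_since_log = 0
    logged = []
    for it in range(n_iters):
        cpu += launch_cost                        # async kernel launches
        gpu_done = max(gpu_done, cpu) + gpu_cost  # GPU runs work in order
        iters_since_log += 1
        is_log_iter = it % log_interval == 0

        if not sync_aware:
            # Old behaviour: timestamp every iteration, before any sync.
            dt = cpu - t_prev
            t_prev = cpu
            if is_log_iter:
                logged.append(dt)

        if is_log_iter:
            cpu = max(cpu, gpu_done)              # CPU-GPU sync point
            if sync_aware:
                # Assumed alternative: timestamp after the sync and
                # average over the iterations since the last log.
                logged.append((cpu - t_prev) / iters_since_log)
                t_prev = cpu
            iters_since_log = 0
    return logged


old_1 = simulate(log_interval=1, sync_aware=False)[-1]
old_2 = simulate(log_interval=2, sync_aware=False)[-1]
new_2 = simulate(log_interval=2, sync_aware=True)[-1]
print(old_1, old_2, new_2)
```

In this model the real steady-state cost is about 10.5 to 11 time units per iteration. The pre-sync timestamp reports 11 with `log_interval=1`, but only 1 (the launch cost alone) with `log_interval=2`, because the sync wait falls into the discarded interval. A `dt` that small is what pushes the estimated mfu above 100%.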

fix(train.py): mfu estimation to respect CPU-GPU sync point

f295b54
