What tools do you use to store data for distributed training #620

GoingMyWay · 2022-07-07T02:07:29Z

Hi, I see that you use Reverb to store data. Is Reverb fast enough for distributed training? For example, how long does it take to fetch a batch from a remote replay buffer with Reverb?

AsadJeewa · 2022-07-07T10:32:59Z

Hi @GoingMyWay. We have used Reverb as the replay buffer since the inception of MAVA, so I cannot comment on alternatives. We also do not run a remote server, i.e. we use localhost in MAVA.

In general, the performance can be improved: we did investigate and benchmark how fast Reverb was for our use case (google-deepmind/reverb#94), but that was mostly for adding data to the server, not for sampling (which has never been flagged as a problem or slowed down our executors in the past). The tradeoff is that it is a robust repo that serves its purpose nicely.
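If you want a number for your own setup, a rough way to get one is to time repeated sampling calls directly. The sketch below is not taken from MAVA or Reverb: `time_sampling` and `sample_batch` are hypothetical names, and the in-memory list only stands in for the buffer so the script runs on its own. In a real measurement, `sample_batch` would wrap whatever call you use to pull a batch from your Reverb server.

```python
import random
import statistics
import time
from typing import Callable, List


def time_sampling(sample_batch: Callable[[], object], num_trials: int = 100) -> List[float]:
    """Return the wall-clock latency (in seconds) of each call to sample_batch."""
    latencies = []
    for _ in range(num_trials):
        start = time.perf_counter()
        sample_batch()
        latencies.append(time.perf_counter() - start)
    return latencies


# Hypothetical stand-in for a replay buffer: a plain list of (step, reward) pairs.
# This is an assumption made so the example is self-contained; it is not Reverb.
buffer = [(step, random.random()) for step in range(10_000)]


def sample_batch(batch_size: int = 256):
    """Draw a uniform random batch from the stand-in buffer."""
    return random.sample(buffer, batch_size)


latencies = time_sampling(sample_batch, num_trials=50)
print(f"median latency: {statistics.median(latencies) * 1e3:.3f} ms")
print(f"max latency:    {max(latencies) * 1e3:.3f} ms")
```

Reporting the median alongside the maximum helps separate typical latency from occasional stalls, which matters more once the buffer sits behind a network connection rather than on localhost.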

Hope this helps

arnupretorius closed this as completed Aug 14, 2023
