Implementing Tests for Distributed Ops #22

EiffL · 2019-11-09T19:07:15Z

We currently have tests that check the TensorFlow version of FlowPM against our reference FastPM simulation code. They run automatically on Travis CI.
The problem is that we now want to do the same for the Mesh TensorFlow implementation, which requires running the ops on a TensorFlow cluster.

We need to figure out how to run those tests automatically. The most likely answer is to spawn a TF cluster on Travis with, say, 4 CPU processes, and run the tests by connecting to this local cluster.
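As a rough sketch of what spawning such a local cluster could look like (not something that exists in the repo yet): the helper names `pick_free_ports`, `local_cluster_def` and `start_local_servers` below are made up for illustration, and the job name `"worker"` is an assumption. It also starts the servers inside a single Python process with `tf.distribute.Server`, as a stand-in for truly separate CPU processes, which may or may not be good enough for our tests.

```python
import socket


def pick_free_ports(n):
    """Ask the OS for n distinct, currently unused TCP ports on localhost."""
    socks = []
    try:
        for _ in range(n):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("localhost", 0))  # port 0 lets the OS choose a free port
            socks.append(s)
        # All sockets are still open here, so the ports cannot collide.
        return [s.getsockname()[1] for s in socks]
    finally:
        # Ports are released on close; another process could grab one
        # before TensorFlow binds it, which is a small race we accept here.
        for s in socks:
            s.close()


def local_cluster_def(n_workers=4, job_name="worker"):
    """Build a cluster definition dict with n_workers localhost tasks."""
    ports = pick_free_ports(n_workers)
    return {job_name: ["localhost:%d" % p for p in ports]}


def start_local_servers(cluster_def, job_name="worker"):
    """Start one in-process TF server per task (hypothetical helper)."""
    import tensorflow as tf  # imported lazily so the helpers above work without TF

    cluster = tf.train.ClusterSpec(cluster_def)
    return [
        tf.distribute.Server(cluster, job_name=job_name, task_index=i)
        for i in range(len(cluster_def[job_name]))
    ]


if __name__ == "__main__":
    cluster_def = local_cluster_def(n_workers=4)
    print(cluster_def)
    # servers = start_local_servers(cluster_def)
    # A test would then connect to servers[0].target (a grpc:// address).
```

The test suite would call `start_local_servers` once in a setup step and point its session at the returned `target`. Picking ports dynamically avoids clashes with whatever else is running on the CI machine.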

There are some caveats with that approach, though: as we have already seen, some ops behave differently on CPU and GPU, so...

EiffL · 2020-08-13T14:42:47Z

This is high priority. @modichirag, I'm assigning myself, but feel free to think about this as well.

EiffL added the enhancement and Mesh TensorFlow labels on Nov 9, 2019

EiffL self-assigned this Aug 13, 2020
