Replies: 4 comments
-
I generally agree that this is a weird limitation that does not really translate into a good user experience. Ideally, I'd also like to move forward in the direction you suggest. However, after giving it some thought, I think there is an issue that needs careful consideration.

Right now, if all queues go out of scope and you then create a new one, you'll get an error that MPI cannot be initialized again, after having been finalized already. That's not ideal, and we should try to emit a clearer error message. That being said, consider code like the following:

```cpp
void do_something() { celerity::distr_queue queue; queue.submit(/**/); }
void do_something_else() { celerity::distr_queue queue; queue.submit(/**/); }

int main() {
    do_something();
    do_something_else();
}
```

Right now this is not possible, and our answer as to why is "well, you are only allowed to copy the queue". After this change, the code above MIGHT be allowed, but only if a third …

As a possible solution, we could move away from the whole "queue lifetime equals runtime lifetime" concept. This would however mean that users have to explicitly synchronize at the end of every program -- not very nice. Also, while we do have …

We could also make it actually re-initialize a new runtime instance. While that is of course possible (and that's actually what we're already doing for unit tests), I don't think this is what a user typically would want (especially since there is some overhead involved), and I'd like to see a legitimate use case for it first.

The only real solution that I can come up with right now is to have a parameter for …

cc @PeterTh

Implementation-wise, since the …
-
In terms of implementation, do you think it would be possible to go about it by adding a layer of indirection? I.e., we'd have one user-facing level with …
-
I feel like there are at least three perspectives on this, and the requirements / design constraints are somewhat different for each of them.

From the perspective of keeping simple things simple, and it being easy to teach a basic "Hello World" example of Celerity usage, the behavior of having a single queue with user-managed lifetime, RAII, and automatic synchronization is useful. It lets you write an example with simple host/client interaction without too much mental overhead.

From the perspective of a user with a large-scale program who wants to introduce Celerity piece-by-piece, having free access to "the" queue at arbitrary points seems like a useful feature simplifying Celerity usage. However, I wonder how valuable it is in practice: as soon as you want to transfer any GPU-side data between individual phases of your program, or re-use buffers, you'll probably need some abstraction for Celerity-relevant data that you pass around anyway. Since that is probably crucial in most programs for decent performance, at that point you could just as well include a reference to your queue in that data as well. (At least that has been my experience with porting one larger-scale program so far.)

From the perspective of API "purity", and keeping the potential for lifetime errors to a minimum, even allowing the queue to be copied is already problematic. (That is not to say that people won't get lifetime issues even without copy-able queues; we need to make the compiler plugin standard and extend its coverage to really help out there.)

I think the idea of having separate user-facing queues which are independent of the (singleton) "real" distributed queue might be a viable way to try and bridge these perspectives. However, that's a pretty significant change, and we should make sure that it's really solving issues without introducing new ones (or greatly increasing the complexity of either the implementation or mental model without sufficient payoff).
-
I'm generally not loving the implications of the queue being copyable either (to be honest I don't think I thought it fully through back when I added that functionality). I would suggest that we don't move forward with this until we have a clearer understanding of the trade-offs and what it is that we actually want to achieve with this API.

We know at least from the large-ish program that @PeterTh talked about, that passing around a Celerity context (that references the queue) can be a viable solution. As we are preparing to port another, larger program right now, I'll be interested to see how it pans out there.

One thing I think we absolutely have to ensure is that the out-of-the-box experience for new users is as seamless as possible. So we should definitely avoid surprising/hidden semantics regarding synchronization.

@n-io Could you maybe outline the use case that led you to opening this issue, so we can get a better understanding?

As a side note, during internal discussions of this issue we also encountered the question of multi-threading: I want to emphasize here that right now, we assume all work is being submitted to the queue from a single thread. At first glance, multiple "detached" queues might seem like a good way to introduce multi-threading capabilities to the Celerity API. However, we still have our strict SPMD/pre-pass/control-flow requirements, which I think might pan out to be really awkward when dealing with multiple threads.
-
Celerity only allows one instance of the `celerity::distr_queue` class, as captured by the error:

> Only one celerity::distr_queue can be created per process (but it can be copied!)
As the error message suggests, workarounds do exist, but they require additional work from the user to manage and copy queue instances. This extra work comes at effectively no gain to the user. Manually managing queues would make sense if multiple queues were possible and the user needed to know which one they are dealing with; with only one queue instance permitted, however, it would simplify both the API and user code if that instance were managed by Celerity.
Please consider pushing the work for managing the queue instance into Celerity, exposing a singleton to the user.