wip.txt

Chapter 6


On Thu, Mar 10, 2011 at 7:09 PM, Michael Andersen <michael@ibab.dk> wrote:

> Given an active ZMQ_PUSH socket with a number of ZMQ_PULL worker threads
> connected - is there a way to:
> 1) Determine how many workers are currently ready. To be more specific:
> how many connected ZMQ_PULL threads are currently connected and blocked
> in recv() ?

No.

> 2) Determine how many messages are currently pushed on the socket,
> queued up, waiting to be recv()'ed by an available worker ?


i use the scheme of having the workers report status back via a PUSH-PULL to the
source (or control process). then pump down no-op data until all workers report no-ops.
- Hide quoted text -

When streaming a fixed quantity of data through a pipeline via PUSH-PULL
sockets, I find myself wanting to signal all of the downstream workers
when the input data has been exhausted.  To do that, I have to send my
"finished" message multiple times, which means that each upstream PUSHer
must be explicitly configured to know how many downstream PULLers there
will be, which spreads-around knowledge of the pipeline topology and
introduces lots of complexity, e.g. waiting for every PULLer to connect,
etc.  Is there an easier way to "broadcast" a message to every PULL
socket connected to a PUSH?  Using a separate PUB-SUB connection seems
equally problematic - I'm assuming that there's no guarantee that a
published "finished" message would arrive after the last data message in
the PUSH-PULL connection.

Thanks in advance,
Tim


++++ Reliable Pipeline (Harmony Pattern)

0MQ's pipeline pattern (using PUSH and PULL sockets) is reliable to the extent that:

* Workers and collectors don't crash;
* Workers and collectors read their data fast enough to avoid queue overflows.

As with all our reliability patterns, we'll ignore what happens if an upstream node (the ventilator for a pipeline pattern) dies. In practice a ventilator will be the client of another reliability pattern, e.g. Clone.

The Harmony pattern takes pipeline and makes it robust against the only failure we can reasonably handle, namely workers and (less commonly) collectors that crash and lose messages or work.

- assume workers are idempotent
- assume batch size is known in advance (because...)
- assume memory enough to hold full batch
- batch: start (address of collector), tasks, end
- messages numbered 0 upwards inside batch
- assume multiple ventilators for same cluster
- assume collector talks to ventilator, (not same to allow walk-up-and use by ventilators)
- call ventilator the 'client'
- if task missing, resend
- if end of batch missing, resend from last response

++ How to deliver jobs one by one using push/pull? i.e. ensure jobs don't get lost...

- Majordomo?