Broadway.prepare_messages/2 callback always receives exactly one message #131
I am almost sure the RabbitMQ producer is push-based, so it is receiving the messages from the AMQP connection. So perhaps something needs to be configured there to receive bigger batches?
Thank you for the very prompt reply, Jose! My understanding is that this is controlled by the `:qos` options on the producer.
@jvoegele
So, you'll see more than one message only if your processors/batchers start processing messages more slowly, which will result in messages queueing up in the producer (because of demand and all the GenStage machinery) and then being delivered at once. Does that make sense?
Thank you, @whatyouhide, that does make sense. Is there anything in the docs that covers this? Also, is this behavior consistent with other Broadway producers, or is this a specific "quirk" of `BroadwayRabbitMQ.Producer`? Thank you for your help.
This is not really generic; it's more specific to how AMQP (the lib) works and how the Broadway RabbitMQ producer works, as you mentioned later. But any improvement to the docs is super appreciated, PRs are more than welcome 🙃
I'm not sure how I didn't come across this before, but I've been struggling with this as well! I sent a PR, but now that I see this, it's clear that I didn't really clear it up 😬. Has anyone figured out a workaround? @LostKobrakai suggested here that it might be a matter of adding an aggregator in between. Perhaps a custom producer that uses this producer but pools? Another option would be to add a size + timeout set of options here to accumulate batches. Would you be open to a PR that does this? The behaviour would be identical to the batcher behaviour in Broadway (send the messages on once either N messages have accumulated or the timeout has been reached). The default could be 1 message and a 0ms timeout, to match the current behaviour.
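For comparison, here is roughly what the existing batcher-side options look like in a Broadway configuration, which the proposed producer-side options would mirror (the values are illustrative, not from this thread):

```elixir
# Broadway's built-in batchers flush once batch_size messages have
# accumulated or batch_timeout milliseconds have elapsed, whichever
# comes first.
batchers: [
  default: [batch_size: 100, batch_timeout: 1_000]
]
```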
I have pushed more docs about this.
First we need to discuss why you need this feature. It feels like you want the batching semantics that batchers already provide.
For example, even if you tell the producer to batch, it may create a batch of 4, but if you have four processors, each of them will still receive a single message. We really can't batch effectively at the same time we fan out.
This makes sense, but it's because I need to hit the database to prepare messages. Each message is an identifier, and I need to fetch data from both Elasticsearch and Postgres before doing work in `handle_message`.
So I think I'm understanding it as you described: "a convenience for looking at all messages at once as part of the fan-out stage of your pipeline". But the behaviour today (one message at a time, no matter what) precludes that.
This would be fine! 100% expected. Right now I have one processor, and no matter what min/max demand I set, no matter what happens, I see one message at a time in `prepare_messages`.
It's actually causing ack timeout problems for me that I've had to work around, because the docs don't really reflect this behaviour.
I've been trying to increase the `:qos` settings. I could try to just use a different producer like SQS, but right now we're not running this on AWS and I'd love to keep it that way. What you suggested at the top of the thread makes sense! And the docs say this is possible, but then it seems, from the line @whatyouhide referenced above, that these opts are basically no-ops?
`:qos` controls what happens over the network, but not batching at the message level. In any case, it is clear this is a documentation issue. `prepare_messages` is a best-effort optimization and, in the case of RabbitMQ, it does not happen. You need to move the logic to `handle_batch`.
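Following that advice, a minimal sketch of moving the bulk lookup into `handle_batch/4` instead of `prepare_messages/2`. `MyApp.Store.fetch_all/1` is a hypothetical bulk-lookup function (one query returning a map of id to record), not part of Broadway:

```elixir
defmodule MyApp.Pipeline do
  use Broadway

  alias Broadway.Message

  @impl true
  def handle_message(_processor, message, _context) do
    # No per-message fetching here; the message carries only the identifier
    # and goes to the default batcher.
    message
  end

  @impl true
  def handle_batch(:default, messages, _batch_info, _context) do
    # One bulk lookup for the whole batch instead of N single lookups.
    ids = Enum.map(messages, & &1.data)
    records = MyApp.Store.fetch_all(ids)

    Enum.map(messages, fn message ->
      Message.update_data(message, fn id -> Map.fetch!(records, id) end)
    end)
  end
end
```

This assumes a `batchers: [default: [...]]` entry in the pipeline configuration so that `handle_batch/4` is invoked at all.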
Another approach is for you to introduce an intermediate process, called from `handle_message`, that does very simple batching before reaching out to Elasticsearch or the database.
If I do that, then I lose the ability to direct messages to different batchers based on information in the fetched data.
I'll try this approach. Thanks for the help!
Yes, it should be relatively straightforward. You will keep all requests in a list and have a counter.
A bonus would be to start a task to process the batch and have the task send the replies back to everyone waiting. |
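The intermediate process described above could be sketched as follows. This is a minimal, non-authoritative sketch: callers block in `GenServer.call/3`, requests accumulate in a list with a counter, and once the batch size is reached a `Task` performs the lookup and replies to every waiter. `MyApp.Store.fetch_all/1` is again a hypothetical bulk-lookup function:

```elixir
defmodule MyApp.BatchingFetcher do
  use GenServer

  @batch_size 10

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Called from handle_message; blocks until the whole batch is fetched.
  def fetch(server, id), do: GenServer.call(server, {:fetch, id}, :timer.seconds(30))

  @impl true
  def init(:ok), do: {:ok, %{pending: [], count: 0}}

  @impl true
  def handle_call({:fetch, id}, from, %{pending: pending, count: count} = state) do
    pending = [{id, from} | pending]
    count = count + 1

    if count >= @batch_size do
      flush(pending)
      {:noreply, %{state | pending: [], count: 0}}
    else
      {:noreply, %{state | pending: pending, count: count}}
    end
  end

  defp flush(pending) do
    # The "bonus" from the thread: do the work in a task so the server
    # keeps accepting calls, and have the task reply to everyone waiting.
    Task.start(fn ->
      results = MyApp.Store.fetch_all(Enum.map(pending, fn {id, _from} -> id end))

      Enum.each(pending, fn {id, from} ->
        GenServer.reply(from, Map.get(results, id))
      end)
    end)
  end
end
```

A production version would also flush on a timer (e.g. via `Process.send_after/3`) so a partial batch doesn't stall when traffic is slow, mirroring the size + timeout behaviour discussed earlier in the thread.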
I will close this, as I pushed docs to broadway. :) |
Thanks again! The above will definitely work for us, nearly got it going already :). 100% a documentation issue. |
The Broadway docs for the `Broadway.prepare_messages/2` callback state that the callback receives a list of messages. However, in our application using `BroadwayRabbitMQ.Producer`, it seems as if the length of the list of messages is always exactly one, regardless of the `min_demand`/`max_demand` configuration in the processor. Furthermore, it doesn't seem to matter if there are only a few messages in the RabbitMQ queue being used, or if there are very many messages in the queue. Whenever I `IO.inspect(length(messages), label: "prepare_messages batch size")` in my `prepare_messages` callback, it always prints `prepare_messages batch size: 1`.

Is this a bug in `BroadwayRabbitMQ.Producer`? Or have I misconfigured something? Or is there something else happening here that I'm not correctly understanding?

For reference, here is the Broadway topology for the pipeline in question:

And below is the (elided) configuration being passed to `Broadway.start_link`. Note that `:qos` and most other optional configs are not being overridden, and that the processor is configured with `min_demand: 10, max_demand: 20`, which should result in a batch of 10 messages being passed to `prepare_messages`.
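The topology and configuration themselves were elided above. A hedged reconstruction of a `Broadway.start_link/2` call matching the description (the module name and queue name are illustrative assumptions, not taken from the issue) might look like:

```elixir
# Illustrative only: MyApp.Pipeline and "my_queue" are assumed names.
# :qos is left at its defaults, and the single processor uses
# min_demand: 10, max_demand: 20, as described in the issue.
Broadway.start_link(MyApp.Pipeline,
  name: MyApp.Pipeline,
  producer: [
    module: {BroadwayRabbitMQ.Producer, queue: "my_queue"},
    concurrency: 1
  ],
  processors: [
    default: [concurrency: 1, min_demand: 10, max_demand: 20]
  ]
)
```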