-
Hello all, thank you for this great application. It seems to be exactly what I have been looking for for a long time. Nevertheless, I have a question: I need to implement a (SQL-like) full outer join of multiple streams in my Flink job. I've seen that there is a node called "union-memo", which is like an inner join. There is also a "single side join" that could be combined with a union node, if I understand the functionality correctly. Is there a node that delivers a full outer join? If not, can I use the single side join and a union node to get a full outer join in the end?
Replies: 3 comments 3 replies
-
Hi @phisinger, thank you for the kind words. The feature you are describing (a Full Outer Join component) looks like it would be a nice addition to the set of stateful components available in the Nussknacker distribution. Currently, the component that works most similarly is union-memo. The only difference I see is that union-memo has state which is "sustained" by inputs from each side. It means that for the sequence:
and a window (state timeout) of 2 minutes, after event4 an event with the following values will be emitted:
For an outer join you would instead expect B to be null in this case. Am I right that this difference is crucial for your use case and that union-memo won't be a good replacement? On the other hand, single side join is a "real" outer join, but an asymmetric one. You can try splitting each source into two branches, then adding two single side joins with the MAIN branchType reversed between them, followed by a normal union. You can also try to write your own component based on the single side join and union-memo implementations. If you have basic knowledge of Flink, it won't be hard, since the components are implemented using Flink's DSL. We have a Developer Guide: https://nussknacker.io/documentation/docs/developers_guide/Basics (it is still a work in progress, but we can help if you run into any trouble using our components API). We have also prepared a project showing how to write your own sample component from scratch: https://github.com/TouK/nussknacker-sample-components Let me know if this helps.
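The idea of combining two mirrored single side joins with a union can be illustrated outside of Nussknacker. The sketch below is plain Python over in-memory lists, not the Nussknacker or Flink API: `left_join` and `full_outer_join` are hypothetical helper names, the data is made up, and time windows are ignored entirely. It also points at one thing to watch for: matched pairs come out of both joins, so one of the two sides should keep only its unmatched rows before the union.

```python
from collections import defaultdict


def left_join(main, other):
    """Pair every (key, value) in `main` with each same-key value in `other`, or None."""
    index = defaultdict(list)
    for key, value in other:
        index[key].append(value)
    result = []
    for key, value in main:
        # .get() does not create missing keys; fall back to a single None match.
        matches = index.get(key) or [None]
        for match in matches:
            result.append((key, value, match))
    return result


def full_outer_join(a, b):
    """Union of the 'A as main' join with the unmatched rows of the 'B as main' join."""
    a_main = left_join(a, b)
    b_main = left_join(b, a)
    # Matched pairs already appear in a_main, so keep only B rows without an A match.
    b_only = [(key, None, b_val) for key, b_val, a_val in b_main if a_val is None]
    return a_main + b_only


# Hypothetical sample data: key X is on both sides, Y only in A, Z only in B.
a = [("X", "a1"), ("Y", "a2")]
b = [("X", "b1"), ("Z", "b2")]
rows = full_outer_join(a, b)
```

With this sample data, `rows` contains one matched row for X and one half-empty row each for Y and Z. In a real scenario the filtering step would have to be expressed with the components available in the scenario, which this sketch does not attempt to model.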
-
"union-memo" like just "union" ensures that for every incoming message is always generated exact one outgoing message. So outgoing messages will look like:
Time of emitting messages is the same as time of incoming messages. This component is rather simple - it doesn't wait in some window, it just "keep state" for time of this window but react on every event instantly. It is the thing that probably be changed or configurable in some more high level component like "Full outer join" because it could cause some run condition in case of desynchronized branches. Summarizing - this component is "fast". The problem is if it is not "too fast" for your cases? |
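The "one output per input, emitted instantly, with remembered state" behaviour described above can be mimicked with a small simulation. This is only a sketch in plain Python, not the real union-memo implementation: `union_memo` here is a hypothetical function, the timestamps and events are made up, and expiry is tracked per branch, which is a simplifying assumption and may differ from how the real component refreshes its state.

```python
def union_memo(events, window_seconds):
    """Simulate memoizing union over branches "A" and "B".

    events: iterable of (timestamp_seconds, branch, key, value), ordered by timestamp.
    Returns one output row per input event, built immediately on arrival.
    """
    state = {}  # (key, branch) -> (timestamp, value) of the latest event seen
    outputs = []
    for ts, branch, key, value in events:
        state[(key, branch)] = (ts, value)
        row = {"key": key}
        for name in ("A", "B"):
            remembered = state.get((key, name))
            # Assumption: a remembered value is usable only within the window.
            if remembered is not None and ts - remembered[0] <= window_seconds:
                row[name] = remembered[1]
            else:
                row[name] = None
        outputs.append(row)
    return outputs


# Hypothetical events: two close together, one long after the window has passed.
events = [
    (0, "A", "X", "event1"),
    (30, "B", "X", "event2"),
    (300, "A", "X", "event3"),
]
out = union_memo(events, window_seconds=120)
```

In this run the second output carries both sides because event1 is still remembered, while the third output has an empty B side because event2 is older than the window. Nothing is ever held back until a window closes, which is the "fast" behaviour mentioned above.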
-
Hi @phisinger. I'm back after a long delay. Have you found a workaround for your issue? In the meantime we've developed a full outer join component which will (I hope) work as you expected. You can read about it in the development version of our documentation: https://nussknacker.io/documentation/docs/next/scenarios_authoring/AggregatesInTimeWindows/#full-outer-join . The implementation and test cases can be found in the PR: https://github.com/TouK/nussknacker/pull/3075/files . This change will be released in version 1.5, which we plan to release around 2022-08-16. We would be glad to hear whether this component fulfills your requirements.
"union-memo" like just "union" ensures that for every incoming message is always generated exact one outgoing message. So outgoing messages will look like:
A: event1(key=X), B : null
A: null, B: event2(key=X)
A: event3(key=Y), B: null
A: event4(key=Z), B: null
The "other side" is always null in the example above because delay between messages with the…