Flowbits unreliable across threads #71
Looking at the code, I believe this happens because there is no guarantee on the order in which flowbits are processed. I think there needs to be a data structure that tracks the timestamps of records currently in flight and guarantees that an isset check cannot run until every older record has been matched against the rules, or, if an older record's matched rule is still being processed, until that rule has already set its flowbit(s) or is known not to set any flowbit the current record wants to check.
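A minimal C sketch of that ordering guarantee, under stated assumptions: a single input thread numbers records in the order it reads them, and workers dequeue records in that same order (otherwise a waiting worker could occupy the thread an older record needs). A worker calls `tracker_wait_older()` before any isset, performs its sets, then calls `tracker_mark_done()`. All names (`seq_tracker`, `tracker_*`, the DHCP demo) are hypothetical and are not Sagan code; `TRACKER_WINDOW` is an assumed bound on records in flight.

```c
/* Hypothetical sketch; seq_tracker and tracker_* are illustrative, not Sagan APIs. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TRACKER_WINDOW 1024 /* assumed upper bound on records in flight */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    uint64_t next_seq;          /* next sequence number to hand out */
    uint64_t watermark;         /* every seq < watermark has finished its flowbit sets */
    bool done[TRACKER_WINDOW];  /* completion flags for seq in [watermark, next_seq) */
} seq_tracker;

static void tracker_init(seq_tracker *t) {
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->changed, NULL);
    t->next_seq = t->watermark = 0;
    memset(t->done, 0, sizeof t->done);
}

/* Input thread: number records in the order they are read. */
static uint64_t tracker_assign(seq_tracker *t) {
    pthread_mutex_lock(&t->lock);
    while (t->next_seq - t->watermark >= TRACKER_WINDOW) /* window full */
        pthread_cond_wait(&t->changed, &t->lock);
    uint64_t seq = t->next_seq++;
    pthread_mutex_unlock(&t->lock);
    return seq;
}

/* Worker: call before any isset; waits until all older records are done. */
static void tracker_wait_older(seq_tracker *t, uint64_t seq) {
    pthread_mutex_lock(&t->lock);
    while (t->watermark < seq)
        pthread_cond_wait(&t->changed, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Worker: record `seq` has set every flowbit it is going to set. */
static void tracker_mark_done(seq_tracker *t, uint64_t seq) {
    pthread_mutex_lock(&t->lock);
    assert(seq >= t->watermark && seq < t->next_seq);
    t->done[seq % TRACKER_WINDOW] = true;
    while (t->watermark < t->next_seq && t->done[t->watermark % TRACKER_WINDOW]) {
        t->done[t->watermark % TRACKER_WINDOW] = false; /* free the slot */
        t->watermark++;
    }
    pthread_cond_broadcast(&t->changed);
    pthread_mutex_unlock(&t->lock);
}

/* Demo: record 0 is a DHCPREQUEST that sets a flowbit, record 1 is the
 * DHCPACK that checks it; the ACK worker is started first on purpose. */
static seq_tracker demo;
static pthread_mutex_t bit_lock = PTHREAD_MUTEX_INITIALIZER;
static bool request_bit = false, ack_saw_bit = false;

static void *request_worker(void *arg) {
    uint64_t seq = *(const uint64_t *)arg;
    struct timespec delay = {0, 50000000}; /* 50 ms of simulated rule processing */
    nanosleep(&delay, NULL);
    pthread_mutex_lock(&bit_lock);
    request_bit = true; /* flowbit set */
    pthread_mutex_unlock(&bit_lock);
    tracker_mark_done(&demo, seq);
    return NULL;
}

static void *ack_worker(void *arg) {
    uint64_t seq = *(const uint64_t *)arg;
    tracker_wait_older(&demo, seq); /* gate the isset on record 0 */
    pthread_mutex_lock(&bit_lock);
    ack_saw_bit = request_bit; /* flowbit isset */
    pthread_mutex_unlock(&bit_lock);
    tracker_mark_done(&demo, seq);
    return NULL;
}

static bool run_dhcp_demo(void) {
    pthread_t req, ack;
    tracker_init(&demo);
    uint64_t req_seq = tracker_assign(&demo);
    uint64_t ack_seq = tracker_assign(&demo);
    pthread_create(&ack, NULL, ack_worker, &ack_seq);
    pthread_create(&req, NULL, request_worker, &req_seq);
    pthread_join(ack, NULL);
    pthread_join(req, NULL);
    return ack_saw_bit;
}
```

The demo starts the ACK worker first and delays the REQUEST worker, yet the isset still sees the bit. The trade-off is that isset checks are serialized behind every older record, so one slow rule delays all later checks.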
Hello, thank you for the bug report. It appears that you are using flowbits while reading in a file ("[*] EOF reached. Waiting for threads to catch up...."). I've never tested flowbits on a file. When using a FIFO, we receive the events "in order"; that is, the input arrives spread out over time. With a file, Sagan consumes everything at once. For example, in the FIFO world two events might be several seconds apart; in the file world, the data is read in bulk and there is no "time" between events. I could certainly see the threads getting into a race condition. In reality, the "-F" (file input) option needs to be completely rethought; reading from a file was added later as an afterthought. Your point about recording timestamps for verification is certainly taken, and it would likely even have some benefit when receiving data via FIFO. There actually is a data structure that keeps track of flowbits: see the _Sagan_IPC_Flowbit structure. Even better, we have a little "hidden" utility we use for debugging. In the Sagan source tree, go to "tools". You'll see a program called "sagan-peek.c"; run "make" in that directory. It lets you dump the contents of the mmap structures, including the flowbit values: it displays the "state" ("S"), "Flowbit name", "SRC IP", "DST IP", "Date added/modified" and the "expire time". Rather than using a file, you might want to see whether the same thing happens via a FIFO. I still think adding time-based verification is a good idea.
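To illustrate why a sub-second REQUEST/ACK pair can race, here is a sketch of an isset lookup over a table whose fields mirror the columns sagan-peek is described as printing. The struct `flowbit_entry`, its field sizes, the function `flowbit_isset()`, and the assumption that the expire time is an absolute timestamp are all hypothetical; this is not the real _Sagan_IPC_Flowbit layout.

```c
/* Hypothetical layout modeled on the columns sagan-peek prints;
 * this is NOT the real _Sagan_IPC_Flowbit definition. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef struct {
    bool   state;       /* "S": true while the flowbit is set */
    char   name[64];    /* flowbit name */
    char   src_ip[46];  /* textual IPv4/IPv6 address */
    char   dst_ip[46];
    time_t modified;    /* date added/modified */
    time_t expire;      /* assumed: absolute time after which the bit is ignored */
} flowbit_entry;

/* isset: true only if an unexpired entry with this name and IP pair is set.
 * If the worker that sets the bit has not written its entry yet, this
 * returns false no matter how close together the two logs arrived. */
static bool flowbit_isset(const flowbit_entry *table, size_t n, const char *name,
                          const char *src, const char *dst, time_t now) {
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const flowbit_entry *e = &table[i];
        if (e->state && now < e->expire && strcmp(e->name, name) == 0 &&
            strcmp(e->src_ip, src) == 0 && strcmp(e->dst_ip, dst) == 0)
            return true;
    }
    return false;
}
```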
I used a file only to reproduce the issue. The same thing happens when reading from a FIFO that rsyslog writes to.
Also of note, the timestamps in the records I listed are the actual timestamps from the events. Both occurred within the same second, which is probably why this shows up in production as well. With enough time between events the race is a non-issue, but for things like DHCP, where related events can occur less than a second apart, it does present a problem.
I can certainly see it being an issue via FIFO. If logs reach Sagan within a very short time of each other, a race condition occurs. We might be able to do time comparisons, but some code would need to change: Sagan would need to record when each log is received at a more granular level (microseconds?). Let me leave this open and ponder it a bit.
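A minimal sketch of microsecond receive stamps, assuming each log is stamped on arrival by a single input thread. `CLOCK_MONOTONIC` is used here because wall-clock time can step backwards; the `recv_stamp` type, `stamp_record()`, and the `seq` tie-breaker for two logs landing in the same microsecond are hypothetical. Stamping alone does not stop a worker from checking a flowbit before an older record has set it; it only makes the arrival order knowable.

```c
/* Hypothetical sketch; recv_stamp and stamp_record are not Sagan APIs. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    int64_t  recv_us; /* receive time in microseconds (CLOCK_MONOTONIC) */
    uint64_t seq;     /* arrival counter; breaks ties within one microsecond */
} recv_stamp;

/* Call from the single input thread only: the counter is not synchronized. */
static recv_stamp stamp_record(uint64_t *counter) {
    struct timespec ts;
    assert(counter != NULL);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    recv_stamp s = {(int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000, (*counter)++};
    return s;
}

/* True if record a was received strictly before record b. */
static bool received_before(recv_stamp a, recv_stamp b) {
    if (a.recv_us != b.recv_us)
        return a.recv_us < b.recv_us;
    return a.seq < b.seq;
}
```

Two logs stamped in the same second are indistinguishable at one-second resolution but still ordered by `received_before()`.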
We might look into this in 1.1.2, but it could also be further out. We need to investigate what it would take to make flowbits "atomic"-ish.
Just a quick update. Sagan's time resolution is now down to a thousandth of a second. We still need to do some more work to carry this into xbits (formerly known as "flowbits"). This may help with this issue, but thread sequencing is still a problem.
Great news, thanks for following up on this.
I am attempting to write a rule that detects DHCPACK in response to a DHCPREQUEST using flowbits. I am finding that the result is non-deterministic. Sometimes the flowbit from the REQUEST is set before the ACK is processed, and sometimes it is not.
Here are the rules:
Here are sample log entries:
Here is a run where it fails to match the flowbit:
Here is a run where it does match: