-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsqd: keep defer property under nsqd restart #1137
nsqd: keep defer property under nsqd restart #1137
Conversation
/cc @ploxiln @mreiferson |
0303512
to
dcf2a50
Compare
6e5e029
to
43879e2
Compare
I like the idea. This is a bit complicated, so it'll be a bit of time before I have some good review comments. Some quick ideas I'll be thinking about:
|
ping #871 |
If we're going to implement this, similar to @ploxiln's early feedback, my preference is to thoughtfully decide on a new on-disk format that provides us some future flexibility (in a backwards compatible way) that will also make it possible to address persistence of deferred metadata. The dream would also be to use the same format over the wire as we do on disk, but that's going to be a much more difficult compatibility problem. |
I also have thought this. But it turns out that it may be difficult for backward compatibility. Some properties do not need both for consumer and disk persistence. For example, just like defer time, consumer does not need to know this property. Make disk structure and wire structure the same also raise the need to change the sdk. It is a nightmare to add such change to all sdk especially lots of NSQ sdk are not under the community governance.
Agreed. |
What the nsq tcp protocol does right now is:
It works but doesn't seem quite appropriate for this use - I think we want something a bit more general. Json is a common arbitrary choice for a misc general extensible use, but I think we want something more space and computation efficient, for putting lots of small messages on disk. The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear. For misc properties/tags similar to this, it may be sufficient to do one byte key, one byte length, length bytes value. The "key" is from a defined enum. So maybe "defer_until" is key=1, and "attempts" is key=2, and (hypothetical) "last_attempt_ip" is key=3. So this allows up to 255 different keys to be defined (maybe 0 is special), and values up to 255 bytes. For "defer_until" you would do len=8 and value={uint64}. Even though you sometimes know the length based just on key, you always include length, so an older version of the program that does not know this key can skip over it. (Also, you could encode int values smaller if possible - "attempts" could be 1 or 2 bytes as needed.) If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length. We could start with 4 or 8 bytes "magic" value, and end with type=0 length=0, and after that comes the message body. (I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs, or we could leave those properties in the classic format at the beginning of the body, since they are also sent to the consumer in that way.) So, that's the way embedded developers stored config settings, back when they had just 4KB eeprom for the purpose ... just an idea for an aggressively efficient format. I had a thought about the magic value - right now, all serialized message blobs start with 8-byte nanosecond timestamp. We can come up with a value that would never be or have been there, or even just 4 bytes that would never be at the beginning of a big-endian 8-byte nanosecond timestamp.
So if the 4-byte magic was something like [0x08 0xc8 0x04 0xc4] then that would be sometime early 1990, which would not occur for any message generated with a reasonable clock, or a clock reset to the manufacture time of the server (usually 2000+), or a clock reset to zero, etc. |
Agreed, I don't think this PR needs to have any bearing on the wire format to land.
+1 was thinking the same.
Given the introduction of a magic value, It seems unlikely that we'll ever need more than 255 attributes before we would just move to a different structure altogether.
For internal consistency of the new format, let's start with this and see how it turns out.
+1 good idea |
@ploxiln @mreiferson IMHO, we should first make a draft about the disk format redesign. It is quite a big change since it is a backend storage refactor. As for defer property loss is somewhat simple problem compared storage refactor, i suggest we should first fix the problem and start to do the storage refactor ASAP. |
I know we should do some refactor to make NSQD storage contain more info and be more feasible for future changes, but it is not reasonable for discussing and doing the refactor in this PR, IMO. |
@ploxiln @mreiferson Fired a new issue for tracking the storage redesign in #1139 |
Are you suggesting we land this PR as is? I don't want to change disk formats twice, so I think this change should wait until we complete #1139. |
Yes. disk formats change is a long planning. This defer bug is fatal since without this change defer functionality does not meet the description. I think it is more urgent to first fix this bug. |
We don't think this is urgent, though. This has been a known limitation for multiple years. We tell people who ask that nsqd is not really a scheduler, and they need to put a timestamp in the message so the consumer can check if it is time to process yet, and if not requeue with delay equal to the remaining time. |
I've started on a branch with the new type-length-value serialization for the "backend" disk queue. |
43879e2
to
8520ed7
Compare
This is not feasible. We should use json or msgpack to pack all message meta data. This makes adding or deleting a new metadata field more easily in a backward compatibility way. We can offload this to json marshal/unmarshal or msgpack marshal/unmarshal instead of hard code metadata parse logic. msgpack is more preferred since it is more fast and small. |
So the message storage will contain three stops:
|
Closing in favor of continuing the discussion in #1170 |
Currently, deferred message defer time duration does not persist when nsqd do a graceful shutdown or serialized to disk. This PR will persist deferred message with its defer time duration. This PR needs to keep backward compatibility.