nsqd: keep defer property under nsqd restart #1137

andyxning · 2019-02-21T04:30:56Z

Currently, deferred message defer time duration does not persist when nsqd do a graceful shutdown or serialized to disk. This PR will persist deferred message with its defer time duration. This PR needs to keep backward compatibility.

andyxning · 2019-02-21T04:31:22Z

/cc @ploxiln @mreiferson

ploxiln · 2019-02-21T17:35:39Z

I like the idea. This is a bit complicated, so it'll be a bit of time before I have some good review comments. Some quick ideas I'll be thinking about:

can we switch to storing only the absolute defer-to time in the message struct
can we use a magic prefix for normal messages too, so in the future there is no possible ambiguity (even though there is still ambiguity for messages already in the disk queue from current/older nsqd)
can magic prefix be just 4 or 8 bytes, but very unlikely binary values

mreiferson · 2019-02-21T18:10:06Z

ping #871

mreiferson · 2019-02-21T18:13:08Z

If we're going to implement this, similar to @ploxiln's early feedback, my preference is to thoughtfully decide on a new on-disk format that provides us some future flexibility (in a backwards compatible way) that will also make it possible to address persistence of deferred metadata.

The dream would also be to use the same format over the wire as we do on disk, but that's going to be a much more difficult compatibility problem.

andyxning · 2019-02-22T02:12:10Z

The dream would also be to use the same format over the wire as we do on disk, but that's going to be a much more difficult compatibility problem.

I also have thought this. But it turns out that it may be difficult for backward compatibility. Some properties do not need both for consumer and disk persistence. For example, just like defer time, consumer does not need to know this property. Make disk structure and wire structure the same also raise the need to change the sdk. It is a nightmare to add such change to all sdk especially lots of NSQ sdk are not under the community governance.

my preference is to thoughtfully decide on a new on-disk format that provides us some future flexibility (in a backwards compatible way) that will also make it possible to address persistence of deferred metadata.

Agreed.

andyxning · 2019-02-22T06:31:05Z

@mreiferson

ploxiln · 2019-02-22T07:42:28Z

What the nsq tcp protocol does right now is:

from client to server: all-caps COMMAND, followed by optional space-separated args, followed by newline, followed by 4 bytes int body length, followed by body bytes.
from server to client: 4 bytes int size, 4 bytes int type, body bytes

It works but doesn't seem quite appropriate for this use - I think we want something a bit more general. Json is a common arbitrary choice for a misc general extensible use, but I think we want something more space and computation efficient, for putting lots of small messages on disk.

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear. For misc properties/tags similar to this, it may be sufficient to do one byte key, one byte length, length bytes value. The "key" is from a defined enum. So maybe "defer_until" is key=1, and "attempts" is key=2, and (hypothetical) "last_attempt_ip" is key=3. So this allows up to 255 different keys to be defined (maybe 0 is special), and values up to 255 bytes. For "defer_until" you would do len=8 and value={uint64}. Even though you sometimes know the length based just on key, you always include length, so an older version of the program that does not know this key can skip over it. (Also, you could encode int values smaller if possible - "attempts" could be 1 or 2 bytes as needed.) If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length. We could start with 4 or 8 bytes "magic" value, and end with type=0 length=0, and after that comes the message body. (I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs, or we could leave those properties in the classic format at the beginning of the body, since they are also sent to the consumer in that way.)

So, that's the way embedded developers stored config settings, back when they had just 4KB eeprom for the purpose ... just an idea for an aggressively efficient format.

I had a thought about the magic value - right now, all serialized message blobs start with 8-byte nanosecond timestamp. We can come up with a value that would never be or have been there, or even just 4 bytes that would never be at the beginning of a big-endian 8-byte nanosecond timestamp.

>>> "0x%016x" % (int(datetime(1990,1,1).timestamp()) * 10**9)
'0x08c25dc55ae22000'

So if the 4-byte magic was something like [0x08 0xc8 0x04 0xc4] then that would be sometime early 1990, which would not occur for any message generated with a reasonable clock, or a clock reset to the manufacture time of the server (usually 2000+), or a clock reset to zero, etc.

mreiferson · 2019-02-22T17:32:32Z

It works but doesn't seem quite appropriate for this use

Agreed, I don't think this PR needs to have any bearing on the wire format to land.

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear.

+1 was thinking the same.

If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length.

Given the introduction of a magic value, It seems unlikely that we'll ever need more than 255 attributes before we would just move to a different structure altogether.

I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs

For internal consistency of the new format, let's start with this and see how it turns out.

I had a thought about the magic value - right now, all serialized message blobs start with 8-byte nanosecond timestamp. We can come up with a value that would never be or have been there, or even just 4 bytes that would never be at the beginning of a big-endian 8-byte nanosecond timestamp.

+1 good idea

andyxning · 2019-02-25T07:29:36Z

@ploxiln @mreiferson IMHO, we should first make a draft about the disk format redesign. It is quite a big change since it is a backend storage refactor.

As for defer property loss is somewhat simple problem compared storage refactor, i suggest we should first fix the problem and start to do the storage refactor ASAP.

andyxning · 2019-02-25T07:33:15Z

I know we should do some refactor to make NSQD storage contain more info and be more feasible for future changes, but it is not reasonable for discussing and doing the refactor in this PR, IMO.

andyxning · 2019-02-25T07:35:42Z

@ploxiln @mreiferson Fired a new issue for tracking the storage redesign in #1139

mreiferson · 2019-02-25T15:25:40Z

As for defer property loss is somewhat simple problem compared storage refactor, i suggest we should first fix the problem and start to do the storage refactor ASAP.

Are you suggesting we land this PR as is? I don't want to change disk formats twice, so I think this change should wait until we complete #1139.

andyxning · 2019-02-26T01:52:50Z

Are you suggesting we land this PR as is?

Yes. disk formats change is a long planning. This defer bug is fatal since without this change defer functionality does not meet the description. I think it is more urgent to first fix this bug.

ploxiln · 2019-02-26T03:46:37Z

We don't think this is urgent, though. This has been a known limitation for multiple years. We tell people who ask that nsqd is not really a scheduler, and they need to put a timestamp in the message so the consumer can check if it is time to process yet, and if not requeue with delay equal to the remaining time.

ploxiln · 2019-02-28T06:17:42Z

I've started on a branch with the new type-length-value serialization for the "backend" disk queue.

andyxning · 2019-07-26T04:31:41Z

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear. For misc properties/tags similar to this, it may be sufficient to do one byte key, one byte length, length bytes value. The "key" is from a defined enum. So maybe "defer_until" is key=1, and "attempts" is key=2, and (hypothetical) "last_attempt_ip" is key=3. So this allows up to 255 different keys to be defined (maybe 0 is special), and values up to 255 bytes. For "defer_until" you would do len=8 and value={uint64}. Even though you sometimes know the length based just on key, you always include length, so an older version of the program that does not know this key can skip over it. (Also, you could encode int values smaller if possible - "attempts" could be 1 or 2 bytes as needed.) If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length. We could start with 4 or 8 bytes "magic" value, and end with type=0 length=0, and after that comes the message body. (I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs, or we could leave those properties in the classic format at the beginning of the body, since they are also sent to the consumer in that way.)

This is not feasible. We should use json or msgpack to pack all message meta data. This makes adding or deleting a new metadata field more easily in a backward compatibility way. We can offload this to json marshal/unmarshal or msgpack marshal/unmarshal instead of hard code metadata parse logic.

msgpack is more preferred since it is more fast and small.

andyxning · 2019-07-26T04:36:52Z

So the message storage will contain three stops:

magic bytes
metadata
data

mreiferson · 2020-06-14T01:34:48Z

Closing in favor of continuing the discussion in #1170

andyxning force-pushed the keep_defer_property_under_nsqd_restart branch 5 times, most recently from 0303512 to dcf2a50 Compare February 21, 2019 08:49

andyxning changed the title ~~keep defer property under nsqd restart~~ nsqd: keep defer property under nsqd restart Feb 21, 2019

andyxning force-pushed the keep_defer_property_under_nsqd_restart branch 2 times, most recently from 6e5e029 to 43879e2 Compare February 21, 2019 09:52

mreiferson added the feature label Feb 21, 2019

andyxning mentioned this pull request Feb 22, 2019

nsqd: add max_channel_client_connection_count #1136

Merged

andyxning mentioned this pull request Feb 25, 2019

nsqd: refactor NSQD storage backend format #1139

Open

keep defer property under nsqd restart

8520ed7

andyxning force-pushed the keep_defer_property_under_nsqd_restart branch from 43879e2 to 8520ed7 Compare March 8, 2019 08:11

andyxning mentioned this pull request Jul 28, 2019

nsqd: refactor storage engine #1170

Open

mreiferson closed this Jun 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nsqd: keep defer property under nsqd restart #1137

nsqd: keep defer property under nsqd restart #1137

andyxning commented Feb 21, 2019

andyxning commented Feb 21, 2019

ploxiln commented Feb 21, 2019

mreiferson commented Feb 21, 2019

mreiferson commented Feb 21, 2019

andyxning commented Feb 22, 2019 •

edited

Loading

andyxning commented Feb 22, 2019

ploxiln commented Feb 22, 2019

mreiferson commented Feb 22, 2019

andyxning commented Feb 25, 2019

andyxning commented Feb 25, 2019

andyxning commented Feb 25, 2019

mreiferson commented Feb 25, 2019

andyxning commented Feb 26, 2019

ploxiln commented Feb 26, 2019

ploxiln commented Feb 28, 2019

andyxning commented Jul 26, 2019 •

edited

Loading

andyxning commented Jul 26, 2019

mreiferson commented Jun 14, 2020

nsqd: keep defer property under nsqd restart #1137

nsqd: keep defer property under nsqd restart #1137

Conversation

andyxning commented Feb 21, 2019

andyxning commented Feb 21, 2019

ploxiln commented Feb 21, 2019

mreiferson commented Feb 21, 2019

mreiferson commented Feb 21, 2019

andyxning commented Feb 22, 2019 • edited Loading

andyxning commented Feb 22, 2019

ploxiln commented Feb 22, 2019

mreiferson commented Feb 22, 2019

andyxning commented Feb 25, 2019

andyxning commented Feb 25, 2019

andyxning commented Feb 25, 2019

mreiferson commented Feb 25, 2019

andyxning commented Feb 26, 2019

ploxiln commented Feb 26, 2019

ploxiln commented Feb 28, 2019

andyxning commented Jul 26, 2019 • edited Loading

andyxning commented Jul 26, 2019

mreiferson commented Jun 14, 2020

andyxning commented Feb 22, 2019 •

edited

Loading

andyxning commented Jul 26, 2019 •

edited

Loading