Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: keep defer property under nsqd restart #1137

Conversation

andyxning
Copy link
Member

Currently, deferred message defer time duration does not persist when nsqd do a graceful shutdown or serialized to disk. This PR will persist deferred message with its defer time duration. This PR needs to keep backward compatibility.

@andyxning
Copy link
Member Author

/cc @ploxiln @mreiferson

@andyxning andyxning force-pushed the keep_defer_property_under_nsqd_restart branch 5 times, most recently from 0303512 to dcf2a50 Compare February 21, 2019 08:49
@andyxning andyxning changed the title keep defer property under nsqd restart nsqd: keep defer property under nsqd restart Feb 21, 2019
@andyxning andyxning force-pushed the keep_defer_property_under_nsqd_restart branch 2 times, most recently from 6e5e029 to 43879e2 Compare February 21, 2019 09:52
@ploxiln
Copy link
Member

ploxiln commented Feb 21, 2019

I like the idea. This is a bit complicated, so it'll be a bit of time before I have some good review comments. Some quick ideas I'll be thinking about:

  • can we switch to storing only the absolute defer-to time in the message struct
  • can we use a magic prefix for normal messages too, so in the future there is no possible ambiguity (even though there is still ambiguity for messages already in the disk queue from current/older nsqd)
  • can magic prefix be just 4 or 8 bytes, but very unlikely binary values

@mreiferson
Copy link
Member

ping #871

@mreiferson
Copy link
Member

If we're going to implement this, similar to @ploxiln's early feedback, my preference is to thoughtfully decide on a new on-disk format that provides us some future flexibility (in a backwards compatible way) that will also make it possible to address persistence of deferred metadata.

The dream would also be to use the same format over the wire as we do on disk, but that's going to be a much more difficult compatibility problem.

@andyxning
Copy link
Member Author

andyxning commented Feb 22, 2019

The dream would also be to use the same format over the wire as we do on disk, but that's going to be a much more difficult compatibility problem.

I also have thought this. But it turns out that it may be difficult for backward compatibility. Some properties do not need both for consumer and disk persistence. For example, just like defer time, consumer does not need to know this property. Make disk structure and wire structure the same also raise the need to change the sdk. It is a nightmare to add such change to all sdk especially lots of NSQ sdk are not under the community governance.

my preference is to thoughtfully decide on a new on-disk format that provides us some future flexibility (in a backwards compatible way) that will also make it possible to address persistence of deferred metadata.

Agreed.

@andyxning
Copy link
Member Author

@mreiferson

@ploxiln
Copy link
Member

ploxiln commented Feb 22, 2019

What the nsq tcp protocol does right now is:

  • from client to server: all-caps COMMAND, followed by optional space-separated args, followed by newline, followed by 4 bytes int body length, followed by body bytes.
  • from server to client: 4 bytes int size, 4 bytes int type, body bytes

It works but doesn't seem quite appropriate for this use - I think we want something a bit more general. Json is a common arbitrary choice for a misc general extensible use, but I think we want something more space and computation efficient, for putting lots of small messages on disk.

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear. For misc properties/tags similar to this, it may be sufficient to do one byte key, one byte length, length bytes value. The "key" is from a defined enum. So maybe "defer_until" is key=1, and "attempts" is key=2, and (hypothetical) "last_attempt_ip" is key=3. So this allows up to 255 different keys to be defined (maybe 0 is special), and values up to 255 bytes. For "defer_until" you would do len=8 and value={uint64}. Even though you sometimes know the length based just on key, you always include length, so an older version of the program that does not know this key can skip over it. (Also, you could encode int values smaller if possible - "attempts" could be 1 or 2 bytes as needed.) If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length. We could start with 4 or 8 bytes "magic" value, and end with type=0 length=0, and after that comes the message body. (I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs, or we could leave those properties in the classic format at the beginning of the body, since they are also sent to the consumer in that way.)

So, that's the way embedded developers stored config settings, back when they had just 4KB eeprom for the purpose ... just an idea for an aggressively efficient format.


I had a thought about the magic value - right now, all serialized message blobs start with 8-byte nanosecond timestamp. We can come up with a value that would never be or have been there, or even just 4 bytes that would never be at the beginning of a big-endian 8-byte nanosecond timestamp.

>>> "0x%016x" % (int(datetime(1990,1,1).timestamp()) * 10**9)
'0x08c25dc55ae22000'

So if the 4-byte magic was something like [0x08 0xc8 0x04 0xc4] then that would be sometime early 1990, which would not occur for any message generated with a reasonable clock, or a clock reset to the manufacture time of the server (usually 2000+), or a clock reset to zero, etc.

@mreiferson
Copy link
Member

It works but doesn't seem quite appropriate for this use

Agreed, I don't think this PR needs to have any bearing on the wire format to land.

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear.

+1 was thinking the same.

If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length.

Given the introduction of a magic value, It seems unlikely that we'll ever need more than 255 attributes before we would just move to a different structure altogether.

I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs

For internal consistency of the new format, let's start with this and see how it turns out.

I had a thought about the magic value - right now, all serialized message blobs start with 8-byte nanosecond timestamp. We can come up with a value that would never be or have been there, or even just 4 bytes that would never be at the beginning of a big-endian 8-byte nanosecond timestamp.

+1 good idea

@andyxning
Copy link
Member Author

@ploxiln @mreiferson IMHO, we should first make a draft about the disk format redesign. It is quite a big change since it is a backend storage refactor.

As for defer property loss is somewhat simple problem compared storage refactor, i suggest we should first fix the problem and start to do the storage refactor ASAP.

@andyxning
Copy link
Member Author

I know we should do some refactor to make NSQD storage contain more info and be more feasible for future changes, but it is not reasonable for discussing and doing the refactor in this PR, IMO.

@andyxning
Copy link
Member Author

@ploxiln @mreiferson Fired a new issue for tracking the storage redesign in #1139

@mreiferson
Copy link
Member

As for defer property loss is somewhat simple problem compared storage refactor, i suggest we should first fix the problem and start to do the storage refactor ASAP.

Are you suggesting we land this PR as is? I don't want to change disk formats twice, so I think this change should wait until we complete #1139.

@andyxning
Copy link
Member Author

Are you suggesting we land this PR as is?

Yes. disk formats change is a long planning. This defer bug is fatal since without this change defer functionality does not meet the description. I think it is more urgent to first fix this bug.

@ploxiln
Copy link
Member

ploxiln commented Feb 26, 2019

We don't think this is urgent, though. This has been a known limitation for multiple years. We tell people who ask that nsqd is not really a scheduler, and they need to put a timestamp in the message so the consumer can check if it is time to process yet, and if not requeue with delay equal to the remaining time.

@ploxiln
Copy link
Member

ploxiln commented Feb 28, 2019

I've started on a branch with the new type-length-value serialization for the "backend" disk queue.

@andyxning andyxning force-pushed the keep_defer_property_under_nsqd_restart branch from 43879e2 to 8520ed7 Compare March 8, 2019 08:11
@andyxning
Copy link
Member Author

andyxning commented Jul 26, 2019

The classic thing to do here is type-length-value, which I'll call key-length-value because it may be more clear. For misc properties/tags similar to this, it may be sufficient to do one byte key, one byte length, length bytes value. The "key" is from a defined enum. So maybe "defer_until" is key=1, and "attempts" is key=2, and (hypothetical) "last_attempt_ip" is key=3. So this allows up to 255 different keys to be defined (maybe 0 is special), and values up to 255 bytes. For "defer_until" you would do len=8 and value={uint64}. Even though you sometimes know the length based just on key, you always include length, so an older version of the program that does not know this key can skip over it. (Also, you could encode int values smaller if possible - "attempts" could be 1 or 2 bytes as needed.) If 255 seems like not much room to grow, we could do 2-byte key and 2-byte length. We could start with 4 or 8 bytes "magic" value, and end with type=0 length=0, and after that comes the message body. (I guess we could either move timestamp, attempts, message-id to the key-length-value for new-format-magic backend-blobs, or we could leave those properties in the classic format at the beginning of the body, since they are also sent to the consumer in that way.)

This is not feasible. We should use json or msgpack to pack all message meta data. This makes adding or deleting a new metadata field more easily in a backward compatibility way. We can offload this to json marshal/unmarshal or msgpack marshal/unmarshal instead of hard code metadata parse logic.

msgpack is more preferred since it is more fast and small.

@andyxning
Copy link
Member Author

So the message storage will contain three stops:

  • magic bytes
  • metadata
  • data

@mreiferson
Copy link
Member

Closing in favor of continuing the discussion in #1170

@mreiferson mreiferson closed this Jun 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants