-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a zero-copy write protocol #347
Comments
We are looking for something like this |
@eolivelli can you remove 'replace google protobuf' from caption? this is going to be misleading. I don't see replacing google protobuf is going to happen any time soon, because the fact the protocol has been run on thousands of machines. changing that requires a long discussion and it will be a long journey to get there. And protobuf3 at least started offering unsafe operations to wrap bytebuffer to bytestring. we also need to evaluate this method as well. |
this is going to a big change. can we first unmark the milestone? before we come up with a method, let's not mark it in any release. |
@eolivelli My preference would be to keep the current Protobuf wire protocol and use a zero-copy implementation to serialize/deserialize. As I mentioned some time back in one of the calls, I have a half-backed protobuf zero-copy library that I've been playing around. I should find some time to get back to it :). If anyone is interested I can share the code as well. |
@merlimat it would be great to be compatibile on the wire protocol as @sijie told it seems that ProtoBuf3 supports usage of ByteBuffers I am interested in this topic, If you have time to share some code I will appreciate that |
@sijie you already started pushing changes in this direction (see the upgrade to Protobuf 3.4), do you want to assign this issue to yourself ? |
@eolivelli I will leave it here unassigned. due to some limitations in protobuf 3, we can improve the situation (e.g. avoid memory copy for ByteString), however there are multiple places that we still copy memory. It is a bit hard to address at this moment. |
@sijie got it, clear.thanks |
I think the first change for this would be to move the payload out of the protobuf itself. This would remove the bulk of the need to copy. Another aspect of this would be removing the GC-churn caused by protobuf itself, but I think that can be handled separately. For that we could take the protobuf stuff done for pulsar as @merlimat said. Even if it's half baked, it works very well. |
proto3 supports UnsafeByteOperations for ByteString to use an existing bytebuffer without memory copying. we don't need to move payload out of the protobuf, we just need to leverage this feature. |
Nice, I didn't know that existed. Would be nice to have the recycler stuff in 3 so we're not allocating a ByteString object each time also. |
UnsafeByteOperations helps with memory allocations (I prototyped this at Salesforce's internal fork and it did help). Cap'n'proto or flatbuffers might work for this: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html |
@dlg99 do you mind pushing your prototype out? |
@sijie it is in internal repo and done on old fork. cannot really merge to community code / have to redo it from scratch. |
@dlg99 okay. gotcha. |
Great discussion, but I am not sure where we are headed.
We really need to improve this and I would love everyone to switch to V3 protocol. It's a small community, and having two protocols, neither will get enough testing/attention. This is clearly evident with the latest refcount bug(s) we ran into. |
@jvrao Just a note on V2 vs V3, it seems we're discussing multiple disjoint things. When I mentioned proto3 above, I meant protoc 3.0, i.e. version 3 of the compiler which can generate code from protobuf2 and protobuf3 format. I would expect that we would stay on protobuf2 format, for the core protocol, as protobuf3 has a lot of changes which we don't necessarily need. protobuf2 isn't deprecated in any way, and it will coexist with protobuf3 for the foreseeable future. They just target different usecases. Now, we are already on protoc 3.0 for bookkeeper, but pulsar, uses a modified version of protoc 2.6, which reduces the number of allocations. I would like to upgrade this modified version of protoc to a 3.x version, so that we can use it in bookkeeper (we need 3.0 for grpc stuff). This is separate from what we call V2 and V3 protocols within bookkeeper. It would be good to kill V2, but we cannot do that until we do a lot of other stuff to bring V3 up to scratch. |
@karanmehta93 was asking about protobuf copies in this slack thread and in his PR #2441. |
FEATURE REQUEST
In the V3 protocol we use Google Protobuf, which is great for portability but it forces to perform copies of the payloads of the messages.
The V2 protocol is more direct and requires no copy, but it is all hardcoded in Java/Bookeeper code and it is difficult to maitain.
We would like to have a good IDL-supported tool for writing the wire protocol with the ability to leverage power of Netty 4 ByteBuf or at least NIO ByteBuffers
should-have
The text was updated successfully, but these errors were encountered: