Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a zero-copy write protocol #347

Open
eolivelli opened this issue Aug 1, 2017 · 19 comments
Open

Provide a zero-copy write protocol #347

eolivelli opened this issue Aug 1, 2017 · 19 comments

Comments

@eolivelli
Copy link
Contributor

FEATURE REQUEST

  1. Please describe the feature you are requesting.
    In the V3 protocol we use Google Protobuf, which is great for portability but it forces to perform copies of the payloads of the messages.
    The V2 protocol is more direct and requires no copy, but it is all hardcoded in Java/Bookeeper code and it is difficult to maitain.

We would like to have a good IDL-supported tool for writing the wire protocol with the ability to leverage power of Netty 4 ByteBuf or at least NIO ByteBuffers

  1. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

should-have

@eolivelli
Copy link
Contributor Author

We are looking for something like this
https://dwrensha.github.io/capnproto-java/index.html

/cc @merlimat @revans2

@sijie
Copy link
Member

sijie commented Aug 1, 2017

@eolivelli can you remove 'replace google protobuf' from caption? this is going to be misleading. I don't see replacing google protobuf is going to happen any time soon, because the fact the protocol has been run on thousands of machines. changing that requires a long discussion and it will be a long journey to get there.

And protobuf3 at least started offering unsafe operations to wrap bytebuffer to bytestring. we also need to evaluate this method as well.

@sijie
Copy link
Member

sijie commented Aug 1, 2017

this is going to a big change. can we first unmark the milestone? before we come up with a method, let's not mark it in any release.

@merlimat
Copy link
Contributor

merlimat commented Aug 1, 2017

@eolivelli My preference would be to keep the current Protobuf wire protocol and use a zero-copy implementation to serialize/deserialize.

As I mentioned some time back in one of the calls, I have a half-backed protobuf zero-copy library that I've been playing around. I should find some time to get back to it :).

If anyone is interested I can share the code as well.

@eolivelli eolivelli changed the title Provide a zero-copy write protocol (replace Google ProtoBuf ?) Provide a zero-copy write protocol Aug 2, 2017
@eolivelli eolivelli removed this from the 4.6.0 milestone Aug 2, 2017
@eolivelli
Copy link
Contributor Author

@merlimat it would be great to be compatibile on the wire protocol

as @sijie told it seems that ProtoBuf3 supports usage of ByteBuffers
see https://github.com/google/protobuf/releases

I am interested in this topic, If you have time to share some code I will appreciate that
thanks

@eolivelli
Copy link
Contributor Author

@sijie you already started pushing changes in this direction (see the upgrade to Protobuf 3.4), do you want to assign this issue to yourself ?

@sijie
Copy link
Member

sijie commented Sep 4, 2017

@eolivelli I will leave it here unassigned. due to some limitations in protobuf 3, we can improve the situation (e.g. avoid memory copy for ByteString), however there are multiple places that we still copy memory. It is a bit hard to address at this moment.

@eolivelli
Copy link
Contributor Author

@sijie got it, clear.thanks

@ivankelly
Copy link
Contributor

I think the first change for this would be to move the payload out of the protobuf itself. This would remove the bulk of the need to copy.

Another aspect of this would be removing the GC-churn caused by protobuf itself, but I think that can be handled separately. For that we could take the protobuf stuff done for pulsar as @merlimat said. Even if it's half baked, it works very well.

@ivankelly ivankelly added the triage/18-w11 Issue triage, 2018 week 11 label Mar 12, 2018
@sijie
Copy link
Member

sijie commented Mar 12, 2018

@ivankelly

I think the first change for this would be to move the payload out of the protobuf itself.

proto3 supports UnsafeByteOperations for ByteString to use an existing bytebuffer without memory copying. we don't need to move payload out of the protobuf, we just need to leverage this feature.

@ivankelly
Copy link
Contributor

Nice, I didn't know that existed. Would be nice to have the recycler stuff in 3 so we're not allocating a ByteString object each time also.

@dlg99
Copy link
Contributor

dlg99 commented Mar 14, 2018

UnsafeByteOperations helps with memory allocations (I prototyped this at Salesforce's internal fork and it did help).
It does not help with zero copy. Netty added splice support netty/netty#3665 and Java's NIO has transferTo/transferFrom that allow to bypass data copying to user space. As I remember, Kafka uses these.
IIRC zero copy is impossible with protobuf as it needs to bring data to user space for parsing. protocolbuffers/protobuf#1896

Cap'n'proto or flatbuffers might work for this: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

@sijie
Copy link
Member

sijie commented Mar 14, 2018

UnsafeByteOperations helps with memory allocations (I prototyped this at Salesforce's internal fork and it did help).

@dlg99 do you mind pushing your prototype out?

@sijie
Copy link
Member

sijie commented Mar 14, 2018

fyi protocolbuffers/protobuf#3296

@dlg99
Copy link
Contributor

dlg99 commented Mar 14, 2018

@sijie it is in internal repo and done on old fork. cannot really merge to community code / have to redo it from scratch.

@sijie
Copy link
Member

sijie commented Mar 14, 2018

@dlg99 okay. gotcha.

@jvrao
Copy link
Contributor

jvrao commented Mar 14, 2018

Great discussion, but I am not sure where we are headed.

  1. What kind of improvements is possible with ProtoBufs?
    a. Move to V3 and use unsafe operations. As per @dlg99 this saves memory allocations but not buffer copy.
    b. As @dlg99 suggested we can implement splice, does anyone have a good grasp of what it can save? Do we approach near to zero-copy?
    c. Should we consider separating the payload as @ivankelly suggested?

  2. With the combination of a/b/c can we live with Probuff v3 or should we seriously consider FlatBuffs?

  3. Even with FlatBuffs it is not a slamdunk and it's going to be a huge change. Is it really worth?

We really need to improve this and I would love everyone to switch to V3 protocol. It's a small community, and having two protocols, neither will get enough testing/attention. This is clearly evident with the latest refcount bug(s) we ran into.

@ivankelly
Copy link
Contributor

ivankelly commented Mar 14, 2018

@jvrao Just a note on V2 vs V3, it seems we're discussing multiple disjoint things.

When I mentioned proto3 above, I meant protoc 3.0, i.e. version 3 of the compiler which can generate code from protobuf2 and protobuf3 format. I would expect that we would stay on protobuf2 format, for the core protocol, as protobuf3 has a lot of changes which we don't necessarily need. protobuf2 isn't deprecated in any way, and it will coexist with protobuf3 for the foreseeable future. They just target different usecases.

Now, we are already on protoc 3.0 for bookkeeper, but pulsar, uses a modified version of protoc 2.6, which reduces the number of allocations. I would like to upgrade this modified version of protoc to a 3.x version, so that we can use it in bookkeeper (we need 3.0 for grpc stuff).

This is separate from what we call V2 and V3 protocols within bookkeeper. It would be good to kill V2, but we cannot do that until we do a lot of other stuff to bring V3 up to scratch.

@Ghatage
Copy link
Contributor

Ghatage commented Oct 17, 2020

@karanmehta93 was asking about protobuf copies in this slack thread and in his PR #2441.
Looking forward to his update on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants