pbview

Read protobuf messages without deserialization

If you only need to read a few fields of a large Google Protocol Buffers message, this library lets you do so much faster than the original google/protobuf parser.

The pbview compiler (pbviewc) generates classes that have the same (read-only) interface as the C++ classes generated by the protobuf compiler (protoc).

Usage

Generate the classes the same way you generate regular protobuf C++ messages:

$ pbviewc --cpp_out=out_dir --proto_path=in_dir mymessage.proto

Access the fields of the binary message directly, in place:

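#include <cassert>
#include <string>
#include <mymessage.pbview.h>

int main() {
    MyMessage msg;
    msg.set_id(42);
    const std::string binStr = msg.SerializeAsString();
    // The view reads directly from binStr, so binStr must outlive it.
    auto view = pbview::View<MyMessage>::fromBytesString(binStr);
    assert(view.id() == 42);
}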
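To give a feel for what a call like `view.id()` involves, here is a hypothetical hand-written view in plain C++17, with no pbview or protobuf dependency. It assumes `id` is field number 1 with the varint wire type and that the message is canonically serialized, so the first occurrence is the only one. `HandWrittenMyMessageView`, `readVarint` and `skipField` are illustrative names only; this is not pbviewc's generated code or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical illustration, not pbviewc output.
// Decodes a base-128 varint at pos and advances pos; nullopt if truncated.
std::optional<std::uint64_t> readVarint(std::string_view buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const auto byte = static_cast<std::uint8_t>(buf[pos++]);
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;
}

// Skips one field payload of the given wire type; returns false if malformed.
bool skipField(std::string_view buf, std::size_t& pos, std::uint32_t wireType) {
    switch (wireType) {
        case 0:  // varint
            return readVarint(buf, pos).has_value();
        case 1:  // fixed64
            if (buf.size() - pos < 8) return false;
            pos += 8;
            return true;
        case 2: {  // length-delimited: jump over the whole payload at once
            const auto len = readVarint(buf, pos);
            if (!len || *len > buf.size() - pos) return false;
            pos += static_cast<std::size_t>(*len);
            return true;
        }
        case 5:  // fixed32
            if (buf.size() - pos < 4) return false;
            pos += 4;
            return true;
        default:  // groups (3, 4) and invalid wire types are not handled here
            return false;
    }
}

// Assumes `id` is field 1 with varint wire type and appears at most once.
class HandWrittenMyMessageView {
public:
    explicit HandWrittenMyMessageView(std::string_view bytes) : bytes_(bytes) {}

    // Every call rescans from the beginning; nothing is cached or copied.
    std::int64_t id() const {
        std::size_t pos = 0;
        while (pos < bytes_.size()) {
            const auto tag = readVarint(bytes_, pos);
            if (!tag) break;
            const auto wireType = static_cast<std::uint32_t>(*tag & 0x7);
            if ((*tag >> 3) == 1 && wireType == 0) {
                const auto value = readVarint(bytes_, pos);
                return value ? static_cast<std::int64_t>(*value) : 0;
            }
            if (!skipField(bytes_, pos, wireType)) break;
        }
        return 0;  // absent or malformed: return the default value
    }

private:
    std::string_view bytes_;  // points into the caller's buffer
};
```

Because the view only stores a `std::string_view`, constructing it costs nothing, but each `id()` call pays for a scan up to field 1.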

Requirements

  • C++17 compiler
  • google/protobuf
  • range-v3

Features

  • The parser seeks quickly to the requested fields: large strings and even sub-messages are skipped in a single step
  • Working with serialized messages uses significantly less memory than holding deserialized messages
  • No memory allocations (std::string_view pointing directly into the serialized message instead of std::string)
  • Variant types that contain either a binary view or a google::protobuf::Message
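The one-step skipping follows from the protobuf wire format: every field starts with a varint tag `(field_number << 3) | wire_type`, and length-delimited fields (strings, bytes, sub-messages) carry their byte length up front. The following sketch assumes a flat in-memory buffer and ignores the deprecated group wire types; it finds a length-delimited field and returns a `std::string_view` into the buffer without allocating. `findBytesField` and `readVarint` are hypothetical helpers, not part of pbview.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Decodes a base-128 varint at pos and advances pos; nullopt if truncated.
std::optional<std::uint64_t> readVarint(std::string_view buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const auto byte = static_cast<std::uint8_t>(buf[pos++]);
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;
}

// Hypothetical helper (not pbview API): returns a view of the payload of the
// first length-delimited field numbered `fieldNumber`, or nullopt.
// Other fields are skipped without inspecting their contents.
std::optional<std::string_view> findBytesField(std::string_view buf, std::uint32_t fieldNumber) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const auto tag = readVarint(buf, pos);
        if (!tag) return std::nullopt;
        const auto number = *tag >> 3;
        switch (*tag & 0x7) {
            case 0:  // varint: must be decoded to find where it ends
                if (!readVarint(buf, pos)) return std::nullopt;
                break;
            case 1:  // fixed64
                if (buf.size() - pos < 8) return std::nullopt;
                pos += 8;
                break;
            case 2: {  // length-delimited: string, bytes or sub-message
                const auto len = readVarint(buf, pos);
                if (!len || *len > buf.size() - pos) return std::nullopt;
                const auto n = static_cast<std::size_t>(*len);
                if (number == fieldNumber) return buf.substr(pos, n);  // no copy
                pos += n;  // skip the entire payload in one step
                break;
            }
            case 5:  // fixed32
                if (buf.size() - pos < 4) return std::nullopt;
                pos += 4;
                break;
            default:  // groups (3, 4) and invalid wire types: not handled
                return std::nullopt;
        }
    }
    return std::nullopt;
}
```

Since a sub-message is just another length-delimited payload, the returned view can be passed to `findBytesField` again to descend into nested messages.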

Drawbacks

Please note that every field access requires a (partial) parse of the containing message.

This means there is always a break-even point beyond which deserializing the whole message is faster. (Hybrid approaches are possible: this library makes it easy to deserialize only individual sub-messages of a larger message.)
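The break-even point is easier to see with a toy cost model. The sketch below uses hypothetical helpers `writeVarint` and `lookup` and assumes every field is a varint; `lookup` rescans from the start of the buffer on each call, as a view without caching must, and counts the tags it decodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Decodes a base-128 varint at pos and advances pos; nullopt if truncated.
std::optional<std::uint64_t> readVarint(std::string_view buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const auto byte = static_cast<std::uint8_t>(buf[pos++]);
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;
}

// Appends v as a base-128 varint (used to build test messages).
void writeVarint(std::string& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Toy model: every field is a varint (wire type 0). Scans from the start on
// each call and adds the number of tags decoded to tagsDecoded.
std::optional<std::uint64_t> lookup(std::string_view buf, std::uint32_t fieldNumber,
                                    std::size_t& tagsDecoded) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const auto tag = readVarint(buf, pos);
        if (!tag) return std::nullopt;
        ++tagsDecoded;
        const auto value = readVarint(buf, pos);
        if (!value) return std::nullopt;
        if ((*tag >> 3) == fieldNumber) return value;
    }
    return std::nullopt;
}
```

Reading all fields of a 10-field message one after another this way decodes 1 + 2 + ... + 10 = 55 tags, whereas a single full parse decodes each of the 10 tags once. Real costs also depend on field sizes and on the allocations a full parse performs, so the model only shows the shape of the trade-off.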

Benchmark your exact use cases and then make a well-founded decision!
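A minimal timing helper for such measurements might look like the following. It is a sketch using only `std::chrono`, not a substitute for a proper benchmarking framework: warm-up, repeated runs and keeping the optimizer from discarding the measured work are left to the caller. `meanTimePerCall` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs fn `iterations` times and returns the mean wall-clock time per call.
template <typename Fn>
std::chrono::nanoseconds meanTimePerCall(Fn&& fn, std::size_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed) /
           static_cast<std::chrono::nanoseconds::rep>(iterations);
}
```

For a comparison you could time one lambda that reads the fields you need through a `pbview::View` and another that calls `ParseFromString` on the regular protobuf class, both on the same serialized data.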

TODO

  • Compatibility with proto3 syntax
  • Reflection+Descriptor interface
  • libfuzzer + asan tests
  • Evaluate caching strategies
    • value or offset of already accessed fields
    • id of first field in each cache-line (for binary search)
    • offset of each field (as done by Cap’n Proto or FlatBuffers)
  • Support ZeroCopyInputStreams instead of only flat memory data (for compressed data)
  • Support non-canonically serialized messages
    • fields not ordered by field number (untested)
    • repeated fields marked as packed but serialized without packing
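As one possible shape for the offset-caching idea listed above, the sketch below records where each field's value starts while scanning and resumes an earlier scan instead of starting over, so each tag is decoded at most once. It is a hypothetical design (`CachingVarintView` is not a pbview class) and assumes varint-only fields and canonical serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical design sketch, not a pbview class. Assumes every field is a
// varint and appears at most once (canonical serialization).
class CachingVarintView {
public:
    explicit CachingVarintView(std::string_view bytes) : bytes_(bytes) {}

    std::optional<std::uint64_t> get(std::uint32_t fieldNumber) {
        // Cache hit: decode directly at the remembered offset, no scanning.
        if (const auto it = valueOffsets_.find(fieldNumber); it != valueOffsets_.end()) {
            std::size_t pos = it->second;
            return readVarint(pos);
        }
        // Cache miss: resume where the previous scan stopped and record the
        // value offset of every field passed on the way.
        while (scanPos_ < bytes_.size()) {
            const auto tag = readVarint(scanPos_);
            const std::size_t valueOffset = scanPos_;
            if (!tag || !readVarint(scanPos_)) {  // malformed: stop scanning for good
                scanPos_ = bytes_.size();
                return std::nullopt;
            }
            ++tagsDecoded_;
            const auto number = static_cast<std::uint32_t>(*tag >> 3);
            valueOffsets_.emplace(number, valueOffset);
            if (number == fieldNumber) {
                std::size_t pos = valueOffset;
                return readVarint(pos);
            }
        }
        return std::nullopt;  // field absent
    }

    std::size_t tagsDecoded() const { return tagsDecoded_; }

private:
    // Decodes a base-128 varint at pos and advances pos; nullopt if truncated.
    std::optional<std::uint64_t> readVarint(std::size_t& pos) const {
        std::uint64_t result = 0;
        for (int shift = 0; shift < 64 && pos < bytes_.size(); shift += 7) {
            const auto byte = static_cast<std::uint8_t>(bytes_[pos++]);
            result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
            if ((byte & 0x80) == 0) return result;
        }
        return std::nullopt;
    }

    std::string_view bytes_;
    std::size_t scanPos_ = 0;      // end of the already-scanned prefix
    std::size_t tagsDecoded_ = 0;  // instrumentation for the example
    std::unordered_map<std::uint32_t, std::size_t> valueOffsets_;
};
```

With this strategy, reading every field of a message decodes each tag once in total, at the cost of a small per-view hash map.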

License

Boost Software License 1.0