Small library to read serialized protobuf(s) directly into Pandas DataFrame.
This is intended to be a simple shortcut for translating serialized protobuf bytes / files directly to a dataframe.
Available via pip:
$ pip install read-protobuf
Run the demo-notebook for an interactive demo.
import demo_pb2 # compiled protobuf message module
from read_protobuf import read_protobuf
MessageType = demo_pb2.MessageType() # instantiate a new message type
df = read_protobuf(b'\x00\x00', MessageType) # create a dataframe from serialized protobuf bytes
df = read_protobuf([b'\x00\x00', b'x00\x00'] MessageType) # read multiple protobuf bytes
df = read_protobuf('demo.pb', MessageType) # use file instead of bytes
df = read_protobuf(['demo.pb', 'demo2.pb'], MessageType) # read multiple files
# options
df = read_protobuf('demo.pb', MessageType, flatten=False) # don't flatten pb messages
df = read_protobuf('demo.pb', MessageType, prefix_nested=True) # prefix nested messages with parent keys (like pandas.io.json.json_normalize)
To compile a protobuf Message class from python, use:
$ protoc --python_out="." demo.proto
https://github.com/benhodgson/protobuf-to-dict
This library was developed earlier to convert protobufs to JSON via a dict.
The google protobuf library comes with utilities to convert messages to a dict
or JSON,
then loaded by Pandas.
from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import MessageToDict
In brief tests, the read_protobuf
package is about 2x as fast
as using MessageToDict
and 3x as fast as MessageToJson
.
To install a development version of the package, run from the root directory:
$ pip install -e .
- To install development dependencies, use the optional
[dev]
dependencies:
$ pip install -e ".[dev]"
Uses black
and isort
to format files.
$ make black
$ make isort
Uses ruff
to lint application.
$ make ruff
Uses pytest
to run unit tests. From the root of the repository, run:
$ make pytest
# specify test
$ pytest -k "TestRead::test_read_bytes"
Use coverage
to monitor code coverage during tests.
To record coverage while running tests, run:
$ make pytest-cov