Deserialization is extremely slow for messages with many small sub-messages. #179
Comments
From what I understand, this looks like the cost of the per-message Dict. In that case, maybe something more lightweight than a Dict will help here. The count and names of the fields are known during code generation, so one option could be to generate specialized code for each struct instead of using a Dict.
Yes, the bottleneck is due to the dict that is used to represent the protobuf struct. So there is no real quick fix for this, because it would require a fundamentally different way of generating the code. I created a MWE that also includes some results in the profile_results subdirectory. I'm not sure what else would be a good way to share more information on this, because the flame graph is not really helpful unless you interact with it (i.e. hover etc.). I think you can ignore the large chunk on the bottom right in this chart because it is just the overhead that comes from running things in the REPL. The actual workload is the part on the left.
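To make the cost concrete, here is a minimal sketch (not ProtoBuf.jl code; `PointStruct` and the helper functions are made up for illustration) comparing a plain struct with fields known at code-generation time against a `Dict{Symbol,Any}` per message, which is roughly the trade-off being discussed:

```julia
# Minimal illustration of the overhead discussed above. A plain struct with known fields
# stores its values inline, while a Dict{Symbol,Any} per message allocates its own
# key/value storage every time one is constructed.
using BenchmarkTools

struct PointStruct   # hypothetical "specialized codegen" representation
    x::Float64
    y::Float64
    z::Float64
end

make_struct() = PointStruct(1.0, 2.0, 3.0)
make_dict()   = Dict{Symbol,Any}(:x => 1.0, :y => 2.0, :z => 3.0)

@btime make_struct()  # no heap allocations
@btime make_dict()    # several heap allocations per message
```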
Hi guys, I'm wondering what the pattern in ProtoBuf.jl is for deserialising multiple sub-messages within one message? (Not sure if I should open a separate issue.) I see that this is not officially part of the protobuf spec https://developers.google.com/protocol-buffers/docs/techniques and couldn't find it in any of the READMEs.
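If the question is about reading a stream of messages from one file (the technique in the linked doc), one common approach is to length-prefix each message yourself, since the wire format has no built-in delimiter. Below is a rough sketch using the pre-1.0 ProtoBuf.jl API (`writeproto`/`readproto`); treat the framing scheme as one possible convention, not something this thread or the spec prescribes:

```julia
# Length-delimited framing: write a 4-byte length before each encoded message, then read
# it back the same way. writeproto/readproto are from the pre-1.0 ProtoBuf.jl API.
using ProtoBuf

function write_delimited(io::IO, msg)
    buf = IOBuffer()
    writeproto(buf, msg)
    bytes = take!(buf)
    write(io, UInt32(length(bytes)))  # simple fixed-width length prefix
    write(io, bytes)
end

function read_delimited(io::IO, msg)
    len = read(io, UInt32)
    readproto(IOBuffer(read(io, len)), msg)  # fills `msg` and returns it
end
```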
We have the same issue with OpenStreetMapX: pszufe/OpenStreetMapX.jl#44 (comment)
Is there any technical issue blocking the struct code from being generated with plain fields instead of a Dict?
Technically it should be possible. The point is just that one would have to rewrite the entire code-generation logic.
Indeed. It's best to try it out by manually doing it for one package. I tried it by manually changing the generated code of OpenStreetMapX, see here for the proof of concept: pszufe/OpenStreetMapX.jl#52
Hi all, we're seeing the same thing here: allocating a small Dict per message is prohibitively expensive.

We remembered that the last time we used ProtoBuf.jl, a few years ago (for JuliaPerf/PProf.jl@d6d525c), it generated a plain struct per message. We found that this change from struct to Dict was made as part of #141, which fixed a thread-safety issue.

@tanmaykm, can you elaborate on how we came to the current design with a Dict in each object? Why did we change away from generating a struct per message?

We'd be interested to help with implementing any future work here to improve the performance, and we'd like to help get rid of this allocation. 😊 Thanks so much! :)

CC: @dewilson
For this example proto, here's what it looks like before and after #141:

```proto
syntax = "proto3";

message plant {
    string name = 3;
}

message seed {
    int32 cold_hours = 1;
    int32 germination_hours = 2;
    plant species = 3;
}
```

Here's the generated code after #141:

```julia
mutable struct seed <: ProtoType
    __protobuf_jl_internal_meta::ProtoMeta
    __protobuf_jl_internal_values::Dict{Symbol,Any}

    function seed(; kwargs...)
        obj = new(meta(seed), Dict{Symbol,Any}())
        values = obj.__protobuf_jl_internal_values
        symdict = obj.__protobuf_jl_internal_meta.symdict
        for nv in kwargs
            fldname, fldval = nv
            fldtype = symdict[fldname].jtyp
            (fldname in keys(symdict)) || error(string(typeof(obj), " has no field with name ", fldname))
            values[fldname] = isa(fldval, fldtype) ? fldval : convert(fldtype, fldval)
        end
        obj
    end
end # mutable struct seed

const __meta_seed = Ref{ProtoMeta}()
function meta(::Type{seed})
    ProtoBuf.metalock() do
        if !isassigned(__meta_seed)
            __meta_seed[] = target = ProtoMeta(seed)
            allflds = Pair{Symbol,Union{Type,String}}[:cold_hours => Int32, :germination_hours => Int32, :species => plant]
            meta(target, seed, allflds, ProtoBuf.DEF_REQ, ProtoBuf.DEF_FNUM, ProtoBuf.DEF_VAL, ProtoBuf.DEF_PACK, ProtoBuf.DEF_WTYPES, ProtoBuf.DEF_ONEOFS, ProtoBuf.DEF_ONEOF_NAMES)
        end
        __meta_seed[]
    end
end

function Base.getproperty(obj::seed, name::Symbol)
    if name === :cold_hours
        return (obj.__protobuf_jl_internal_values[name])::Int32
    elseif name === :germination_hours
        return (obj.__protobuf_jl_internal_values[name])::Int32
    elseif name === :species
        return (obj.__protobuf_jl_internal_values[name])::plant
    else
        getfield(obj, name)
    end
end
```

And before:

```julia
mutable struct plant <: ProtoType
    name::AbstractString
    plant(; kwargs...) = (o=new(); fillunset(o); isempty(kwargs) || ProtoBuf._protobuild(o, kwargs); o)
end # mutable struct plant

const __fnum_plant = Int[3]
meta(t::Type{plant}) = meta(t, ProtoBuf.DEF_REQ, __fnum_plant, ProtoBuf.DEF_VAL, true, ProtoBuf.DEF_PACK, ProtoBuf.DEF_WTYPES, ProtoBuf.DEF_ONEOFS, ProtoBuf.DEF_ONEOF_NAMES, ProtoBuf.DEF_FIELD_TYPES)

mutable struct seed <: ProtoType
    cold_hours::Int32
    germination_hours::Int32
    species::plant
    seed(; kwargs...) = (o=new(); fillunset(o); isempty(kwargs) || ProtoBuf._protobuild(o, kwargs); o)
end # mutable struct seed
```
:) We're wondering the same thing! I think it probably is related to trying to support protobuf reflection in a good way, or may have been related to supporting mutually recursive message definitions. Whatever the reason, we'd be happy to help with some group engineering to get to a solution we all love! 😊
I was hoping that some sort of lightweight dict could be used here, but that has not happened yet, unfortunately. @NHDaly has rightly pointed out the reasons for switching to a dict-based representation.

It would be great to have folks contribute to this!
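One of the things the Dict buys is knowing which fields were actually set. As a point of reference, here is a minimal sketch of my own (not ProtoBuf.jl's design) that tracks field presence without a Dict, using `Union{Nothing,T}` fields and the plant/seed example from above; small Unions like this are optimized by the compiler, so no per-message Dict allocation is needed just to record presence:

```julia
# Field presence without a per-message Dict: an unset optional field is simply `nothing`.
struct Plant
    name::Union{Nothing,String}
end

struct Seed
    cold_hours::Union{Nothing,Int32}
    germination_hours::Union{Nothing,Int32}
    species::Union{Nothing,Plant}
end

has_species(s::Seed) = s.species !== nothing

s = Seed(Int32(10), nothing, Plant("fir"))
has_species(s)                   # true
s.germination_hours === nothing  # true: this field was never set
```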
Thanks for the explanation, @tanmaykm! :) Makes sense.

Okay, I think we should be able to achieve those goals without having dictionaries in the structs. The proto implementations in other languages have all found ways to work around these issues, so I think we should be able to do so as well.

For the recursive struct definitions: for example, given this proto definition:

```proto
message RecurseA {
    required string a = 1;
    optional RecurseB b = 2;
}

message RecurseB {
    required string b = 1;
    optional RecurseA a = 2;
}
```

You could generate Julia code like this:

```julia
struct _RecurseA{__RecurseB}
    a::AbstractString
    b::__RecurseB
end

struct RecurseB
    b::AbstractString
    a::_RecurseA{RecurseB}
end

const RecurseA = _RecurseA{RecurseB}
```

I think we can work through the remaining issues, and go back to doing code generation while still avoiding the thread-safety and recursion issues you had before! :) Thanks, Tanmay.
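A small extension of the sketch above (my addition, not part of the original comment): if the optional fields are also made nullable, the recursion terminates at construction time, so finite values can be built without any Dict:

```julia
struct _RecurseA{B}
    a::String
    b::Union{Nothing,B}   # `nothing` stands in for an unset optional field
end

struct RecurseB
    b::String
    a::Union{Nothing,_RecurseA{RecurseB}}
end

const RecurseA = _RecurseA{RecurseB}

# The nesting stops wherever an optional field is left unset:
RecurseA("outer", RecurseB("inner", nothing))
```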
For the field name issue, we now have this syntax:
Hey folks, I encourage you to take the new version (1.0) of this package for a spin; it should be a bit faster. Please see the docs for some guidance on how to migrate to this new version. Closing for now as this issue is a little all over the place, but please feel free to reopen (or better, start a new issue) if you feel the performance is still bad.
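For reference, decoding a message with the 1.0 interface looks roughly like the sketch below; the names (`ProtoDecoder`, `decode`) are from my memory of the 1.0 docs and the `Scenario` type is hypothetical, so defer to the migration guide mentioned above:

```julia
using ProtoBuf  # v1.x

# Decode one message from a file; `Scenario` stands in for a generated message type.
msg = open("scenario_0001.bin") do io
    decode(ProtoDecoder(io), Scenario)
end
```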
FWIW, I returned to this recently to re-assess the feasibility of using ProtoBuf for another project. The MWE from above now only takes about 4ms on my machine --- a solid 30x improvement. Thanks, @Drvi and team!
First of all, thank you for making ProtoBuf available in Julia! I am currently using this package to read scenarios from the waymo-open-dataset. While this has worked reliably, I am running into significant performance issues due to the "topology" of the messages.
Some relevant details:
Each of the scenarios consists of approximately 2MB of binary data. Each scenario contains map features, which in turn consist of some metadata and a number of MapPoints. Each map point only encodes three floats for the x, y, and z position, so individually they are quite small. However, a single scenario can contain >100,000 MapPoints (spread over the different features of the map).
As a result, loading a single scenario takes about 150ms on my machine. This may not sound all that bad, but in my current use case I need to iterate over about 400,000 of these scenarios, potentially multiple times.
After doing some profiling, I found that 60% of the time is spent pre-allocating memory for all the small dicts that are needed for each MapPoint object. Specifically, most of the time is spent in this line of Julia Base.

Just to provide some comparison: the same kind of data is loaded in about 11ms from a JSON file, and in about 3ms from the binary format created by Serialization.serialize. Of course, there are other issues with these kinds of formats. Therefore, I would really like to use ProtoBuf here, but I cannot get it up to speed.

Are there known ways to reduce the amount of overhead that arises in this setting with many small proto sub-messages? Or is this issue inherent to the current design of the library?
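For context, here is a sketch of how a comparison like the one described above could be timed, assuming the pre-1.0 `readproto` API that was current when this issue was opened; `Scenario` and the file names are placeholders, and the timings in the comments are the numbers reported above, not fresh measurements:

```julia
using ProtoBuf, Serialization, JSON3

# Hypothetical loaders for one scenario stored in three different formats.
load_proto(path) = open(io -> readproto(io, Scenario()), path)

@time load_proto("scenario_0001.bin")         # ~150 ms reported above
@time JSON3.read(read("scenario_0001.json"))  # ~11 ms reported above
@time deserialize("scenario_0001.jls")        # ~3 ms reported above
```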