Replies: 4 comments 3 replies
-
Makes sense to not support/use both versions of thrift and consolidate. |
Beta Was this translation helpful? Give feedback.
-
+1. We have faced thrift/fbthrift conflict in the past as well with Prestissimo. See prestodb/presto#20340 and prestodb/presto#18261. |
Beta Was this translation helpful? Give feedback.
-
@pedroerp can we please get some performance validation for Parquet Reader with this switch and ensure there are no surprises? Thanks! |
Beta Was this translation helpful? Give feedback.
-
Just checking, was there a conclusion here? It seems like the discussion fizzled out and the PR to make the change went stale #6523 |
Beta Was this translation helpful? Give feedback.
-
We currently have a dependency on thrift from our native parquet reader. As we add support for remote function execution, the client <-> server communication will be done using thrift RPC as well.
FBThrift is a newer and supposedly more efficient version than thrift, and among other things provides an async transport called "cpp2" (not available in thrift), which will be used for remote function execution.
Linking thrift and fbthrift to the same binary causes symbols to conflict and end up causing crashes as the wrong version of symbols eventually get picked. There was a long discussion on Slack (check #builds-and-ci channel) on whether this could be solved using dynamic linking, but so solution was found.
With this context, the proposal is to remove the thrift dependency from Velox (created due to parquet), and consolidate all of that on FBThrift. Despite their divergence, both projects seem to support TCompactProtocol (the ones used by Parquet), so, in theory (need some testing to validate), they should be compatible. FBThrift is also claimed to be more efficient, so there might be small performance gains in doing so.
The parquet generated cpp files are also today pre-compiled and checked in as part of the codebase; as part of this change we would also move them to be compiled at build time, like how it is done for remote functions. I've tested and all of them seems to be doable.
There is also a thrift dependency that comes with Arrow (we use arrow's parquet writer), but this is nicely wrapped within arrow and doesn't leak its internal symbols.
What do you all think?
UPDATE: just doubled checked with someone from the fbthrift team and the TCompactProtocol serialization needed for Parquet should be compatible.
Cc: @assignUser @majetideepak @aditi-pandit @yingsu00 @kgpai @mbasmanova @xiaoxmeng @oerling
Beta Was this translation helpful? Give feedback.
All reactions