The number of output buckets in an SMB transform can be configured using `TargetParallelism`.
But in the case of skewed data, there's currently no way to set the number of output shards; it's always hardcoded to 1.
We'd have to do some thinking on how to implement this. Since all shards would be written from the same worker core, we'd have to implement a round-robin type algorithm to assign shards, and maintain `numShards` open file handles while writing.
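
To make the round-robin idea concrete, here is a minimal sketch in plain Java. It is not taken from the SMB codebase: `RoundRobinShardWriter` is a hypothetical name, and plain `java.io.Writer` instances stand in for whatever file handles the real sink would open. It only illustrates keeping `numShards` handles open and cycling through them.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: spreads one bucket's records over numShards open writers. */
final class RoundRobinShardWriter implements AutoCloseable {
  private final List<Writer> shards; // one open handle per shard, kept for the whole write
  private long written = 0;          // number of records written so far

  RoundRobinShardWriter(List<Writer> shards) {
    if (shards.isEmpty()) {
      throw new IllegalArgumentException("numShards must be at least 1");
    }
    this.shards = new ArrayList<>(shards);
  }

  /** Writes the record to shard (written mod numShards), then advances the counter. */
  void write(String record) throws IOException {
    int shard = (int) (written % shards.size());
    Writer out = shards.get(shard);
    out.write(record);
    out.write('\n');
    written++;
  }

  /** Closes every shard handle. */
  @Override
  public void close() throws IOException {
    for (Writer w : shards) {
      w.close();
    }
  }
}
```

If the records arrive already sorted, each shard receives a subsequence of that stream, so each shard stays sorted on its own. The cost is the one mentioned above: all `numShards` handles stay open until the bucket is finished.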