OS: CentOS Linux release 7.4.1708 (Core)
spark3D: 0.1.4
spark-fits: 0.6.0
#72 adds a script to benchmark the partitioning. The idea is the following:
1. Load the data (10 million rows) using spark-fits.
2. Apply the partitioning to the RDD, or skip it.
3. Trigger an action and repeat it several times (the data are cached on the first run).
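To reproduce the "repeat an action and watch GC" part of the procedure outside the Spark UI, something like the following could be used. It is a minimal sketch, not the #72 script: `BenchSketch`, `timeRuns`, `totalGcMillis` and `RunStats` are hypothetical names, it assumes Scala 2.13+ (for `scala.jdk.CollectionConverters`), and it only sees GC in the JVM it runs in. That matches local mode, where driver and executors share one JVM; on a cluster, executor GC is what the Spark UI "GC Time" column reports.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

// Hypothetical benchmarking helper (not part of spark3D or #72).
object BenchSketch {

  // Accumulated GC time (ms) summed over all collectors of this JVM.
  // getCollectionTime returns -1 when a collector does not report it,
  // so those values are skipped.
  def totalGcMillis(): Long =
    ManagementFactory.getGarbageCollectorMXBeans.asScala
      .map(_.getCollectionTime)
      .filter(_ >= 0)
      .sum

  // Wall-clock time and GC time of a single run, both in milliseconds.
  final case class RunStats(wallMillis: Long, gcMillis: Long)

  // Evaluates the by-name `action` `runs` times and records, for each run,
  // the elapsed wall time and the GC time accumulated during that run.
  def timeRuns[A](runs: Int)(action: => A): Seq[RunStats] =
    (1 to runs).map { _ =>
      val gc0 = totalGcMillis()
      val t0 = System.nanoTime()
      action
      val wallMillis = (System.nanoTime() - t0) / 1000000L
      RunStats(wallMillis, totalGcMillis() - gc0)
    }
}
```

With Spark, the action would be an RDD action such as `count()`, called after `cache()` so that only the first run pays for loading, e.g. `rdd.cache(); BenchSketch.timeRuns(5)(rdd.count())`.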
Regardless of the partitioning (octree or onion), the GC time is large compared to the compute time:
Octree (mapPartitions at Shape3DRDD.scala:164):

| Metric | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| Duration | 48 s | 48 s | 48 s | 48 s | 48 s |
| GC Time | 33 s | 33 s | 33 s | 33 s | 33 s |
Onion (mapPartitions at Shape3DRDD.scala:164):

| Metric | Min | 25th percentile | Median | 75th percentile | Max |
|---|---|---|---|---|---|
| Duration | 46 s | 46 s | 46 s | 46 s | 46 s |
| GC Time | 28 s | 28 s | 28 s | 28 s | 28 s |
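To put "large" in numbers, here is the GC share $f$ of the task duration from the two tables. This assumes, as in the Spark UI, that GC time is counted inside the task duration, so $t_{\mathrm{task}} - t_{\mathrm{GC}}$ is the time left for actual work:

```latex
f = \frac{t_{\mathrm{GC}}}{t_{\mathrm{task}}}, \qquad
f_{\mathrm{octree}} = \frac{33\,\mathrm{s}}{48\,\mathrm{s}} \approx 0.69, \qquad
f_{\mathrm{onion}} = \frac{28\,\mathrm{s}}{46\,\mathrm{s}} \approx 0.61

t_{\mathrm{octree}} - t_{\mathrm{GC}} = 48\,\mathrm{s} - 33\,\mathrm{s} = 15\,\mathrm{s}, \qquad
t_{\mathrm{onion}} - t_{\mathrm{GC}} = 46\,\mathrm{s} - 28\,\mathrm{s} = 18\,\mathrm{s}
```

So roughly two thirds of each task is spent in garbage collection, and for the octree the GC time is more than twice the remaining compute time.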
The code responsible for this is in Shape3DRDD.scala (line 142):

    import scala.collection.mutable.ListBuffer
    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    /**
      * Repartition an RDD[T] according to a custom partitioner.
      *
      * @param rdd : (RDD[T])
      *   RDD of T (must extend Shape3D) with any partitioning.
      * @param partitioner : (SpatialPartitioner)
      *   Instance of SpatialPartitioner or any extension of it.
      * @return (RDD[T]) Repartitioned RDD[T].
      */
    def partition(partitioner: SpatialPartitioner)(implicit c: ClassTag[T]): RDD[T] = {
      // Go from RDD[V] to RDD[(K, V)] where K is specified by the partitioner.
      // Finally, return only RDD[V] with the new partitioning.
      def mapElements(iter: Iterator[T]): Iterator[(Int, T)] = {
        // Collects every (partition id, object) pair of the whole partition
        // in memory before returning an iterator over them.
        val res = ListBuffer[(Int, T)]()
        while (iter.hasNext) {
          res ++= partitioner.placeObject(iter.next()).toList
        }
        res.iterator
      }
      // rawRDD is the RDD[T] held by the enclosing Shape3DRDD class.
      rawRDD.mapPartitions(mapElements).partitionBy(partitioner).mapPartitions(_.map(_._2), true)
    }
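One possible lead (not yet confirmed by profiling) is that `mapElements` materializes the whole partition in a `ListBuffer`, plus one temporary `List` per object, before the shuffle starts consuming it. The stand-alone sketch below contrasts that eager pattern with a lazy `flatMap` over the input iterator. `PartitionMappingSketch`, `mapEager`, `mapLazy` and the `placeObject` function parameter are hypothetical; `placeObject` stands in for `SpatialPartitioner.placeObject` and is assumed here to return an `Iterator[(Int, T)]`.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical illustration, independent of Spark and spark3D.
object PartitionMappingSketch {

  // Eager version, mirroring mapElements: every (key, value) pair of the
  // partition is appended to a ListBuffer before anything is returned,
  // so the whole partition is held in memory at the same time.
  def mapEager[T](iter: Iterator[T], placeObject: T => Iterator[(Int, T)]): Iterator[(Int, T)] = {
    val res = ListBuffer[(Int, T)]()
    while (iter.hasNext) {
      res ++= placeObject(iter.next()).toList
    }
    res.iterator
  }

  // Lazy version: pairs are produced one input element at a time, only when
  // the consumer (e.g. the shuffle write behind partitionBy) pulls them,
  // with no intermediate buffer or per-element List.
  def mapLazy[T](iter: Iterator[T], placeObject: T => Iterator[(Int, T)]): Iterator[(Int, T)] =
    iter.flatMap(placeObject)
}
```

Both return the same pairs in the same order; only memory behavior differs. In `partition`, the lazy form would read roughly `rawRDD.mapPartitions(_.flatMap(x => partitioner.placeObject(x)))`, assuming `placeObject` returns an iterator or collection. Whether this actually reduces the GC time is untested: the shuffle in `partitionBy` also allocates.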
We must investigate this.