Consider an OptimizedPredictionStrategy for survival forest #1419
Labels:
- performance: Issue relates to the speed, memory usage, or scaling aspects of the package.
- requires research: An issue that needs additional thought and experimentation before it can be implemented.
See #1350 for a quick overview. The idea would be to use OptimizedPredictionStrategy if num.failures is below some threshold, say 150.
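A minimal sketch of what that dispatch could look like, assuming a factory that knows the number of distinct failure times. The strategy class and function names are placeholders rather than grf's actual interfaces, and the 150 cutoff is the untuned suggestion from above:

```cpp
// Illustrative sketch only: pick the strategy from the number of failure times.
#include <cstddef>
#include <memory>

struct PredictionStrategy {
  virtual ~PredictionStrategy() = default;
};

struct OptimizedSurvivalStrategy : PredictionStrategy {
  explicit OptimizedSurvivalStrategy(std::size_t num_failures)
      : num_failures(num_failures) {}
  std::size_t num_failures;  // size of the per-node summaries this strategy stores
};

struct DefaultSurvivalStrategy : PredictionStrategy {};

std::unique_ptr<PredictionStrategy> make_survival_strategy(std::size_t num_failures) {
  constexpr std::size_t kOptimizedThreshold = 150;  // suggested cutoff, untested
  if (num_failures < kOptimizedThreshold) {
    // Few failure times: fixed-size node summaries stay small, so the
    // optimized (weight-free) strategy should pay off.
    return std::make_unique<OptimizedSurvivalStrategy>(num_failures);
  }
  // Many failure times: fall back to the weight-based default strategy.
  return std::make_unique<DefaultSurvivalStrategy>();
}
```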
Another potential alternative could be to modify DefaultPredictionStrategy's weight container. As pointed out in #483, std::unordered_map has some drawbacks. For predictions these drawbacks are data dependent: for some weight functions, a sparse hash table (https://github.com/erikcs/grf/commits/NewHash/) can be faster. The idea would be to find an optimal threshold for deciding when to use the STL hash table and when to use a sparse hash table, then automate that choice in a new container.
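A rough sketch of such an automated choice, assuming the container receives an expected-size hint at construction. The flat-vector path is only a stand-in for the sparse hash table linked above, and the cutoff value is hypothetical:

```cpp
// Illustrative sketch only: a weight container that picks its backing store
// from an expected-size hint.
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class WeightContainer {
 public:
  explicit WeightContainer(std::size_t expected_entries)
      : use_sparse_(expected_entries < kSparseCutoff) {}

  void add_weight(std::size_t sample, double weight) {
    if (use_sparse_) {
      // Small expected size: a linear scan over a flat vector avoids hashing
      // overhead entirely (stand-in for a real sparse hash table).
      for (auto& entry : sparse_) {
        if (entry.first == sample) {
          entry.second += weight;
          return;
        }
      }
      sparse_.emplace_back(sample, weight);
    } else {
      // Large expected size: use the standard library hash table.
      dense_[sample] += weight;
    }
  }

 private:
  static constexpr std::size_t kSparseCutoff = 1024;  // hypothetical threshold
  bool use_sparse_;
  std::vector<std::pair<std::size_t, double>> sparse_;
  std::unordered_map<std::size_t, double> dense_;
};
```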
edit: Actually, the latest standard library unordered_map appears to have improved, and the DefaultPredictionStrategy is not that slow. A high-hanging fruit for speeding up survival forests on large data with many failure times is an algorithmic improvement to computing the logrank statistics in SurvivalSplittingRule: make it O(log num.failures.node) instead of O(num.failures.node) using segment trees.
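For reference, a minimal iterative segment tree gives point updates and range sums over failure-time indices in O(log n), which is the kind of structure this refers to. The sketch below only illustrates the data structure; how the at-risk and failure counts in SurvivalSplittingRule would be maintained in it is the open part:

```cpp
// Generic segment tree sketch: point update and range sum in O(log n).
// Not code from SurvivalSplittingRule.
#include <cstddef>
#include <vector>

class SegmentTree {
 public:
  // n = number of distinct failure times in the node.
  explicit SegmentTree(std::size_t n) : n_(n), tree_(2 * n, 0.0) {}

  // Add delta to the count at failure-time index i: O(log n).
  void add(std::size_t i, double delta) {
    for (i += n_; i > 0; i /= 2) {
      tree_[i] += delta;
    }
  }

  // Sum of counts over failure-time indices [left, right): O(log n).
  double sum(std::size_t left, std::size_t right) const {
    double total = 0.0;
    for (left += n_, right += n_; left < right; left /= 2, right /= 2) {
      if (left & 1) total += tree_[left++];
      if (right & 1) total += tree_[--right];
    }
    return total;
  }

 private:
  std::size_t n_;
  std::vector<double> tree_;
};
```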