Consider an OptimizedPredictionStrategy for survival forest #1419

Open
erikcs opened this issue Jun 10, 2024 · 0 comments
Labels
performance Issue relates to the speed, memory usage, or scaling aspects of the package. requires research An issue that needs additional thought and experimentation before it can be implemented.

Comments

@erikcs
Member

erikcs commented Jun 10, 2024

See #1350 for a quick overview. The idea would be to use OptimizedPredictionStrategy if num.failures is below some threshold, say 150.
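A minimal sketch of what such a threshold check could look like. The names `PredictionPath`, `kMaxFailuresForOptimized`, and `choose_prediction_path` are hypothetical and not part of grf; the value 150 is only the placeholder suggested above and would need benchmarking.

```cpp
#include <cstddef>

// Hypothetical tag for which prediction code path to take (not a grf type).
enum class PredictionPath { Optimized, Default };

// Placeholder threshold taken from the suggestion above; an assumption, not a tuned value.
constexpr std::size_t kMaxFailuresForOptimized = 150;

// Pick the optimized path only when the number of distinct failure times is
// strictly below the threshold, otherwise fall back to the default path.
inline PredictionPath choose_prediction_path(
    std::size_t num_failures,
    std::size_t threshold = kMaxFailuresForOptimized) {
  return num_failures < threshold ? PredictionPath::Optimized
                                  : PredictionPath::Default;
}
```

Keeping the threshold as a parameter would make it easy to sweep over candidate values when benchmarking.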

Another potential alternative could be to modify DefaultPredictionStrategy's weight container. As pointed out in #483, std::unordered_map has some drawbacks. For predictions these drawbacks are data dependent: for some weight functions, a sparse hash table (https://github.com/erikcs/grf/commits/NewHash/) can be faster. The idea would be to find a threshold for deciding when to use the STL hash table and when to use a sparse hash table, then automate that choice in a new container.

edit: Actually, the latest standard library unordered_map appears to have improved, and DefaultPredictionStrategy is not that slow. A high-hanging fruit for speeding up survival forests on large data with many failure times is an algorithmic improvement to the logrank statistic computation in SurvivalSplittingRule: make it O(log num.failures.node) instead of O(num.failures.node) using segment trees.
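To illustrate the kind of primitive this would rely on, here is a sketch of a Fenwick (binary indexed) tree, a simpler relative of a segment tree, that keeps per-failure-time counts and answers "how many samples are at risk at time index t" in O(log m) after each O(log m) update. This is not the logrank computation itself and not grf code; the class name `FailureCounter` and its methods are hypothetical, and it assumes failure times have already been mapped to indices 0..m-1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fenwick tree over failure-time indices 0..m-1 (hypothetical helper, not grf API).
class FailureCounter {
public:
  explicit FailureCounter(std::size_t num_failures)
      : tree_(num_failures + 1, 0) {
    assert(num_failures > 0);
  }

  // Add `delta` samples at failure-time index `t`, e.g. +1 / -1 when a sample
  // moves into or out of a candidate child node. O(log m).
  void add(std::size_t t, long delta) {
    assert(t + 1 < tree_.size());
    for (std::size_t i = t + 1; i < tree_.size(); i += lowbit(i)) {
      tree_[i] += delta;
    }
  }

  // Number of samples with failure-time index in [0, t]. O(log m).
  long prefix_sum(std::size_t t) const {
    assert(t + 1 < tree_.size());
    long sum = 0;
    for (std::size_t i = t + 1; i > 0; i -= lowbit(i)) {
      sum += tree_[i];
    }
    return sum;
  }

  // Number of samples with failure-time index >= t ("at risk" at t). O(log m).
  long at_risk(std::size_t t) const {
    long total = prefix_sum(tree_.size() - 2);
    return t == 0 ? total : total - prefix_sum(t - 1);
  }

private:
  // Lowest set bit of i (unsigned two's-complement trick).
  static std::size_t lowbit(std::size_t i) { return i & (~i + 1); }

  std::vector<long> tree_;  // 1-based internal array
};
```

For example, after adding one sample at index 0, two at index 2, and one at index 4 (with m = 5), `at_risk(1)` returns 3. Whether the full logrank numerator and variance can be maintained with updates of this form is the part that would need the research mentioned in the issue labels.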

@erikcs erikcs added performance Issue relates to the speed, memory usage, or scaling aspects of the package. requires research An issue that needs additional thought and experimentation before it can be implemented. labels Jun 10, 2024