Fix stack overflow when depth is too deep #780
It seems that some platforms use unsigned long instead of unsigned long long for size_t. I need to fix the printf issue.
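For readers hitting the same warning, one common way to print a size_t portably is the `%zu` conversion, which matches size_t regardless of whether the platform defines it as unsigned long or unsigned long long. This is only a minimal sketch, not necessarily the fix applied in this PR; `format_size` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t without assuming its underlying type.
// "%zu" is the standard conversion for size_t, so the same format string is
// correct whether size_t is unsigned long or unsigned long long.
std::string format_size(std::size_t n) {
    char buf[32];  // large enough for any 64-bit value plus the terminator
    std::snprintf(buf, sizeof buf, "%zu", n);
    return std::string(buf);
}
```

If `%zu` is not available on a given toolchain, casting the value to unsigned long long and printing it with `%llu` is a frequently used alternative.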
Thanks a lot, this would be great to have fixed. As far as I can see, you applied the switch from stack to heap in the same way for both the ALLVARS and the normal version. In principle I thought we might only do it where required (i.e., when all=yes), but if it does not perform any worse, as you indicated, we might as well keep it simple and safe and always use the heap. Does your performance comparison apply to both (all=yes/no)? I will also try to test it myself later.
I've only tried all=yes so far. I'll test all=no.
all=no largeboards=yes: basically on par with each other in this case.
all=no largeboards=no: basically on par with each other in this case.
My results show that heap beats stack by 10% to 15% in NPS with all=yes largeboards=yes.
Thanks, I got similar results: performance is comparable with all=no and better with all=yes, so this looks good. I am just not sure about the 1024 ply limit, since it likely does not have any advantage. If a position gets solved quickly and the search actually reaches max depth, then a higher nominal depth won't provide any new information. And for other positions you will likely never reach depth 246. So I would suggest reverting it to the original value.
1024 is set for enthusiasts who use a very high-end CPU (e.g. Threadripper 7995WX) to search much deeper when solving a single position. Since using the heap solves the depth problem, and setting max depth to 1024 doesn't seem to harm performance or stability in common cases, I think 1024 can meet the needs of enthusiasts without compromising the experience of ordinary users.
One example: someone on the Fairy-Stockfish Discord mentioned a position that is a mate in 500. Setting MAX_PLY to 1024 and removing the 50-move rule would enable FSF to solve this position. If nothing is compromised in common cases, adding support for rare cases would make it better.
Setting a very high max ply does have a downside in the fairly common case of a short forced mate. There the engine quickly solves the position at low ply and then rushes through all remaining iterations up to max ply without needing to do much more search. Since the new iterations do not really require new search (the TT already has everything), the engine won't consume much CPU, but it will produce a lot more (fairly useless) output that needs to be communicated and parsed, which can, for example, put significant load on a GUI. Since this is the much more common scenario, recompiling with a higher max ply shouldn't be a big deal in the rare case that someone is willing to put enormous effort into solving a long win.
I've done some tests with different GUIs. The results show that with C++-based GUIs like Cute Chess and ChessBase and JavaScript-based GUIs like LiGround and Fairyground, the time difference between 246 and 1024 is less than 100 milliseconds on average. With Python-based GUIs like PyChess GUI and FairyFishGUI, however, the difference can be up to 3 seconds, probably because of Python's lower performance.
#583 .
After this change, I added a macro definition that switches whether the move list is placed on the stack or on the heap. Now, given the following commands:
It will work correctly to depth 245.
The build is successful without warnings on my PC, running MSYS2 MinGW64.
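As a rough illustration of the compile-time switch described above, the sketch below selects between an in-object array (stack) and a heap allocation for a per-ply move list. All names here (`MOVELIST_ON_STACK`, `MOVELIST_ON_HEAP`, `MAX_MOVES`, `Move`, `MoveList`, `recurse`) and the capacity value are assumptions made for the example and are not the identifiers or sizes used in this PR.

```cpp
#include <cstddef>
#include <memory>

// All names and sizes below are hypothetical, chosen only for illustration.
constexpr std::size_t MAX_MOVES = 8192;  // assumed capacity for large boards
struct Move { int data; };

// Default to heap storage unless the build explicitly asks for the stack.
#ifndef MOVELIST_ON_STACK
#define MOVELIST_ON_HEAP 1
#endif

class MoveList {
public:
    MoveList()
#ifdef MOVELIST_ON_HEAP
        : storage(new Move[MAX_MOVES]), last(storage.get())
#else
        : last(storage)
#endif
    {}

    // No bounds check: the caller must not exceed MAX_MOVES.
    void push(Move m) { *last++ = m; }

    const Move* begin() const {
#ifdef MOVELIST_ON_HEAP
        return storage.get();
#else
        return storage;
#endif
    }
    std::size_t size() const { return static_cast<std::size_t>(last - begin()); }

private:
#ifdef MOVELIST_ON_HEAP
    std::unique_ptr<Move[]> storage;  // frame holds only two pointers
#else
    Move storage[MAX_MOVES];          // frame holds the whole array
#endif
    Move* last;
};

// One MoveList per ply, mimicking a recursive search. With stack storage
// each frame grows by sizeof(Move) * MAX_MOVES, so deep recursion can
// exhaust the stack; with heap storage the frames stay small.
std::size_t recurse(int depth) {
    MoveList moves;
    moves.push(Move{depth});
    return depth == 0 ? moves.size() : moves.size() + recurse(depth - 1);
}
```

With the heap variant, a call such as `recurse(245)` keeps each stack frame at a few dozen bytes, which is the property that lets deep searches complete without overflowing the stack.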
I've also run some bench tests of the two methods on the same computer in the same environment. Interestingly, the heap method is slightly faster than the stack method. The test command is:
Result: