Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stack overflow when depth is too deep #780

Merged
merged 6 commits into from
Apr 29, 2024

Conversation

yjf2002ghty
Copy link
Contributor

#583 .

After this change, added a macro definition that can switch the preference whether to put the move list in stack or in heap. Now given the following commands:

uci
setoption name UCI_Variant value gustav3
position fen arnbqkbnra/*ppp1pppp*/*3p4*/*8*/*8*/*N1P5*/*PP1PPPPP*/AR1BQKBNRA b KQkq - 0 2 moves f7f5 f2f3 d8e7 e2e4 i7i5 i2i4 j8g5 c2c3 c8d6 a1c2 h8g6 c2d2 f5f4 h1f2 e6e5 j1g4 b7b6 g4g5 e7g5 f2g4 a8b7 d2f2 d6f7 d3d4 d7d6 b3d2 b7d8 d1b3 c7c5 e1e2 g5h4 g2g3 h4e7 f2h3 f7g5 h3h5 g8b3 d2b3 d8f7 b3d2 b6b5 b1d1 g6h8 h5f7 h8f7 f1g2 c5d4 c3d4 b8c8 g1f2 f7h6 d4e5 h6g4 f3g4 d6e5 d2f3 f8h8 f3g5 e7g5 g3f4 g5f4 d1d6 h7h6 d6d5 c8c4 b2b3 c4c3 d5d3 c3d3 e2d3 e8c8 g2h3 g8d8 d3b5 c8c2 f2g3 d8d3 i1g1 d3b3 b5f1 c2e4 f1d1 b3d3 d1b1 h8i7 g1d1 f4d2 b1b2 e4e2 b2b7 i7h7 d1g1 e5e4 g1g2 e2f3 b7b5 d2g5 b5f5 f3f5 g4f5 e4e3 g2b2 d3d4 b2b6 d4e4 b6e6 e4e6 f5e6 h7g6 h3g4 g6f6 g4f3 f6e6 g3c7 g7g6 h2h3 h6h5 c7g3 h5i4 h3i4 g5h6 f3e4 e6f6 g3e5 f6f7 e5g3 f7f6 g3e5 f6f7 e5g3 f7e6 g3e1 e6d6 e1b4 d6e6 b4e1 e6f6 e1g3 e3e2 e4f3 f6g5 g3f2 h6g7 f2g3 g7h6 g3f2 g5f5 f2g3 h6f8 g3f2 f8g7 f2e1 g6g5 e1g3 g7f8 g3e1 g5g4 f3e2 f5f4 e2f2 f8e7 e1d2 f4f5 d2h6 e7h4 f2g2 f5g6 h6e3 g4g3 e3d2 g6f5 g2f3 g3g2 f3g2 f5g4 d2h6 h4e1 h6i5 e1i5 g2f1 i5h4 i4i5 h4i5 f1g2 i5e1 g2f1 e1d2 f1g2 d2c1 g2f1 c1b2 f1e1 b2a1 e1d1 a1b2 d1e2 b2a1 e2d1 a1b2 d1e2 g4g5 e2f1 b2a1 f1e1 a1b2 e1e2 b2a1 e2d1 a1b2 d1e2 b2a1 e2d1 g5h5 d1c1 a1b2 c1b1 b2a1 b1a1 h5g4 a1b1 g4f3 b1a1 f3e2 a1b1 e2d1 b1a1
go

It will work correctly to depth 245.

The build is successful without warnings on my PC, running MSYS2 MinGW64.

I've also did some bench test on the 2 methods on the same computer with the same environment. Interestingly, the heap method is slightly faster than the stack method. The test command is:

uci
bench
bench
bench
bench
bench

Result:
image

@yjf2002ghty
Copy link
Contributor Author

It seems that some platforms uses unsigned long instead of unsigned long long for size_t. I need to fix the printf issue.

@yjf2002ghty
Copy link
Contributor Author

yjf2002ghty commented Apr 23, 2024

My results does not ensure the performance advantage on all platforms

By the way, I've tested the position when MAX_PLY = 999;, and the result is good:
image

@ianfab
Copy link
Member

ianfab commented Apr 24, 2024

Thanks a lot, this would be great to have fixed. As far as I can see you applied the switch from stack to heap for both ALLVARS and normal version in the same way. In principle I thought we might only do it where required (i.e., when all=yes), although if it does not perform any worse as you indicated we might as well keep it simple and safe and always use heap. Does your performance comparison apply for both (all=yes/no)? I will later also try to test myself.

@yjf2002ghty
Copy link
Contributor Author

I've only tried on all=yes currently. I'll test on all=no.

@yjf2002ghty
Copy link
Contributor Author

yjf2002ghty commented Apr 24, 2024

all=no  largeboard=yes
Using default bench for 47 positions with all benches searching 6257668 nodes:
HEAP:
Time: 11919 + 11835 + 12082 + 12235 + 11858   Average: 11985.8
STACK:
Time: 12072 + 11930 + 11930 + 11960 + 12124   Average: 12003.2

Basically on par with each other in this case.

@yjf2002ghty
Copy link
Contributor Author

all=no  largeboard=no
Using default bench for 47 positions with all benches searching 6257668 nodes:
HEAP:
Time: 8863 + 8682 + 8630 + 8653 + 8687   Average: 8703
STACK:
Time: 8710 + 8805 + 8784 + 9187 + 8744   Average: 8846

Basically on par with each other in this case.

@yjf2002ghty
Copy link
Contributor Author

yjf2002ghty commented Apr 24, 2024

My results show that heap beats stack by 10%~15% in NPS when all=yes largeboards=yes.
Other versions they basically have no difference.

@ianfab
Copy link
Member

ianfab commented Apr 28, 2024

Thanks, got similar results that performance is comparable with all=no and better with all=yes, so looks good.

I am just not sure about the 1024 ply limit, since it likely does not have any advantage. If a position gets solved quickly where it actually reached max depth, then higher nominal depth won't provide any new information. And for other positions you likely won't ever reach depth 246. So I would suggest to revert that back to the original value.

@yjf2002ghty
Copy link
Contributor Author

1024 is just set for some enthusiasts who use a very high end CPU (e.g. Threadripper 7995WX) to go much deeper in solving one position. Since using heap solves the depth problem, and it doesn't seem to do any bad to performance and stability when setting max depth to 1024 in common cases, I think 1024 can meet the needs of enthusiasts without compromising the experience of common users.

@yjf2002ghty
Copy link
Contributor Author

One example is someone said in fairy-stockfish discord that a position is a mate-in-500. Setting MAX_PLY to 1024 while removing the 50 moves rule can enable FSF to solve this position. If nothing in common cases is compromised, adding support to rare cases would make it better.

@ianfab
Copy link
Member

ianfab commented Apr 29, 2024

Setting a very high max ply does have a downside in the fairly common case of a short forced mate. There the engine will quickly solve the position at low ply and then rush through all iterations until max ply without needing to do much more search. Since the new iterations do not really require new search (as TT has everything already), it won't consume that much CPU by the engine, but it will produce a lot more (fairly useless) output that needs to be communicated and parsed, which can e.g. put significant load on a GUI. Since this is the much more common scenario, in the rare case that someone is willing to put enourmous efforts into solving a long win, recompiling with a higher max ply shouldn't be a big deal.

@yjf2002ghty
Copy link
Contributor Author

yjf2002ghty commented Apr 29, 2024

I've done some tests on different GUIs, and the result shows that on C++ based GUIs like Cute Chess and Chessbase and JavaScript based GUIs like LiGround and Fairyground, the difference on time between 246 and 1024 is less than 100 milliseconds on average. However on Python based GUIs like PyChess GUI and FairyFishGUI, the difference can be at most 3 seconds, probably due to the low performance issue of Python.
So it does compromise the experience of Python based GUIs , and limiting depth is necessary.

@ianfab ianfab merged commit 50adcff into fairy-stockfish:master Apr 29, 2024
16 checks passed
@yjf2002ghty yjf2002ghty deleted the patch-1 branch April 29, 2024 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants