Analyze blunders #5
From @Tilps on May 8, 2018 9:51 https://lichess.org/e07JvP6g - I started analyzing this - the position after the blunder has a 0.11% policy for a move which is checkmate. It takes 20k visits to get its first look, and then it obviously gets every visit. I haven't tested how that varies with noise applied.
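To illustrate why a prior as low as 0.11% can keep a move unvisited for thousands of playouts, here is a minimal sketch of AlphaZero-style PUCT selection in a toy two-move position. The constant `c_puct`, the fixed Q of the favourite, and the value assumed for the unvisited move are all assumptions chosen for illustration; they are not lc0's actual settings, and the real search also updates Q as it goes.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT: value term Q plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def visits_until_first_look(low_prior, q_favourite=0.2, q_unvisited=0.0,
                            c_puct=1.5, max_visits=1_000_000):
    """Count playouts given to the favourite before the low-prior move is first chosen.

    Toy model (assumption): two moves only, the favourite holds the rest of the
    prior mass and keeps a constant Q; the other move has never been visited.
    """
    high_prior = 1.0 - low_prior
    favourite_visits = 0
    for parent_visits in range(1, max_visits + 1):
        fav = puct_score(q_favourite, high_prior, parent_visits, favourite_visits, c_puct)
        low = puct_score(q_unvisited, low_prior, parent_visits, 0, c_puct)
        if low > fav:
            return favourite_visits
        favourite_visits += 1
    return None  # never selected within max_visits


if __name__ == "__main__":
    print(visits_until_first_look(0.0011))  # prior of 0.11%
    print(visits_until_first_look(0.05))    # prior of 5%, for comparison
```

In this toy model the exploration bonus of the unvisited move grows only with the square root of the parent's visit count, so a tiny prior needs a very large number of parent visits before it can outweigh even a modest Q advantage of the favourite.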
From @Ghotrix on May 8, 2018 10:17 isn't |
Here's an easy one-ply discovered attack tactic missed by Leela after 2.5K nodes. Position: https://lichess.org/mbWjiT93#105 Twitch recording of the thinking time/engine output: https://clips.twitch.tv/GenerousSmellyEggnogPunchTrees And as the lichess analysis says, this was "merely" the cherry on top of the multiple-mistakes cake. How to swing 15 points' eval in just 3 moves! Further analysis requested, please. How many playouts until Leela even once searches the tactic? Edit: Tilps' position is also a discovery bug. I think Leela's policy assumes that the rook can just capture the queen, which is of course prevented by the pin = discovered attack.
From @hsntgm on May 8, 2018 11:46 @mooskagh thanks for the diagram. If I am wrong, please correct me. Leela's brain gets its power from memorized games and positional samples she collected in self-play, and we call that visits. I see that her visits come from the weights instead of alpha-beta pruning. If there is a tactical opportunity in the position but Leela visits another move more, she chooses that move. In basic positions, tactics occur suddenly in the game, and the hardest part is teaching her this. Or could you add a simple tactical search algorithm, triggered on every move and working independently from visits for a while? After she finds a tactical move with the tactical search (looking for sudden jumps to +1, +2, etc.) and enters this move's tree, she can collect this sample into her brain too. This way she would learn to play tactically in a short time and tune herself automatically.
From @chara1ampos on May 9, 2018 5:49 I am stating the obvious, but I think that brute-force engines like Stockfish and Houdini have the advantage that their evaluation is cheap, and they can search very deep, thus having great tactics. Leela's evaluation is very expensive, and thus she cannot search deep enough to avoid blunders. I sense that if one could speed up her evaluation, so she could search deeper, her blunders would be greatly reduced. On an Nvidia Titan V, where Leela cudnn can evaluate 8000 nodes per second, she did not seem to blunder, and even won several games against Stockfish, Komodo and Houdini. I recall that AlphaZero evaluated around 100000 nodes per second on the DeepMind supercomputer, which greatly improves its tactics. This raises the question: what nps did AlphaZero use during its training process?
From @mooskagh on May 9, 2018 9:3 I've added a form for submitting problematic positions to the original message.
From @Ishinoshita on May 9, 2018 9:55 @chara1ampos : The DM paper says "During training, each MCTS used 800 simulations.", which is a bit ambiguous and may be read either as new playouts added to the tree or as visits for the selected node. Thus nps is irrelevant (except for the total training time). 800 'simulations' is in any case far below the tens of thousands of simulations you mention for match games. So, yes, AZC training may have included blunders as well, at least in early stages (like where we stand now).
From @Why-Sensei on May 9, 2018 10:17
From @hsntgm on May 9, 2018 14:44 @chara1ampos why doesn't anybody ask this question: maybe AlphaZero is just an auto-tuned Stockfish derivative with a neural network. A traditional chess engine's Elo depends on tuning the parameters in its code. Maybe they just do that in a neural network. Stockfish 1.01 was at Elo 2754 in 2008. Look at Stockfish's development history: it gained only 700 Elo in ten years, with a massive amount of CPU time and genius C programmers who tuned parameters step by step. Now we wait for Leela to gain 500 Elo with self-play. Who knows, maybe the road map is totally wrong. Why do I think that? Because someone says Leela draws with Stockfish, OK, very good news, but how can you explain these blunders and this tactical weakness in a 3000 Elo program? Leela's skeleton formed after 10 million games, there is no return, and this is a big paradox for the project.
From @Ishinoshita on May 9, 2018 16:53 "maybe Alpha zero just a auto tuned stockfish derivative with neural network"
From @Why-Sensei on May 10, 2018 10:32
From @mooskagh on May 11, 2018 8:18 Thanks for submitting the bug reports, they were very useful. All the blunders so far can be explained by #576. So for a few days (until ~300000-500000 games are generated by the v0.10 client and a network is trained on them), don't submit any other positions, as they are likely caused by the same bug. After that, new blunder reports are very welcome!
From @mooskagh on May 13, 2018 16:14 For now it would be most interesting to see examples of blunders that appeared recently.
From @TCECfan on May 14, 2018 17:29 ID: ID288CCLSGame65
From @TCECfan on May 14, 2018 17:48 ID: ID288CCLSGame53
From @mooskagh on May 14, 2018 18:9 Thanks for posting, we are looking into those positions.
From @TCECfan on May 14, 2018 18:10 ID: ID288CCLSGame72
From @TCECfan on May 14, 2018 18:13 There are lots more examples but I will stop here then :)
From @TCECfan on May 14, 2018 18:52 I couldn't resist one more...
From @apleasantillusion on May 14, 2018 22:11 Interestingly, I can reproduce both the Rc7?? Kxc7 and Ra6 Kxa6?? blunders above with ID288 on CPU when game history is supplied. With just the FEN, while it doesn't play both blunders, the killing responses to both blunders are given very low probability by the policy, so it's just dumb luck that the engine doesn't play the blunder. The really interesting part is that with the FEN modified so the 50-move rule halfmove counter is set to 0, it immediately sees both killing moves with very high policy outputs. This is also true of this recent match game: http://lczero.org/match_game/268131 With game history or FEN, 292 plays 132. Rc7??, giving the obvious capture response very, very low policy output. With the FEN altered so the 50-move rule halfmove counter is set to 0, it immediately sees the capture with 99% probability from policy. Maybe these examples are just lucky, but it seems high values for the 50-move rule halfmove counter correlate with very strange blunders.
From @nelsongribeiro on May 14, 2018 23:26 http://lczero.org/match_game/268155 ID 292 blunders again against ID 233 near the 50-move rule coming up... |
From @trophymursky on May 14, 2018 23:29 An interesting bit based on apleasantillusion's comment (though I'm using 292). The FEN for the interesting position is "2r5/R7/8/8/5k2/8/2K5/8 w - - 85 121", where the policy net of ID292 has Rc7 (wrongly) at 99.91%. Specifically, if you set it to 60 halfmoves (instead of 85), the policy for Rc7 is at 0.07%. At 65 halfmoves it's at 0.2%, at 66 it's at 0.71%, at 67 it's at 1.23%, at 68 it's at 6.53% (no longer considered the worst move), and at 69 it's at 89.47%. I have no idea why the inflection point would be anywhere near where it is, but it's definitely interesting and points towards a training bug corrupting the policy net.
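For anyone who wants to repeat this kind of sweep, here is a small sketch that rewrites only the halfmove clock (the fifth FEN field) of the position quoted above. The helper name `with_halfmove_clock` is made up for this example; each resulting FEN would still have to be fed to the engine separately (for instance through the UCI `position fen ...` command) to read off the policy.

```python
def with_halfmove_clock(fen, halfmove):
    """Return a copy of the FEN with its halfmove clock (fifth field) replaced."""
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError("expected a six-field FEN")
    fields[4] = str(halfmove)
    return " ".join(fields)


# Position from the comment above; only the 50-move counter is varied.
base = "2r5/R7/8/8/5k2/8/2K5/8 w - - 85 121"

for clock in (0, 60, 65, 66, 67, 68, 69, 85):
    print(with_halfmove_clock(base, clock))
```

Keeping every other field fixed makes sure that any change in the policy output comes from the 50-move counter alone.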
From @so-much-meta on May 15, 2018 6:40 FYI... Regarding the a7c7 rook blunder above, I think this might be explained (partially) by glinscott/leela-chess#607 Network 288.. With history: Without history: |
From @so-much-meta on May 15, 2018 7:35 As to the a7c7 blunder above, I think the history is only part of the problem... The other part of the issue is that the All Ones plane (last input plane) bug really messed up policies. Good input data was being trained on a bad policy. Consider the effect of the negative log loss/cross entropy in these examples (a non-buggy network with low outputs getting trained on a buggy high output). Here's output from network ID 280. Notice that the a7c7 move only has high probability when the all ones input plane was buggy. Essentially, I think it was bad data like this that kept messing things up. [Outputs not preserved: History + AllOnesBug, History + NoBug, NoHistory + AllOnesBug, NoHistory + NoBug] Now look how all of that changed by network 286, below: now the input with missing history is starting to show the bad policy. [Outputs not preserved: History + AllOnesBug, History + NoBug, NoHistory + AllOnesBug, NoHistory + NoBug] By the time it got to network 288, the policy was really bad in this particular spot. [Outputs not preserved: History + NoBug, NoHistory + AllOnesBug, NoHistory + NoBug] Now, at network 294, this is the current situation (ignoring the buggy input plane, as it's no longer relevant): [Output not preserved: NoHistory + NoBug]
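The cross-entropy effect described above can be sketched with a few numbers. The probabilities below are hypothetical, picked only to show the size of the effect; they are not the lost network outputs, and the two-entry "distribution" (the blunder move versus everything else) is a simplification.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) between two distributions over moves."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical two-way split: [blunder move, all other moves combined].
buggy_target = [0.99, 0.01]  # training target produced while the input plane was buggy
clean_target = [0.01, 0.99]  # target a bug-free search would have produced
network_out = [0.01, 0.99]   # policy of a healthy network on the clean input

loss_vs_buggy = cross_entropy(buggy_target, network_out)
loss_vs_clean = cross_entropy(clean_target, network_out)

print(f"loss against buggy target: {loss_vs_buggy:.3f}")
print(f"loss against clean target: {loss_vs_clean:.3f}")
```

A healthy low output for the blunder move is penalized heavily when the target says the move is nearly certain, so each such sample pushes the policy strongly toward the bad move.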
From @gyathaar on May 15, 2018 12:26 Does it still blunder in those positions if you use --fpu_reduction=0.01 (instead of default 0.1) ? |
From @apleasantillusion on May 15, 2018 14:54 In the game nelsongribeiro posted, the same pattern holds true (tested with 292). With history, she plays 124.Ke7 with a very high probability from policy (84.89%), and the response Qxd5, just taking the hanging queen, is given only 2.93% from policy. Without history at the root, just FEN, she again plays Ke7 with high probability from policy (95.83%), and the Qxd5 response taking the hanging queen is given only 2.33% from policy. With the FEN modified in only one way, setting the 50-move rule counter to 0, Ke7's policy drops to 37.34%, and Qxd5 after Ke7 jumps to 95.07%. Now, from a purely objective standpoint in this particular position, none of this matters so much, since the position is losing to begin with, although forcing black to find the winning idea in the king and pawn ending is a much stronger way of playing than just hanging the queen. Also, independently of that, the fact that taking a hanging queen is only ~2% from policy when the 50-move rule counter is high is a bit disturbing and is in line with the other examples I cited above. In general, the variation in probability for Qxd5 based on the 50-move rule counter is quite odd. In that exact position with black to move (6q1/4K2k/6p1/3Q1p1p/7P/6P1/8/8 b - - 0 0), here are probabilities for Qxd5 with different values of the 50-move rule counter: 0: 68.26%
From @nelsongribeiro on May 15, 2018 15:46 The really bad move is the move made just before that position: It's a draw at this point. On move 123, ID 292 played 123. Qd5 instead of 123. Qe6. The last time a pawn was moved was 75...f5, which makes this the 48th move after that. EDIT: best move was wrong before.
From @apleasantillusion on May 15, 2018 20:49 Just to add to this, the Rxf3+ that's being tracked in the sheet at https://docs.google.com/spreadsheets/d/1884-iHTzR73AgFm19YYymg2yKnwhtHdaGyHLVnffLag/edit#gid=0 shows the same behavior. With net 297, probability with various 50-move rule counter values: 0: 0.07% That's some heavy variation just from changing the 50-move rule counter. Also, the pattern is different with this one. In all the others, probabilities were worst at the very high counts, a bit better at very low counts, and best at counts around 30. Here that last trend holds, but the other is more muddled.
From @ASilver on May 17, 2018 5:7 I don't know if it is a consequence of the bug, or the new PUCT values which inhibit tactics, but the latest versions (I am watching 303 right now) have some appallingly weak ideas of king safety and closed positions. I am playing a match against id223 at 1m+1s and it is more than a tactical weakness issue, it is one of completely wrong evaluations, which 223 did not have, that is leading it to happily let its king be attacked until it is too late to save itself. I also saw more than one case where it thought a dead drawn blocked game, with no entry or pieces, was +2, while 223 thought it about equal. The result was that 303 preferred to sacrifice a pawn or two to not allow a draw, and then lost quickly thanks to the material it gave away. id303 is white, and id223 is black. Both are playing with v10 (different folders) with default settings. |
From @TCECfan on May 17, 2018 6:29 @ASilver I have watched many games from many of the CCLS gauntlets, and my overall view (from the perspective of spectator) is that her style changed markedly following the release of v0.8. In particular, she started showing: |
From @so-much-meta on May 17, 2018 7:38 Regarding the rook blunder above: I did some more analysis. I think some here might find it useful. For one, I didn't look closely at rule50 when I looked at this before. Also, I found evidence that filling in history planes with fake data (just copy the current position, or the oldest one if some history is available) is probably best in case no history is available. Check out the attached PDFs with graphs. What I did was iterate the halfmove clock (rule50) from 0 through 91, create the FEN position at each halfmove clock, play each of the 8 moves, and determine the network value and policy under the following conditions: I did this with both net 288 and net 303. So that's 12 graphs (halfmove clock is the horizontal axis). I also added the move a7c7 (the blunder move, where Leela does not want to take the free rook, so blunders again in the PV), and did the exact same process. Some findings:
rook_blunder_Id288_Id303.pdf |
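The padding idea mentioned above (copy the current position, or the oldest available one, into the missing history slots) could be sketched as follows. The function name `pad_history`, the newest-first ordering, and the default of 8 history positions are assumptions made for this illustration; this is not taken from the lc0 code.

```python
def pad_history(positions, length=8):
    """Pad a newest-first list of positions to `length` by repeating the oldest one.

    `positions[0]` is the current position; later entries are progressively older.
    With only the current position available, every slot becomes a copy of it.
    """
    if not positions:
        raise ValueError("need at least the current position")
    padded = list(positions[:length])
    while len(padded) < length:
        padded.append(padded[-1])  # repeat the oldest known position
    return padded


# Only a bare FEN is known: all history slots are copies of the current position.
print(pad_history(["current"], length=4))
# Two real positions known: the older one fills the remaining slots.
print(pad_history(["current", "previous"], length=4))
```

The point of padding this way is that the network never sees empty history planes, which it rarely encountered during training outside of the first moves of a game.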
From @mooskagh on May 17, 2018 8:20 The value graphs for network 303 are a bit misleading. Would be more demonstrative if all graphs were with 0.0..1.0 scale. Also, what is the correct move there? (there were several rook blunders in this thread, so it's not immediately clear which one you refer to). |
From @peterjacobi on May 17, 2018 9:1 Good examples for the longstanding problem with discovered attacks. Furunkel just posted in #game-analysis, but should be preserved here:
Also attached: leela300-discoveries.txt |
From @rwbc on May 17, 2018 9:2 Interesting blind spot (with mate) again in current ID 304 (from matchplay): |
From @hsntgm on May 17, 2018 12:14 The latest IDs have higher Elo, so yesterday I tested ID 303 in a 60-minute game against Stockfish and watched the whole game. I was hoping for a good result, but it seems she still suffers from this unknown bug deep down. It seems it will be very difficult to get rid of the effects of the past bug. She has a broken heart. If you believe you found the bug in the algorithm and corrected it, maybe you just have to start the whole training process again. If the problem was the training data and you corrected it, I think you have to restart as well, because otherwise she will need twice as much training as necessary to get rid of this effect. At least simulate it from the beginning with all the training data (starting from the best clean network we know, ID253) and check the situation. Something is still going wrong, so please listen to people.
OK, there is still a bug in Leela, and I think it is clearly linked to the 50-move rule somehow. I was checking the PGN of my CLOP run using NN357 and a TC of 48s+0.1s. I found a loss in 180 moves, which is strange since I set it for adjudication by tablebase (5-piece). I then looked at it and saw this. First, move 133, the final pawn capture, 133. Kxf3:
I see no explanation other than the 50-move rule being a factor here. I am attaching the PGN. |
From @mooskagh on May 8, 2018 7:23
Important!
When reporting positions to analyze, please use the following form. It makes it easier to see what's problematic with the position:
lc0/lczero version, operating system, and non-default parameters (number of threads, batch size, fpu reduction, etc).
(old text below)
There are many reports on forums asking about blunders, and the answers so far have been something along the lines of "it's fine, it will learn eventually, we don't know exactly why it happens".
I think at this point it makes sense to actually look into them to confirm that there are no blind spots in training. For that we need to:
--temperature=1.0 --noise)" to see what the training data would look like for this position.
Eventually all of this would be nice to have as a single command, but we can start manually.
For lc0, that can be done this way: --verbose-move-stats -t 1 --minibatch-size=1 --no-smart-pruning (unless you want to debug specifically with other settings).
Then run the UCI interface and do the command:
(PGN moves can be converted to UCI notation using pgn-extract -Wuci)
Then do:
see the results, add some more nodes by running:
And look at how the counters change.
Counters:
Help wanted:
Copied from original issue: glinscott/leela-chess#558