Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Again about the benchmark #5157

Closed
amchess opened this issue Apr 6, 2024 · 28 comments
Closed

Again about the benchmark #5157

amchess opened this issue Apr 6, 2024 · 28 comments

Comments

@amchess
Copy link

amchess commented Apr 6, 2024

Describe the issue

No: imho, the problem isn't the network size.
I think the refactoring with the worker isn't no funcitonal changes.
For example, in position.cpp there are the following lines:

    // Update material hash key and prefetch access to materialTable
    k ^= Zobrist::psq[captured][capsq];
    st->materialKey ^= Zobrist::psq[captured][pieceCount[captured]];

but the materialTable has been deleted and there is no prefetch.

Expected behavior

not this behaviour

Steps to reproduce

./Stockfish16 bench
./Stockfish16.1 bench

Anything else?

No response

Operating system

All

Stockfish version

S

@PGG106
Copy link

PGG106 commented Apr 6, 2024

Yes the comment is outdated but a table not being used anymore and a useless prefetch not being present doesn't make the engine slower, the change was non functional.
The newer versions are slower because the net arch changed, the layers are bigger and the inference has to do more work, hence it's slower (multiplications aren't free), this issue is just a duplicate of the previous one.

@Disservin
Copy link
Member

SF 16:      
 L1:  1536, nn-5af11540bbfe.nnue
 nps: 1823913

SF 16.1:
 L1:  2560, nn-b1a57edbea57.nnue
 nps: 1210743 

(with smaller network)
SF 16.1:
 L1: 1536, nn-5af11540bbfe.nnue
 nps: 1472458

master: 1096393

SF 16, still used a mixed of classical and main net, where the classical part is probably giving the other additional speedup.

SF 16 bench was a mix of positions with NNUE and classical, I think 50/50 split.
SF 16.1 is 100% NNUE in bench.

SF 16.1 uses the main net and the small net, with classical completely removed...

@Disservin Disservin closed this as not planned Won't fix, can't repro, duplicate, stale Apr 6, 2024
@mstembera
Copy link
Contributor

mstembera commented Apr 6, 2024

We should probably simplify materialKey away though.
[Edit] Never mind. Looks like we still use it for syzygy.

@amchess
Copy link
Author

amchess commented Apr 7, 2024

Before removing classical eval:Simplify PvNode reduction Jul 11,2023
6a8767a
Nodes/second : 1502286

With removing classical eval: Jul 11, 2023
https://github.com/official-stockfish/Stockfish/commits/master/?before=0716b845fdef8a20102b07eaec074b8da8162523+315
Nodes/second : 885742

Same net

I don't understand why.

@dav1312
Copy link
Contributor

dav1312 commented Apr 7, 2024

As Disservin already said, it is because of 50% of the bench positions used only the classical evaluation, which was much faster
To compare only the "real" Stockfish, use bench 16 1 13 default depth NNUE for commits before the removal of classical

@amchess
Copy link
Author

amchess commented Apr 7, 2024

Done
bench 16 1 13 default depth NNUE
before
Nodes/second : 924580
after
Nodes/second : 769656
I'm very sorry, but yet I don't understand.

@peregrineshahin
Copy link
Contributor

I will answer you, though I'm already regretting it (interaction with you is sickening).
Classical Eval role, was mianly about delevering not so good eval in compansation of speed, so even with using NNUE as default you still use Classical Eval in winning positions.

@dav1312
Copy link
Contributor

dav1312 commented Apr 7, 2024

Stockfish/src/evaluate.cpp

Lines 1056 to 1059 in 6a8767a

// We use the much less accurate but faster Classical eval when the NNUE
// option is set to false. Otherwise we use the NNUE eval unless the
// PSQ advantage is decisive. (~4 Elo at STC, 1 Elo at LTC)
bool useClassical = !useNNUE || abs(psq) > 2048;

@Disservin
Copy link
Member

command: ./stockfish bench 16 1 13 default depth NNUE;
commit : 6a8767a0d5d9502e6d4de1bef97468b5d6fab80a (Simplify PvNode reduction)
changes: with classical part removed in evaluate
nps    : 1111266

command: ./stockfish bench 16 1 13 default depth NNUE;
commit : 6a8767a0d5d9502e6d4de1bef97468b5d6fab80a (Simplify PvNode reduction)
nps    : 1180150

command: ./stockfish bench 16 1 13 default depth NNUE;
commit : af110e02ec96cdb46cf84c68252a1da15a902395 (Remove classical evaluation)
nps    : 1112121

@amchess
Copy link
Author

amchess commented Apr 7, 2024

OK: so, for two elo points, do you degrade performance so fearfully?

@Disservin
Copy link
Member

What

@amchess
Copy link
Author

amchess commented Apr 7, 2024

image

@XInTheDark
Copy link
Contributor

how is this relevant to your problem about the benchmark nps?

@PGG106
Copy link

PGG106 commented Apr 7, 2024

hce wasn't removed to gain 2 Elo, it was removed even if removing it lost 2 Elo, because with long enough tc the impact was 0.
A 0 impact on performance is not performance degradation, the fact the engine reports a slower bench speed because hce isn't there to inflate the metric doesn't mean anything.

@PGG106
Copy link

PGG106 commented Apr 7, 2024

In general arguing about hce removal now is useless, it's a done deal, new and better solutions now fill the void the hce removal left, devs are working on sf 17 and stuff that was definitive in sf 16 is definitely not relevant now.

@amchess
Copy link
Author

amchess commented Apr 7, 2024

For me, it is crazy and I don't agree at all, but clearly do as you wish.

@PGG106
Copy link

PGG106 commented Apr 7, 2024

you don't understand anything you talk about, that's all there is to it, as usual.
Ofc switching over to hce inflates the speed, you are avoiding the nnue slowdown by not using the nnue at all, switching back to fully hce eval would make the engine faster too, but so what?
speed is just a metric, you can sacrifice speed to gain elo, it's what the hce to nnue switch did, moreover removing hce isn't that much of a speed drop, bench shouldn't have been mixed in the first place and the metrics it reports are inflated by the positions that are > 2048 cp in value.
Moreover part 2 current sf dev has a solution to speed up the engine in won positions, it's a smaller net that takes the place of what was hce before (without the need to keep all that hce code around).
For once you should try to understand what you are talking about before second guessing tens of devs that all know far more than you.

@PGG106

This comment was marked as off-topic.

@Disservin
Copy link
Member

Disservin commented Apr 7, 2024

For me, it is crazy and I don't agree at all, but clearly do as you wish.

I don't even get your point... on my benchmarks #5157 (comment),
it shows that the actual true bench speed of stockfish was 1180150 and after removing HCE it was 1112121.
That difference is marginal... for some reason people hate it when sf has less nps because it is not using hce anymore, but it is fine for you to have less speed because of a marginally better larger network and some even use this network without any clear evidence that it is stronger even before Stockfish master has it! (looking at you talkchess)

HCE was removed with some loss at STC, the higher marginal higher NPS which Stockfish had before were countervailed with the stronger evaluation.

@amchess
Copy link
Author

amchess commented Apr 7, 2024

I develop a fork of stockfish with the goal of making it understandable/useful to the OTB player, not necessarily, therefore, stronger, and I wondered why this frightening degradation of performance. That's all.

@PGG106
Copy link

PGG106 commented Apr 7, 2024

how is a 2 Elo impact at bullet time controls a "frightening degradation of performance"?, why are we acting like the basically 0 Elo impact (at the very very very long time controls you otb players love) wasn't made even less impactful by the fact smallnet exists? the truth of the matter is that sf is now stronger than when hce was still present and it's stronger than it would be if hce was still present.
You are wondering wrong things based off of wrong assumptions and either outdated or wrong "knowledge" (or lack thereof).

@Disservin
Copy link
Member

I develop a fork of stockfish with the goal of making it understandable/useful to the OTB player, not necessarily, therefore, stronger, and I wondered why this frightening degradation of performance. That's all.

NPS doesn't mean anything, if you want high nps go back to pure alpha beta but lose accuracy of the evaluation. Your speed comparison that you posted earlier were wrong since it used a mixed bench, which is not even what stockfish did during regular play. In my two tests, the difference was like 5%, which is not "frightening degradation of performance" imo.
What we aim for (and everyone else in the community who wrote their own engine which competes at a high level) is elo gain. Over the time we have merged many new networks which made STC a bit weaker (reduced nps) but overall scaled rather well to high time controls... which is also what you are after...

@WampServer

This comment was marked as spam.

@PGG106
Copy link

PGG106 commented Apr 7, 2024

It doesn't matter what anyone thinks, this is about numbers, nps, elo and time controls. This is equivalent to math, it doesn't matter if i think that 3 + 2 is 7 or if my dream is for 12 * 10 to be 97, this is factual information that can only be correctly or incorrectly interpreted, there's no room for opinions here.
That being said it's pretty evident that you are a troll so there's no use engaging with you and i won't do so any further.

@amchess
Copy link
Author

amchess commented Apr 7, 2024

Commits on Jan 14, 2024
Remove the dependency on a Worker from evaluate

before refactor
Nodes/second : 826439

after refactor
Nodes/second : 822816

Same net and bench
The refactoring isn't no functional changes.
Just about facts, numbers...

@PGG106
Copy link

PGG106 commented Apr 7, 2024

that's a supposedly 0.4% speed drop, it's too small of a drop to be measured with a single bench run or even 3 or 5, it's easily explained away by noise and error bars in the bench measurement.
If you want an accurate way to measure speed use the script that is customarily used to measure for speed gains/losses. you'll notice that with enough runs the change is just neutral.
The refactoring is definitely non functional

@Disservin
Copy link
Member

Commits on Jan 14, 2024 Remove the dependency on a Worker from evaluate

before refactor Nodes/second : 826439

after refactor Nodes/second : 822816

Same net and bench The refactoring isn't no functional changes. Just about facts, numbers...

A single bench run isn't very reliable. You could probably redo a single bench at some point and get reversed results. You should be using one of the many speedup scripts to better determine the effect of a patch on the overall performance.

@PGG106
Copy link

PGG106 commented Apr 7, 2024

Here you can find all the info on how to properly measure the speed of stockfish: https://github.com/official-stockfish/Stockfish/wiki/Advanced-topics#measure-the-speed-of-stockfish, coincidentally it also says A speedup of 0.3% could be meaningless (i.e. within the measurement noise) which is exactly what you are measuring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants