Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: algorithm for oneOfWeighted has better distribution #47

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ctholho
Copy link
Contributor

@ctholho ctholho commented Jan 8, 2025

fixes #46

@justinvdm justinvdm changed the title BREAKING CHANGE: fix: algorithm for oneOfWeighted has better distribution BREAKING CHANGE: fix!: algorithm for oneOfWeighted has better distribution Jan 8, 2025
@justinvdm justinvdm changed the title BREAKING CHANGE: fix!: algorithm for oneOfWeighted has better distribution fix!: algorithm for oneOfWeighted has better distribution Jan 8, 2025
oneOfWeighted.js Outdated
if (flip(id, p)) {
return resolve(id, sample[1])
var id = hash(input, 'oneOfWeighted')
const prob = (id % 1_000_000) / 1_000_000
Copy link
Contributor

@justinvdm justinvdm Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash() gives us back a 52-bit int, so I think we might be truncating the range with the / 1_000_000, which could affect the quality of the results. Could we do id / Number.MAX_SAFE_INTEGER here instead?

edit Turns out I was wrong here, reply below this one has more details.

Suggested change
const prob = (id % 1_000_000) / 1_000_000
var prob = id / (2 ** 52)

Copy link
Contributor

@justinvdm justinvdm Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out my suggestion fails the tests (the diff threshold is greater than 6.5%), whereas yours doesn't. I can't understand why. So lets go for your change - scratch my suggestion :) Can you help me understand why 1M works better?

Only suggested change then is to use es5 (context here).

Suggested change
const prob = (id % 1_000_000) / 1_000_000
var prob = (id % 1000000) / 1000000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried today again many different settings and don't understand it at all. I figured in the end it doesn't really matter because only the probabilities the user defined in the array are important. It's unlikely they would define values like 0.000004 and want to run the function more than a million times. Right now, somehow 1 million gave the best results with many different things I tried.

On the other hand: If we break backwards compatibility we should do it right the first time. So I'll take some more time to look into alternatives.

oneOfWeighted.js Outdated
Comment on lines 17 to 18
let cumulative = 0;
for (const [probability, value] of samples) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I created this project 5 years ago, I had the brainwave of writing it only in es5 to avoid a build step. I'm not sure if that decision was the right one, but unfortunately it means for now we need to write the code as es5 for linting to pass. I'm sorry about this, I realise it is absurd to ask someone to write es5 in 2025 :)

Suggested change
let cumulative = 0;
for (const [probability, value] of samples) {
var cumulative = 0;
for (var i = 0; i < samples.length; i++) {
var probability = samples[i][0];
var value = samples[i][1];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright. happy to comply.

oneOfWeighted.js Outdated Show resolved Hide resolved
@@ -29,6 +30,128 @@ test(`averages to within ${
t.assert(diffBetween(sums.blue / n, 0.3) <= DIFF_THRESHOLD)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.assert(diffBetween(sums.blue / n, 0.3) <= DIFF_THRESHOLD)
t.assert(diffBetween(sums.green / n, 0.2) <= DIFF_THRESHOLD)
t.assert(diffBetween(sums.blue / n, 0.1) <= DIFF_THRESHOLD)
// maybe yellow and fuchsia here too?

@justinvdm
Copy link
Contributor

Looks great :) Just a few minor comments.

@justinvdm justinvdm self-requested a review January 8, 2025 19:28
@justinvdm
Copy link
Contributor

@ctholho I should really add contribution guidelines and CI for this project. For now, can you do these to run checks for your changes and update the docs and tests?

# checks
yarn lint
yarn test

# update snapshot tests
yarn test -u

# update docs (when code examples yield different results, e.g. for breaking changes)
yarn build:docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

oneOfWeighted incorrect distribution
2 participants