We want to build an algorithm that can quickly choose the next statement for someone to rate. There are a few things this algorithm could prioritize, so we should decide which ones we care about most. There are also several levels of algorithm that imply dramatically different computational loads, so that should be a consideration too.
Possible optimization considerations:
maximize knowledge about each respondent
maximize knowledge about each statement
maximize knowledge about types of respondent
maximize knowledge about types of statement
maximize coverage over all statements
maximize predictive accuracy on new ratings (i.e., collect data about the statements we believe we have the least predictive accuracy on)
maximize predictive accuracy over statements worth predicting (i.e., statements that are gibberish should be algorithmically avoided).
And some of the levels of algorithm might be:
random or block random
optimizing coverage over the statements (i.e., statements are randomly chosen with a weighting that is inverse to the number of times they have already been rated)
optimizing variance within or across people or statements (this can be extended by blocking on statement or person data, among other things)
training a model and using it or its behavior to inform task selection
using something like a Bayesian optimization approach to choose the next statement (this is probably the most expensive)
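To make the variance-based level a bit more concrete, here is a minimal Python sketch, assuming ratings are numeric and held in memory as a dict keyed by statement id. The function name `pick_highest_variance`, the `min_ratings` parameter, and the data layout are hypothetical, not something we have in the codebase. It only covers variance across statements, not within people.

```python
import statistics


def pick_highest_variance(ratings_by_statement, min_ratings=2):
    """Return the id of the statement whose ratings disagree the most.

    ratings_by_statement: dict mapping statement id -> list of numeric ratings
    (hypothetical layout, assumed for this sketch).
    Statements with fewer than `min_ratings` ratings are treated as
    maximally uncertain, so they are picked before anything else.
    min_ratings must be at least 2, since sample variance needs two points.
    """
    def uncertainty(item):
        _, ratings = item
        if len(ratings) < min_ratings:
            return float("inf")
        return statistics.variance(ratings)

    statement_id, _ = max(ratings_by_statement.items(), key=uncertainty)
    return statement_id


if __name__ == "__main__":
    example = {"a": [1, 1, 1], "b": [1, 5, 3], "c": [2, 3]}
    print(pick_highest_variance(example))
```

This is deterministic, so in practice it would probably need some randomization to avoid showing everyone the same statement.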
I think we should have a simple way to switch which approach we use, and we should probably implement at least a purely random, a weighted random, and a simple model-based version.
@amirrr let me know if you have thoughts on any of this.
Also, @JamesPHoughton, please chime in if you have thoughts that would help us or any resources you would recommend we consider.
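One way the "simple way to switch" could look is a small registry of selection functions keyed by name. This is only a sketch under assumptions: `STRATEGIES`, `choose_next_statement`, and the `counts` dict (statement id -> number of ratings so far) are hypothetical names, and the "model" entry is a stand-in that scores statements with a standard-error-style proxy rather than a trained model.

```python
import math
import random


def pick_random(counts, rng):
    # Purely random: every statement is equally likely.
    return rng.choice(list(counts))


def pick_weighted(counts, rng):
    # Weighted random: weight is inverse to the number of existing ratings.
    ids = list(counts)
    weights = [1.0 / (counts[i] + 1) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


def pick_model_based(counts, rng):
    # Placeholder for a real model (assumption): score each statement by
    # 1 / sqrt(n + 1), a rough proxy for how uncertain its mean rating is,
    # and take the highest score. A trained model would replace this score.
    return max(counts, key=lambda i: 1.0 / math.sqrt(counts[i] + 1))


# Hypothetical registry; switching approach means changing one string.
STRATEGIES = {
    "random": pick_random,
    "weighted": pick_weighted,
    "model": pick_model_based,
}


def choose_next_statement(counts, strategy="weighted", rng=None):
    rng = rng or random.Random()
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return STRATEGIES[strategy](counts, rng)


if __name__ == "__main__":
    counts = {"s1": 0, "s2": 5, "s3": 100}
    for name in STRATEGIES:
        print(name, choose_next_statement(counts, name, random.Random(0)))
```

The strategy name could then come from a config value or environment variable, so comparing approaches would not require a code change.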
For now, reverse-weighted reservoir sampling in MySQL seems like a quick way to get a statement:
WITH weighted_questions AS (
    -- Weight each statement inversely to the number of ratings it already has
    SELECT
        statements.id,
        statements.`statement`,
        1.0 / (COUNT(answers.statementId) + 1) AS weight
    FROM
        statements
    LEFT JOIN
        answers ON statements.id = answers.statementId
    GROUP BY
        statements.id
)
-- Draw a random priority per statement; the smallest priority wins
SELECT
    id,
    `statement`,
    -LOG(RAND()) / weight AS priority
FROM
    weighted_questions
ORDER BY
    priority ASC
LIMIT 1;
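As a sanity check on the query, here is a small Python simulation of the same idea: each item gets an exponentially distributed priority with rate equal to its weight, and the item with the smallest priority is picked. In theory that selects item i with probability weight_i / sum(weights). The function names and example weights are made up for illustration, and the code uses `1 - u` inside the log only to avoid `log(0)`.

```python
import math
import random
from collections import Counter


def pick_by_exponential_key(weights, rng):
    """Pick one id; smallest -log(U) / weight wins (same trick as the SQL)."""
    def key(item_id):
        u = rng.random()  # u is in [0, 1), so 1 - u is in (0, 1]
        return -math.log(1.0 - u) / weights[item_id]

    return min(weights, key=key)


def empirical_shares(weights, draws=20000, seed=0):
    """Repeat the draw many times and report how often each id was picked."""
    rng = random.Random(seed)
    tally = Counter(pick_by_exponential_key(weights, rng) for _ in range(draws))
    return {item_id: tally[item_id] / draws for item_id in weights}


if __name__ == "__main__":
    # Weights as the query would compute them: 1 / (number of ratings + 1)
    weights = {"never_rated": 1.0, "rated_once": 0.5, "rated_three_times": 0.25}
    total = sum(weights.values())
    shares = empirical_shares(weights)
    for item_id, w in weights.items():
        print(item_id, "expected", round(w / total, 3), "observed", shares[item_id])
```

If the observed shares track the expected ones, the query is doing inverse-count weighting as intended, which means heavily rated statements still get picked occasionally rather than never.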