Combined loss implementation #20
Comments
Hi @AmenRa, I'm also interested in this repo, so let me join the discussion. Regarding your first question, I agree with you: it seems alpha has been implicitly set to one (not sure why). For the second question, I believe the overall pairwise loss is computed for every batch. At line 141 the positive scores are repeated to match the in-batch negatives (to generate all possible pairs), and then the RankNet loss is computed at line 171. Hope this helps.
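As a rough illustration of that repeat-then-pair idea, here is a minimal PyTorch sketch (not the repo's actual code; tensor names and shapes are my own assumptions):

```python
import torch
import torch.nn.functional as F

def pairwise_ranknet_loss(pos_scores, neg_scores):
    """RankNet-style pairwise loss over every (positive, negative) score pair.

    pos_scores: (B,)   score of each query against its own positive passage
    neg_scores: (B, N) scores of each query against N in-batch negatives
    """
    # Repeat each positive score so it can be paired with every negative,
    # mirroring the "positive scores are repeated" step described above.
    pos = pos_scores.unsqueeze(1).expand_as(neg_scores)  # (B, N)
    # RankNet loss: -log sigmoid(pos - neg) == softplus(neg - pos)
    return F.softplus(neg_scores - pos).mean()

# Toy usage
B, N = 4, 7
loss = pairwise_ranknet_loss(torch.randn(B), torch.randn(B, N))
```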
@mohammed-elkomy Thanks for joining the discussion :). Your understanding is exactly correct. For the first question, we found that this simple strategy of setting alpha to one performed impressively well. We did not do a thorough study of the influence of different alpha values, so maybe it is better to set it to some other value. For the second question, we consider all in-batch samples as approximately random negatives. The positive scores are repeated to compute the pairwise loss. Thanks @AmenRa @mohammed-elkomy for the interest in our work! Best,
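In other words, under this reply the paper's weighted combination collapses to a plain sum. A tiny sketch with placeholder values (the loss terms here are dummies, not the repo's variables):

```python
import torch

# Dummy stand-ins for the two loss terms; in training these would be the
# hard-negative pairwise loss and the in-batch random-negative loss.
hard_negative_loss   = torch.tensor(0.7)
random_negative_loss = torch.tensor(0.3)

alpha = 1.0  # implicitly fixed to 1 in the released code, per the reply above
combined_loss = hard_negative_loss + alpha * random_negative_loss
# With alpha == 1 this is simply the sum of the two losses.
```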
@mohammed-elkomy and @jingtaozhan, thank you both for replying! I have a few further questions. What are …? Assuming I'm correct: …
Thanks again.
Hi @AmenRa, I'm not sure what you mean by batch size, but I think you mean the number of queries (or positive documents) per training step (dataset, lines 216 and 217). For each training step we sample … On the other hand, since … Regards (hope I'm correct 🤣🤣).
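Here is a small sketch of how I read that batch layout (purely illustrative; names, shapes, and the scoring step are assumptions, not the repo's code):

```python
import torch

# Each training example is assumed to be a (query, positive, hard negative)
# triple, so the "batch size" B is the number of queries per training step.
B, dim = 4, 8
query_emb    = torch.randn(B, dim)  # one embedding per query
pos_emb      = torch.randn(B, dim)  # its positive passage
hard_neg_emb = torch.randn(B, dim)  # its retrieved hard negative

# Scoring every query against every passage in the batch gives, for query i,
# one positive, one hard negative, and 2*(B-1) approximately random negatives
# (the other queries' positives and hard negatives).
all_docs = torch.cat([pos_emb, hard_neg_emb], dim=0)  # (2B, dim)
scores   = query_emb @ all_docs.t()                   # (B, 2B)
```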
Very sorry for the late reply, @AmenRa @mohammed-elkomy. I had two busy weeks and forgot to reply. Thanks @mohammed-elkomy for the detailed explanation. Your words are very clear and exactly correct. This is exactly how the loss is computed :).
Thanks @mohammed-elkomy for the explanation. I probably swapped the random-negatives-related code with the hard-negative one while reasoning about the implementation. Following your explanation, I assume … From my understanding, … Hope it's clear what I mean :D
Not exactly correct. Part of the second loss is indeed the hard negative loss, while the other part is approximately a random negative loss, because q_i's hard negative is q_j's (j ≠ i) random negative. Note that line 165 uses …
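To make the "q_i's hard negative is q_j's random negative" point concrete, here is a hedged continuation of the toy score matrix sketched earlier (again an assumption-laden illustration, not the repo's line 165):

```python
import torch

B = 4
# scores: (B, 2B) as in the earlier sketch; columns 0..B-1 are the batch's
# positives, columns B..2B-1 are the batch's hard negatives.
scores = torch.randn(B, 2 * B)

idx = torch.arange(B)
positive_mask = torch.zeros(B, 2 * B, dtype=torch.bool)
positive_mask[idx, idx] = True          # q_i's own positive
hard_neg_mask = torch.zeros(B, 2 * B, dtype=torch.bool)
hard_neg_mask[idx, B + idx] = True      # q_i's own mined hard negative
random_neg_mask = ~(positive_mask | hard_neg_mask)

# Per query: 1 hard-negative pair and 2B - 2 approximately random pairs.
assert (random_neg_mask.sum(dim=1) == 2 * B - 2).all()
# A pairwise loss averaged over all of these pairs is therefore a single
# global mean over both hard and random negative pairs.
```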
Yes, it is a global average of all random and hard negative pairs.
Hi, I am trying to understand how you combined the hard negative loss Ls with the in-batch random negative loss Lr, since in the paper the in-batch random negative loss is scaled by an alpha hyperparameter but there is no mention of the value of alpha you used in the experiments.

Following star/train.py, I found the RobertaDot_InBatch model, whose forward function calls the inbatch_train method. At the end of the inbatch_train method (line 182), I found … which is different from the combined loss proposed in the paper (Eq. 13). Am I missing something?

Also, for each query in the batch, did you consider all the possible in-batch random negatives or just one?

Thanks in advance!