This is something that came up during the experiment.
In general, for task 1, we wanted listeners to rate the overall sound quality of the test sound. That is, even if the separated singing voice (the target source for all of the separation algorithms) sounded identical to the reference singing voice, listeners should also consider the quality of the other instruments, as well as any general additive distortions. In this respect, the original mixture should be rated as having the highest quality, because it contains no distortions due to source separation processing.
Two listeners reported that they had focused specifically on the singing voice (vocals) and ignored everything else. They were, however, kind enough to repeat the experiment after clarification.
There are a few ways to overcome this (for next time!):
Provide more sound examples at the familiarisation stage to emphasise this point, e.g. reference vocals + highly distorted accompaniment = reduced sound quality. The purpose of the first example was to show that the original mixture should be rated the same as the vocals (because neither contains distortions or artefacts), but without further examples, the wording could suggest that listeners should ignore the accompaniment when judging sound quality, which was not our intent.
Include the original mixture as a second reference, as done in the original PEASS paper.
State explicitly that distortions to the accompaniment should also be considered.
Change the wording to "overall sound quality", while making it clear that we are referring to the effect of processing artefacts (i.e. quality relative to the original), rather than "absolute quality".
Perhaps the target source isn't actually needed as a reference for this question; the mixture alone should suffice.
There is also nothing wrong with asking for the quality of the singing voice alone (similar to the target preservation task in the PEASS study), but we wanted something more general that listeners could easily grasp, without having to run further tasks targeting difficult perceptual scales that require extensive training.