Task 1: Sound quality of target or overall sound quality? #60

Open
deeuu opened this issue Apr 2, 2018 · 0 comments

This is something that came up during the experiment.

In general, for task 1, we wanted listeners to rate the overall sound quality of the test sound. That is, even if the separated singing voice (which was the target source for all of the separation algorithms) sounded identical to the reference singing voice, one should also consider the quality of the other instruments in addition to general additive distortions. In this respect, the original mixture should be rated as having the highest quality because it contains no distortions due to source separation processing.

Two listeners reported that they specifically focused on the singing voice (vocals) and ignored everything else. They were, however, kind enough to repeat the experiment after clarification.

There are a few ways to overcome this (for next time!):

  • Emphasise this with more sound examples at the familiarisation stage, e.g. reference vocals + highly distorted accompaniment = reduced sound quality. The purpose of the first example was to emphasise that the original mixture should be rated the same as the vocals (because there are no distortions or artefacts), but without further examples the wording could suggest that listeners should ignore the accompaniment when judging sound quality, which was not our intent.

  • Include the original mixture as a second reference, as done in the original PEASS paper.

  • State that distortions to the accompaniment should also be considered.

  • Change wording to overall sound quality, but of course one needs to make it clear that we are referring to the effect of processing artefacts (so relative to the original), rather than 'absolute quality'.

Perhaps the target source isn't actually needed as a reference for this question; the mixture alone should suffice.

There is also nothing wrong with asking for the quality of the singing voice alone (similar to the target preservation task in the PEASS study), but we wanted something more general that was simple for listeners to grasp, without having to conduct further tasks targeting difficult perceptual scales that require extensive training.
