fix the merge audio dirs job for fairseq training #454
base: main
Co-authored-by: michelwi <[email protected]>
        os.symlink(file, dst)
        creation_complete = True
    except OSError as err:
        if err.errno != errno.EEXIST:
Why would there be an OSError with errno.EEXIST (i.e. FileExistsError)? You have just completed the while loop, and that implies the file did not exist.
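For reference, a small sketch (not code from this PR) of where EEXIST comes from: os.symlink refuses to overwrite an existing destination, and Python 3 maps an OSError with errno EEXIST to the FileExistsError subclass. So `except OSError` plus the `err.errno != errno.EEXIST` check is equivalent to handling FileExistsError specially. The file names below are illustrative only.

```python
import errno
import os
import tempfile

caught = None
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.wav")
    dst = os.path.join(tmp, "link.wav")
    open(src, "w").close()  # create an empty stand-in audio file
    os.symlink(src, dst)
    try:
        os.symlink(src, dst)  # dst already exists, so this must fail
    except OSError as err:
        caught = err

# The raised OSError is really a FileExistsError carrying errno EEXIST.
print(type(caught).__name__, caught.errno == errno.EEXIST)  # FileExistsError True
```

The check-then-act pattern (os.path.exists followed by os.symlink) leaves a window in which another worker can create dst, which is the only way this error can still occur right after the loop.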
Oh, you are running several tasks in parallel, so there could be race conditions, e.g. if folder1/a.wav is in self.audio_dir_paths[0] and folder2/a.wav is in self.audio_dir_paths[1]. But then the correct solution is not to ignore the error but to make a thread-safe implementation... which I now realize is the purpose of the double while loop.
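One way to get a thread-safe version without a separate existence check is to let os.symlink itself be the check: creating a symlink fails atomically with FileExistsError if the name is taken, so each worker just retries with the next suffix. This is a minimal sketch, assuming a local filesystem; `symlink_unique` is a hypothetical helper, not the function in this PR, and the stem_2, stem_3, ... naming is borrowed from the example further down.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def symlink_unique(src: str, out_dir: str, file_extension: str) -> str:
    """Hypothetical helper: link src into out_dir under the first free name."""
    stem = os.path.splitext(os.path.basename(src))[0]
    i = 1
    while True:
        suffix = "" if i == 1 else f"_{i}"
        dst = os.path.join(out_dir, f"{stem}{suffix}.{file_extension}")
        try:
            os.symlink(src, dst)  # fails atomically if dst already exists
            return dst
        except FileExistsError:
            i += 1  # name was taken, possibly by a concurrent worker


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "merged")
    os.makedirs(out_dir)
    sources = []
    for k in range(1, 5):
        folder = os.path.join(tmp, f"folder{k}")
        os.makedirs(folder)
        src = os.path.join(folder, "a.wav")
        open(src, "w").close()
        sources.append(src)
    # Link four same-named files concurrently; every worker gets its own name.
    with ThreadPoolExecutor(max_workers=4) as pool:
        links = list(pool.map(lambda s: symlink_unique(s, out_dir, "wav"), sources))
    names = sorted(os.path.basename(p) for p in links)

print(names)  # ['a.wav', 'a_2.wav', 'a_3.wav', 'a_4.wav']
```

Which worker ends up with which suffix depends on scheduling, but the set of names does not: a name is only skipped when someone else already holds it, so four workers always claim exactly the first four names.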
OK, never mind my previous comment then. (But do we really need parallelism for symlinking a bunch of files?)
Except for the issue with the ever-expanding underscores; that should probably be fixed.
What is the issue with the ever-expanding underscores? Sorry, I couldn't find the relevant comment; could you please point me to it?
    while os.path.exists(dst):
        dst = f"{os.path.splitext(dst)[0]}_{i}.{self.file_extension}"
This will keep appending underscores to the file name:
a.wav
a_2.wav
a_2_3.wav
...
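A small sketch of the bug and one possible fix, simulating the filesystem with a set of taken names. The counter handling (i starting at 2 and incrementing once per iteration) is an assumption inferred from the a_2 / a_2_3 example above; the function names are hypothetical. The fix computes the stem once from the original name, so each candidate carries a single suffix.

```python
import os


def naive_rename(dst: str, ext: str, taken: set) -> str:
    # Mirrors the reviewed loop: splitext() strips only ".wav", so the
    # previous "_2" stays in the stem and the next suffix is appended to it.
    i = 2
    while dst in taken:
        dst = f"{os.path.splitext(dst)[0]}_{i}.{ext}"
        i += 1
    return dst


def fixed_rename(dst: str, ext: str, taken: set) -> str:
    # The stem is computed once from the original name, so every
    # candidate has the form stem_i.ext.
    stem = os.path.splitext(dst)[0]
    candidate, i = dst, 2
    while candidate in taken:
        candidate = f"{stem}_{i}.{ext}"
        i += 1
    return candidate


taken = {"a.wav", "a_2.wav"}
print(naive_rename("a.wav", "wav", taken))  # a_2_3.wav
print(fixed_rename("a.wav", "wav", taken))  # a_3.wav
```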
With the current version, if two directories contain files with the same name, os.symlink(file, dst) raises a "file already exists" error, because dst is the same for both files. This PR fixes this issue by renaming dst for the second file.
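A sequential sketch of the merge step described above, under the assumption that the job globs `*.{file_extension}` in each input directory and links everything into one output directory. `merge_audio_dirs` and its parameters are hypothetical names chosen to echo self.audio_dir_paths and self.file_extension; this is not the job's actual code.

```python
import glob
import os
import tempfile


def merge_audio_dirs(audio_dir_paths, out_dir, file_extension):
    """Hypothetical sketch: symlink every *.{file_extension} file from each
    input dir into out_dir, renaming later clashes to stem_2, stem_3, ..."""
    os.makedirs(out_dir, exist_ok=True)
    linked = []
    for audio_dir in audio_dir_paths:
        for file in sorted(glob.glob(os.path.join(audio_dir, f"*.{file_extension}"))):
            stem = os.path.splitext(os.path.basename(file))[0]
            dst = os.path.join(out_dir, os.path.basename(file))
            i = 2
            # lexists() also reports dangling symlinks, which os.symlink would reject.
            while os.path.lexists(dst):
                dst = os.path.join(out_dir, f"{stem}_{i}.{file_extension}")
                i += 1
            os.symlink(os.path.abspath(file), dst)
            linked.append(dst)
    return linked


with tempfile.TemporaryDirectory() as tmp:
    dirs = []
    for folder in ("folder1", "folder2"):
        d = os.path.join(tmp, folder)
        os.makedirs(d)
        for name in ("a.wav", "b.wav"):
            open(os.path.join(d, name), "w").close()
        dirs.append(d)
    merged = merge_audio_dirs(dirs, os.path.join(tmp, "merged"), "wav")
    names = [os.path.basename(p) for p in merged]

print(names)  # ['a.wav', 'b.wav', 'a_2.wav', 'b_2.wav']
```

Because this version is single-threaded, the exists-then-symlink check is safe; if the job runs several workers in parallel, the retry-on-FileExistsError pattern is the safer choice.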