Videos with bak bak paths and incorrect version numbering #1373

Open
susanodd opened this issue Nov 7, 2024 · 8 comments

Comments

@susanodd
Collaborator

susanodd commented Nov 7, 2024

  • This is about files and gloss video objects with "old style" backup paths. These end in a sequence of "bak" suffixes, as many as the version number of the gloss video.
  • The code was revised in the past to use "bak" followed by the ID of the gloss video object instead.
  • But some of the old code still uses the obsolete format. This issue is to remove the old format from the code in the creation of new video objects.

The filenames can be fixed by a command in #1412

Reminder of filenames existing in the code:

The new format leaves the "mp4" in the filename, so the old-format files with "bak bak" sequences are missing the video format. It was always "mp4", since we used to apply "ensure_mp4" (on Signbank uploads, not the API). But we did not notice this before, because there is still code that mentions the (version * ".bak") suffix. (See #1374)
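To make the two naming schemes concrete, here is a minimal sketch that classifies a backup filename. The patterns are assumptions inferred from the filenames quoted in this thread, not taken from the Signbank code, and the function names are hypothetical.

```python
import re

# Assumed patterns, inferred from filenames quoted in this issue.
NEW_STYLE = re.compile(r"\.mp4\.bak\d+$")         # NAME-123.mp4.bak456
MIXED_STYLE = re.compile(r"\.bak\d+(?:\.bak)+$")  # NAME-123.bak456.bak.bak
OLD_STYLE = re.compile(r"(?:\.bak)+$")            # NAME-123.bak.bak


def classify_backup_name(filename: str) -> str:
    """Return 'new', 'mixed', 'old' or 'other' for a backup filename."""
    if NEW_STYLE.search(filename):
        return "new"
    if MIXED_STYLE.search(filename):
        return "mixed"
    if OLD_STYLE.search(filename):
        return "old"
    return "other"


def trailing_bak_count(filename: str) -> int:
    """Count the trailing '.bak' suffixes (the old 'version * .bak' scheme)."""
    match = OLD_STYLE.search(filename)
    return len(match.group(0)) // len(".bak") if match else 0
```

With this, "BLIKJE-A-36667.bak.bak" is classified as old style with two trailing suffixes. A name that keeps "mp4" but ends in a plain ".bak" also counts as old style here.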

@susanodd
Collaborator Author

susanodd commented Nov 7, 2024

@vanlummelhuizen about "reverse renaming the bak bak files". Do we just assume they are mp4?

susanodd pushed a commit that referenced this issue Nov 8, 2024
when they have the wrong bak bak names.

this will end up in a different place in the code. It's not clear how much feedback will be needed or how long it will take to run.
susanodd added a commit that referenced this issue Nov 11, 2024
#1373: Imperative function with side effects to rename video files
susanodd pushed a commit that referenced this issue Nov 13, 2024
susanodd pushed a commit that referenced this issue Nov 13, 2024
susanodd added a commit that referenced this issue Nov 13, 2024
#1373: Command to rename extensions on videos.
@vanlummelhuizen
Collaborator

@vanlummelhuizen about "reverse renaming the bak bak files". Do we just assume they are mp4 ?

There are files that are not MP4. The listing below shows the files in glossvideo that end in .bak and do not have the string 'MP4' in the file type.

root@signbank-new:/var/www/writable/glossvideo# find . -type f | grep -P '\.bak$' | xargs -i file {} | grep -v MP4 | less
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17345.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17346.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak.bak.bak.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/ON/ONE-AND-A-HALF-B-40012.bak17344.bak.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/BL/BLIKJE-A-36667.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/BA/BACTERIE-A-40006.bak13572.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/te/testlemmaidglosstranslation6-3729.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./NGT/te/testlemmaidglosstranslation74-2793.bak.bak: ISO Media, Apple iTunes Video (.M4V) Video
./CSL_Shanghai/LA/LAUNDRY-MACHINE-A-6153.mp4.bak: ISO Media, Apple iTunes Video (.M4V) Video
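
The same kind of check can be sketched in Python without the `file` utility. ISO media files (MP4, M4V) usually start with an `ftyp` box whose major brand sits in bytes 8 to 11. This is only a sketch with hypothetical function names. Which brands should count as "real MP4" is left open here and would need to be decided.

```python
from typing import Optional


def major_brand(header: bytes) -> Optional[str]:
    """Return the 'ftyp' major brand from the first bytes of a file, or None.

    Expects at least the first 12 bytes of the file.
    """
    if len(header) < 12 or header[4:8] != b"ftyp":
        return None
    return header[8:12].decode("ascii", errors="replace")


def read_major_brand(path: str) -> Optional[str]:
    """Read the first 12 bytes of a file and return its major brand, if any."""
    with open(path, "rb") as handle:
        return major_brand(handle.read(12))
```

For the files listed above, the brand would presumably be "M4V " (with a trailing space), which matches what `file` reports as Apple iTunes Video.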

However, when I search for them in the database, they don't seem to belong to a GlossVideo object:

>>> files = [
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17345.bak.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17346.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak.bak.bak.bak.bak",
... "glossvideo/NGT/ON/ONE-AND-A-HALF-B-40012.bak17344.bak.bak.bak",
... "glossvideo/NGT/BL/BLIKJE-A-36667.bak.bak",
... "glossvideo/NGT/BA/BACTERIE-A-40006.bak13572.bak",
... "glossvideo/NGT/te/testlemmaidglosstranslation6-3729.bak.bak",
... "glossvideo/NGT/te/testlemmaidglosstranslation74-2793.bak.bak",
... "glossvideo/CSL_Shanghai/LA/LAUNDRY-MACHINE-A-6153.mp4.bak"
... ]
>>> print(", ".join([str(GlossVideo.objects.filter(videofile=file).count()) for file in files]))
0, 0, 0, 0, 0, 0, 0, 0, 0

So, the current state is that all files in glossvideo for which a GlossVideo object exists are MP4. But I don't think it is guaranteed that it will always be that way.

@susanodd
Collaborator Author

susanodd commented Nov 13, 2024

Whoa! It made some really weird file names there!

extra bak baks after the new extension

There is video code that still uses "bak bak". But I thought it was being circumvented.

non-mp4

Okay, that is what I was afraid of. That some of the bak bak files might be totally different extensions.

I tried converting some off-line and that works. So probably a command is needed to check the format of the files and convert them if necessary.

It's possible that many of the backup files are the wrong format. That would be a normal reason for users to upload again.

@vanlummelhuizen
Collaborator

I tried converting some off-line and that works. So probably a command is needed to check the format of the files and convert them if necessary.

Converting files currently in glossvideo? As said, all files that are not MP4 don't have a corresponding GlossVideo object, so converting is not necessary.

@susanodd
Collaborator Author

susanodd commented Nov 14, 2024

@susanodd Why are there two very similar command scripts to rename backed-up glossvideo files? :

And what does https://github.com/Signbank/Global-signbank/blob/master/signbank/dictionary/management/commands/rename_non_mp4_extensions.py do?

Are they tested, reviewed? Did you already use them on the server?

[THIS GOT A BIT LONG]

They are tested. But only locally. We don't have video files on the development servers.

The paths were going wrong. I did tests first to see what the "move" command would do.
(The new format still has the "mp4" before the "bakNNN", so it actually has two extensions. I had not tested this properly at first and did not notice the extra "mp4" inside the path. I wasn't sure whether the "split" command on the path was getting the right parts, so I changed the code to construct what the path should be instead of manipulating the existing stored path. It seems it was sometimes getting just the filename rather than a relative path. That might work differently on the Apache server than on the PyCharm runserver, since files are not actually being served locally.)

The rename for the extensions, that was only on a handful of files. I have the log script.

Those need to be converted. It seems to be browser specific. The files actually display, if you type in the url for them using protected_media, even if the extension was changed. (At least on Apple and on Ubuntu.)

The javascript code for drag and drop restricts the type of the video files, so they don't display in Gloss Detail.

There were some problems before because the webcam format on Ubuntu/Apple/Dell (the computer from @Jetske) does not work on the "other" system. So some formats were excluded in video display. We had conversion for a while, but the API did not want that anymore. The "image" display was fixed. It was not including 'png' before, that's why the images were not showing.

The files are all in the right place with the right name now. But the format needs to be converted on ones with non-mp4 format in an mp4 named file.

Those weird files with extra bak sequences after the good bakNNNN extension need to be removed, or renamed and objects created. (Removed, as you wrote.)

None of the commands add "bak bak" to the videos. So those files already existed. The commands were only renaming the backup files.

I'm working on the conversion part. That needs to be done with ffmpeg.
I don't know why the original "ensure_mp4" stopped working. It was commented out. This was requested for the API. Some of those videos also have the wrong format.
Oh, I remembered. It did not work on webcam videos. It turned out that the "input" video had a different frame rate than the "output" video, so its length had changed. Then it failed for some reason. (@Jetske will know the details.)
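
A minimal sketch of what the ffmpeg conversion step could look like, assuming ffmpeg is installed and on the PATH. The codec choices (libx264, aac) and the function names are assumptions for illustration, not what "ensure_mp4" did, and this does not address the frame rate problem.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(source: Path, target: Path) -> List[str]:
    """Build an ffmpeg command that re-encodes source into an MP4 container."""
    return [
        "ffmpeg",
        "-n",                       # never overwrite an existing output file
        "-i", str(source),
        "-c:v", "libx264",          # assumed video codec
        "-c:a", "aac",              # assumed audio codec
        "-movflags", "+faststart",  # move the index to the front for web playback
        "-f", "mp4",                # force MP4 even if target has no .mp4 suffix
        str(target),
    ]


def convert_to_mp4(source: Path, target: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(source, target), check=True)
```

Writing to a separate target and only replacing the original after a successful run keeps the backup intact if ffmpeg fails. The explicit "-f mp4" matters for backup files, since their names end in "bakNNN" and ffmpeg cannot guess the format from that.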

@susanodd
Collaborator Author

@vanlummelhuizen all of the "renamed" non-mp4 files have been converted to real mp4 files (offline, using ffmpeg).

@susanodd
Collaborator Author

susanodd commented Nov 19, 2024

TO DO: Convert format of non-mp4 files. Those that used to have "bak bak" sequences did not have any video extensions on them. Apparently it was assumed everything was converted using "ensure_mp4".

TO DO: Add a column to the DeleteGlossOrMedia of Dictionary Admin to show whether a file exists for the video of a deleted gloss. The table does not include the dataset. This could be obtained by checking what folder the file is in. On occasion the users ask to retrieve deleted video files.
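
For the second TO DO, a rough sketch of deriving the dataset folder from a stored path and checking whether the file is still on disk. It assumes the layout seen in this thread (glossvideo/DATASET/XX/filename) and that the folder name identifies the dataset. The function names and the writable_root parameter are hypothetical.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def dataset_folder(videofile: str, root: str = "glossvideo") -> Optional[str]:
    """Return the folder directly under root, or None if the path is too short."""
    parts = PurePosixPath(videofile).parts
    if root not in parts:
        return None
    index = parts.index(root)
    # The dataset folder must be followed by at least the filename itself.
    if index + 1 < len(parts) - 1:
        return parts[index + 1]
    return None


def video_file_exists(writable_root: str, videofile: str) -> bool:
    """Check whether the stored relative path still points at a file on disk."""
    return (Path(writable_root) / videofile).is_file()
```

For example, "glossvideo/NGT/BL/BLIKJE-A-36667.bak.bak" gives "NGT", while a bare filename gives None.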

susanodd changed the title from "Videos with bak bak paths" to "Videos with bak bak paths and incorrect version numbering" on Dec 5, 2024

3 participants