Videos randomly disappearing #1453

Open
rem0g opened this issue Dec 23, 2024 · 22 comments

@rem0g
Collaborator

rem0g commented Dec 23, 2024

Some glosses have lost their videos, for example:

https://signbank.cls.ru.nl/dictionary/gloss/47361

https://signbank.cls.ru.nl/dictionary/gloss/47475

Some glosses have the wrong still image from the video.

Also, some glosses have the wrong video perspective.

Some glosses even have an NMM video as the gloss video.

This is happening everywhere. I have checked my scripts and everything looks fine on my end.

@rem0g
Collaborator Author

rem0g commented Dec 23, 2024

Another example:

https://signbank.cls.ru.nl//dictionary/gloss/47336

When the gloss video has been deleted, the NME video has moved from NME to gloss.

@rem0g
Collaborator Author

rem0g commented Dec 23, 2024

At gloss AFBROKKELEN (https://signbank.cls.ru.nl//dictionary/gloss/46883), I am seeing this:

{
    "ID": "211108",
    "Index": "0",
    "Description: Dutch": "afbrokkelen",
    "Description: English": "crumble",
    "Link": "https://signbank.cls.ru.nl//dictionary/protected_media/glossvideo/NGT/AF/AFBROKKELEN-46883.mp4"
},
{
    "ID": "208048",
    "Index": "1",
    "Description: Dutch": "afbrokkelen",
    "Description: English": "crumble",
    "Link": "https://signbank.cls.ru.nl//dictionary/protected_media/glossvideo/NGT/AF/AFBROKKELEN-46883.mp4"
}

Both have the same link, and that should not be the case.
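
A check like this can be automated on the client side. The following is only a minimal sketch: it assumes the video entries have already been parsed into a list of dictionaries with the "ID" and "Link" keys shown above, and the function name `find_shared_links` is made up for illustration.

```python
from collections import defaultdict


def find_shared_links(videos):
    """Return {link: [ids]} for every link used by more than one video entry."""
    ids_by_link = defaultdict(list)
    for video in videos:
        ids_by_link[video["Link"]].append(video["ID"])
    # Keep only the links that are referenced by two or more entries.
    return {link: ids for link, ids in ids_by_link.items() if len(ids) > 1}


# The two entries reported for AFBROKKELEN (only the fields the check needs).
LINK = ("https://signbank.cls.ru.nl//dictionary/protected_media/"
        "glossvideo/NGT/AF/AFBROKKELEN-46883.mp4")
videos = [
    {"ID": "211108", "Index": "0", "Link": LINK},
    {"ID": "208048", "Index": "1", "Link": LINK},
]

for link, ids in find_shared_links(videos).items():
    print(f"{len(ids)} entries share {link}: {', '.join(ids)}")
```

Running it over the entries of every gloss would give a list of affected glosses without having to open each page by hand.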

@susanodd
Collaborator

@rem0g the university is closed this week and next. We won't be able to work on this until then.
We are all on vacation now.

@susanodd
Collaborator

Can you increase the amount of time between updating/uploading and retrieving?

Like maybe increase it to 10 minutes? There was previously a problem that the operations followed each other too quickly.
With videos, it could be that a previous operation was not completed before a new one is performed. (It makes a difference if the file system is continuously writing new files. The transactions to create objects are faster than the file system.)
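
On the script side, a pause like that could be enforced with a small wrapper. This is a rough sketch only: the class name `MinIntervalThrottle` and the idea of passing each API call in as a callable are assumptions for illustration, not part of Signbank or of the existing scripts.

```python
import time


class MinIntervalThrottle:
    """Run operations one at a time, leaving a minimum gap after each one."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_finished = None

    def run(self, operation):
        # Wait until at least min_interval has passed since the previous operation ended.
        if self._last_finished is not None:
            remaining = self.min_interval - (self._clock() - self._last_finished)
            if remaining > 0:
                self._sleep(remaining)
        try:
            return operation()
        finally:
            self._last_finished = self._clock()


if __name__ == "__main__":
    # Tiny interval for demonstration; the suggestion above would be 600 seconds.
    throttle = MinIntervalThrottle(0.01)
    for name in ("upload", "retrieve"):
        throttle.run(lambda name=name: print("running", name))
```

The gap is measured from the end of the previous operation, so a slow upload does not eat into the waiting time of the next call.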

@susanodd
Collaborator

I had this problem locally on my own computer using iCloud for storage.
But I could not repeat it on the real server.
It was that the files had not been completely copied to iCloud. So they "disappeared" like that. They were actually on disc, but had names with "." before the filename, so not visible. This was on Apple's Unix.

@rem0g
Collaborator Author

rem0g commented Dec 23, 2024

Yes, I could do that, but I would rather have the Signbank side do something about the transactions. For example, for incoming transactions I always enqueue them in a list instead of executing them immediately. When the server has finished processing one transaction, it can then handle the next one.

Would that be something you can work on?
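
To make the idea concrete, here is a minimal in-process sketch of such a queue. It is not Signbank code: `SerialExecutor` is a made-up name, and a real Django deployment would more likely need a queue that works across server processes. The sketch only shows the ordering guarantee, that a task starts after the previous one has finished.

```python
import queue
import threading


class SerialExecutor:
    """Process submitted tasks strictly one after another, in arrival order."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, func, *args):
        """Enqueue a task instead of executing it immediately."""
        self._tasks.put((func, args))

    def _run(self):
        while True:
            func, args = self._tasks.get()
            try:
                func(*args)
            except Exception as exc:
                # A failing task must not stop the queue; report and continue.
                print(f"task failed: {exc!r}")
            finally:
                self._tasks.task_done()

    def wait_until_idle(self):
        """Block until every submitted task has been processed."""
        self._tasks.join()


if __name__ == "__main__":
    executor = SerialExecutor()
    for step in ("delete NME video", "upload NME video", "upload gloss video"):
        executor.submit(print, "processing:", step)
    executor.wait_until_idle()
```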

@susanodd
Collaborator

@rem0g that sounds like an interesting approach. I will discuss with @vanlummelhuizen how to implement that. He is the Django expert. A queuing mechanism.

@susanodd
Collaborator

susanodd commented Jan 6, 2025

Another example:

https://signbank.cls.ru.nl//dictionary/gloss/47336

When the gloss video has been deleted, the NME video has moved from NME to gloss.

@rem0g How are you deleting the video?

(There are some "signals" when objects are deleted. These may or may not move or delete video files. It would help in debugging to know what commands have been done.)
(When a gloss is deleted, the video files are not deleted.)

Theoretically, if you are uploading video files at rapid speed, the temporary files (that Unix is making) could end up being linked to the wrong object in Django. I have suspected this for a long time, but cannot fix this myself. I will ask the others.

I implemented a lot of code in November/December for managing the video files. There are pull requests for these. But nobody has reviewed them yet. The intention is that the dataset manager can inspect what is in the file system. That also makes it possible to retrieve the videos from deleted glosses. The gloss IDs are not reused, so new videos should not interfere with deleted glosses, since they always have the ID in the filename and these are not reused.

@susanodd
Collaborator

susanodd commented Jan 6, 2025

@rem0g for this gloss:

https://signbank.cls.ru.nl/dictionary/gloss/47361

the NME video is not in the correct format!

(On Firefox, it shows that it is not supported.)

Recall that you asked us to not test for MP4 anymore. Thus it can be that incorrect formats are causing problems.
(The images cannot be generated for videos in the wrong format. Hence, an old image will remain shown.)

@susanodd
Collaborator

susanodd commented Jan 6, 2025

@rem0g here are the gloss video objects for AFBROKKELEN. Yes, you can see that the same filename appears multiple times, for different GlossVideo objects with various perspectives and NME set.

[Image: afbrokkelen-46883-glossvideo-objects]

@susanodd
Collaborator

susanodd commented Jan 6, 2025

Is it possible something was wrong with the permissions on your source file? Or that it was a symbolic link?
It looks like something was wrong with the source file, so that it kept being bound to different objects.
Also, all of the files attached to the objects have very similar time stamps.

@susanodd
Collaborator

susanodd commented Jan 6, 2025

@vanlummelhuizen can you help on this issue?

@susanodd
Collaborator

susanodd commented Jan 6, 2025

@rem0g there are hundreds of videos with the wrong filename as you point out.

Did you change anything in your script?

I copied the newest database to signbank-test in order to inspect the filenames (in the objects).

https://signbank-test.cls.ru.nl/datasets/checks/5

(There are no files, but you can see the filenames in the objects)

The last time I checked filenames (end of November) everything was as expected.
It may have started when it became possible to upload three videos at the same time in the API?

The problem is that all the gloss video objects of a gloss are indeed sharing a single video file. They are all pointing to the same file.

I can only think that this is being caused by an alias or something. That the file system is pointing to a single file during upload.

Babbling, but I know Django does not allow uploading multiple files in the Django Form Template. (We used to do this for the eaf files in the Dataset Manager, but when we updated to Django 4.2, the code had to be modified to only upload one file.)

Perhaps Django is somehow doing something here since multiple video files are in the same API request. (The feature was removed from Django for security reasons.)

@susanodd
Collaborator

susanodd commented Jan 6, 2025

From the Django manual:

Together MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django’s default file upload behavior of reading small files into memory and large ones onto disk.

You can write custom handlers that customize how Django handles files. You could, for example, use custom handlers to enforce user-level quotas, compress data on the fly, render progress bars, and even send data to another storage location directly without storing it locally. See Writing custom upload handlers for details on how you can customize or completely replace upload behavior.

Where uploaded data is stored

Before you save uploaded files, the data needs to be stored somewhere.

By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold the entire contents of the upload in memory. This means that saving the file involves only a read from memory and a write to disk and thus is very fast.

However, if an uploaded file is too large, Django will write the uploaded file to a temporary file stored in your system’s temporary directory. On a Unix-like platform this means you can expect Django to generate a file called something like /tmp/tmpzfp6I6.upload. If an upload is large enough, you can watch this file grow in size as Django streams the data onto disk.

These specifics – 2.5 megabytes; /tmp; etc. – are “reasonable defaults” which can be customized as described in the next section.
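
For reference, the defaults quoted above correspond to the following Django settings. The values shown are Django's documented defaults, written out explicitly as a sketch; whether Signbank overrides any of them is not known from this thread.

```python
# settings.py (sketch): Django's default upload behaviour, spelled out.

# Uploads up to this size (in bytes) are held in memory; 2621440 bytes = 2.5 MB.
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440

# Directory for temporary upload files; None means the system temp directory.
FILE_UPLOAD_TEMP_DIR = None

# Handlers are tried in order: small files in memory, larger ones on disk.
FILE_UPLOAD_HANDLERS = [
    "django.core.files.uploadhandler.MemoryFileUploadHandler",
    "django.core.files.uploadhandler.TemporaryFileUploadHandler",
]
```

Listing only `TemporaryFileUploadHandler` in `FILE_UPLOAD_HANDLERS` would make every upload go through a temporary file on disk, which could make the small and large video cases behave the same way while debugging.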

@susanodd
Collaborator

susanodd commented Jan 6, 2025

I need help with this issue.

One of the glosses that is messed up is STRUISVOGEL

Here, you see all the objects refer to the same file (the perspective videos have the same file name as the normal video):

[Image: STRUISVOGEL-perspective-videos-admin]

Here you see the stat output for the file:

stat ../writable/glossvideo/NGT/ST/STRUISVOGEL-45874.mp4
  File: ../writable/glossvideo/NGT/ST/STRUISVOGEL-45874.mp4
  Size: 1105797   	Blocks: 2158       IO Block: 131072 regular file
Device: 20007dh/2097277d	Inode: 864571      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1001/wwwsignbank)
Access: 2025-01-06 08:10:59.506318499 +0000
Modify: 2024-12-01 05:01:46.404138548 +0000
Change: 2024-12-20 13:58:20.693990022 +0000
 Birth: 2024-12-01 05:01:46.386138096 +0000

But in the file system, the timestamp on the file shows that it was not changed on 20 December.

-rw-r--r-- 1 ubuntu wwwsignbank 1105797 Dec 1 05:01 STRUISVOGEL-45874.mp4
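
One possible reading of the two outputs above, not a confirmed diagnosis: `ls -l` shows the modification time (mtime, when the content was last written), while the `Change` line of `stat` is the inode change time (ctime), which is also updated by metadata operations such as chmod, chown, rename, or a change in link count. So the content may be untouched since 1 December while something touched the file's metadata on 20 December. The sketch below shows the difference on a throwaway temporary file; it assumes a Unix-like system (on Windows `st_ctime` means something else).

```python
import os
import tempfile
import time


def stat_around_chmod(path):
    """Return os.stat results taken before and after a metadata-only change."""
    before = os.stat(path)
    time.sleep(0.05)          # make sure the clock has visibly advanced
    os.chmod(path, 0o644)     # changes inode metadata, not the file content
    after = os.stat(path)
    return before, after


with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example video bytes")
    path = handle.name

try:
    before, after = stat_around_chmod(path)
    print("mtime unchanged:", before.st_mtime_ns == after.st_mtime_ns)
    print("ctime advanced: ", after.st_ctime_ns > before.st_ctime_ns)
finally:
    os.remove(path)
```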

@susanodd
Collaborator

susanodd commented Jan 6, 2025

I don't know how to solve this.

Hopefully, @vanlummelhuizen can have a go.

@rem0g
Collaborator Author

rem0g commented Jan 6, 2025

Hello, thank you for looking into the issue. For me the issue is not clear yet, but now I know it's not caused by my script, as everything relies on the Signbank API. For uploading videos I use this API:

/dictionary/api_update_gloss/{glossid}/video

That's it.

As for NME video upload, I use:

/dictionary/api_create_gloss_nmevideo/{datasetid}/{glossid}/

And for deleting NME video:

/dictionary/api_delete_gloss_nmevideo/{datasetid}/{glossid}/{videoid}/

Every time I want to upload a certain NME video, I execute api_delete_gloss_nmevideo first: I obtain the unique ID of the existing NME video, delete that video, and then upload the new NME video.
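
The sequence described above can be sketched as follows. The endpoint paths are the ones listed in this comment (assuming the trailing `}` in the create path is a typo); everything else is hypothetical. In particular, `send` is a placeholder for whatever performs the actual HTTP request, since the HTTP method, authentication, and payload are not covered in this thread, and the dataset ID in the example is made up.

```python
BASE_URL = "https://signbank.cls.ru.nl"


def update_video_url(gloss_id):
    return f"{BASE_URL}/dictionary/api_update_gloss/{gloss_id}/video"


def create_nme_video_url(dataset_id, gloss_id):
    return f"{BASE_URL}/dictionary/api_create_gloss_nmevideo/{dataset_id}/{gloss_id}/"


def delete_nme_video_url(dataset_id, gloss_id, video_id):
    return (f"{BASE_URL}/dictionary/api_delete_gloss_nmevideo/"
            f"{dataset_id}/{gloss_id}/{video_id}/")


def replace_nme_video(dataset_id, gloss_id, old_video_id, send):
    """Delete the existing NME video first, then create the new one.

    `send(action, url)` is a hypothetical callable supplied by the caller;
    the second call is only made after the first one has returned.
    """
    send("delete", delete_nme_video_url(dataset_id, gloss_id, old_video_id))
    send("create", create_nme_video_url(dataset_id, gloss_id))


if __name__ == "__main__":
    # Illustrative IDs only; dataset 5 and video 208048 are assumptions.
    replace_nme_video(5, 46883, 208048, send=lambda action, url: print(action, url))
```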

@uklomp
Collaborator

uklomp commented Jan 7, 2025

Hi all, I added 'blocking' to indicate extra extra priority. If there's something on our end we can do, please let us know.

@susanodd
Collaborator

susanodd commented Jan 7, 2025

I'm not able to solve this myself. I have been aware of this problem for a long time, but it was on a local server running on iCloud. So I chalked that up to Apple quirks.

There are quite a few messed up video objects/ files now.

I'm playing with this locally, so I can inspect the file system and the admin without messing up anything.

Since multiple video objects refer to exactly the same file, it is not possible to "fix" this, other than to delete objects that point to the wrong file. But to do this, we need to turn off the "normal" process of deleting, otherwise the "correct" file may be deleted.

I made a command (pull request) for renaming backup video files, since that has been messed up for a long time.
But this is something different, since objects refer to the same file.

@susanodd
Collaborator

susanodd commented Jan 7, 2025

@rem0g can you stop deleting the videos on your side? Because objects are referring to the same file, this is causing a shared file to be deleted.

I still think this is due to the API commands happening too fast for the file system. But once a file ends up shared by different objects, that escalates the problem, domino effect.

@susanodd
Collaborator

susanodd commented Jan 7, 2025

@rem0g I put the database (as of yesterday) onto signbank-susan, where I have a pull request branch running with extra commands for the admin. (For filtering on NME, Perspective, and wrong filename.)

Out of curiosity, I looked at the gloss video history. Here you can see the names of the files that were uploaded. The source file and the (desired) target file.

It looks like, in your naming convention with L, M, R in the filenames, you are also using M for videos that are NME videos.

[Image: glossvideohistory_recent]

@susanodd
Collaborator

susanodd commented Jan 8, 2025

I have a copy of the database on signbank-susan. (As of Monday. No videos, but you can browse the video objects / filenames / paths in the admin.)

I am working on filters to enhance the admin in order to detect and fix video problems. #1398

I added filters for NME and Perspective, plus a filter for "wrong filename".

Now it's possible to query on those wrong filenames.

For NME videos with the wrong filename, there are 1366 results!!!!!

For Perspective videos with the wrong filename, there are 4688 results!!

@Woseseltops @vanlummelhuizen this is a huge problem.

Since multiple objects are referring to the same video file (see the examples above), it is not possible to simply erase or delete anything.

I can make an "unlink file from object" command in order to uncouple the link. Then it would be possible to delete NME and Perspective objects that are pointing to the normal video file. (The files need to be either deleted or unlinked in order to delete the objects. But this runs the risk of deleting files that should not be deleted.) Moreover, when you do delete an object, this sets in motion lots of "signal" commands that move around backup video files.
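
The "unlink" idea can be sketched without Django at all. The code below is only an illustration under stated assumptions: `VideoObject`, its `kind` values, and both function names are made up, and the real GlossVideo model and its delete signals are not modelled. It clears the file reference on NME and Perspective objects that point at a normal video's file, and treats a file as deletable only when no object refers to it any more.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoObject:
    pk: int
    kind: str               # hypothetical: "primary", "nme" or "perspective"
    path: Optional[str]     # file the object points to, or None once unlinked


def unlink_wrongly_shared(objects: List[VideoObject]) -> List[int]:
    """Clear the path of non-primary objects that point at a primary video file.

    No file is touched; only the reference is removed. Returns the unlinked pks.
    """
    primary_paths = {o.path for o in objects if o.kind == "primary" and o.path}
    unlinked = []
    for obj in objects:
        if obj.kind != "primary" and obj.path in primary_paths:
            obj.path = None
            unlinked.append(obj.pk)
    return unlinked


def file_is_safe_to_delete(path: str, objects: List[VideoObject]) -> bool:
    """A file may only be deleted when no remaining object refers to it."""
    return not any(obj.path == path for obj in objects)


if __name__ == "__main__":
    shared = "glossvideo/NGT/ST/STRUISVOGEL-45874.mp4"
    objects = [
        VideoObject(1, "primary", shared),       # pks are illustrative
        VideoObject(2, "perspective", shared),
        VideoObject(3, "nme", shared),
    ]
    print("unlinked:", unlink_wrongly_shared(objects))
    print("safe to delete shared file:", file_is_safe_to_delete(shared, objects))
```

After unlinking, the NME and Perspective objects can be removed without the normal video file being at risk, provided the delete signals are not triggered for a path that is still in use.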
