Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the number of identical get_two_letter_dir functions to one #1390

Closed
vanlummelhuizen opened this issue Nov 13, 2024 · 15 comments
Closed

Comments

@vanlummelhuizen
Copy link
Collaborator

I noticed there are six identical get_two_letter_dir functions, see details below. This is a big violation of the Don't Repeat Yourself (DRY) principle in software development. We should get rid of five of them and avoid copying+pasting code or even rewriting the same thing in the future.

❯ git grep -A 6 "def get_two_letter"
signbank/dataset_checks.py:def get_two_letter_dir(idgloss):
signbank/dataset_checks.py-    foldername = idgloss[:2]
signbank/dataset_checks.py-
signbank/dataset_checks.py-    if len(foldername) == 1:
signbank/dataset_checks.py-        foldername += '-'
signbank/dataset_checks.py-
signbank/dataset_checks.py-    return foldername
--
signbank/dictionary/management/commands/rename_backup_gloss_videos.py:def get_two_letter_dir(idgloss):
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    foldername = idgloss[:2]
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    if len(foldername) == 1:
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-        foldername += '-'
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-
signbank/dictionary/management/commands/rename_backup_gloss_videos.py-    return foldername
--
signbank/dictionary/management/commands/rename_non_mp4_extensions.py:def get_two_letter_dir(idgloss):
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    foldername = idgloss[:2]
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    if len(foldername) == 1:
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-        foldername += '-'
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-
signbank/dictionary/management/commands/rename_non_mp4_extensions.py-    return foldername
--
signbank/tools.py:def get_two_letter_dir(idgloss):
signbank/tools.py-    foldername = idgloss[:2]
signbank/tools.py-
signbank/tools.py-    if len(foldername) == 1:
signbank/tools.py-        foldername += '-'
signbank/tools.py-
signbank/tools.py-    return foldername
--
signbank/video/models.py:def get_two_letter_dir(idgloss):
signbank/video/models.py-    foldername = idgloss[:2]
signbank/video/models.py-
signbank/video/models.py-    if len(foldername) == 1:
signbank/video/models.py-        foldername += '-'
signbank/video/models.py-
signbank/video/models.py-    return foldername
--
signbank/zip_interface.py:def get_two_letter_dir(idgloss):
signbank/zip_interface.py-    foldername = idgloss[:2]
signbank/zip_interface.py-
signbank/zip_interface.py-    if len(foldername) == 1:
signbank/zip_interface.py-        foldername += '-'
signbank/zip_interface.py-
signbank/zip_interface.py-    return foldername
@susanodd
Copy link
Collaborator

susanodd commented Nov 14, 2024

This had something to do with it being needed inside "models" where it can't import from the other files.
As well as circular references because it was already inside "video/models" as well as inside of "tools".
I added it explicitly because the imports were going wrong. One place it's just coded in-line.

The three "commands" are going to be thrown away, basically, once it's all converted. That's why I put them there, to not import other code.

It's also violating other principles because the entire file system is visible inside the models.

@vanlummelhuizen
Copy link
Collaborator Author

This had something to do with it being needed inside "models" where it can't import from the other files.
As well as circular references because it was already inside "video/models" as well as inside of "tools".
I added it explicitly because the imports were going wrong. One place it's just coded in-line.

To circumvent circular imports, you could do the import within a function, class or method. See
get_two_letter_dir.patch.txt

Can you see whether this works on your side? git apply <patch-file>

@susanodd
Copy link
Collaborator

susanodd commented Nov 14, 2024

Can we make a totally new "file system related functions" file to put it in? The problem is that it also needs to use "idgloss" but that function is full of try-except clauses. So it could be failing internally yet we don't know. The glosses where video's are being uploaded are supposed to have lemma's, dataset's, lemma translations. So this should not be failing internally. I made a separate issue a few weeks ago about the idgloss property.

Because there were unexplained failings in the API I was trying to localize the code to be able to debug it.

If the file system related methods can be put in a file where other functions are not imported from that would help.
The problem with "tools" is that it's become full of numerous other functions.

The "database_checks" file was made to be independent because it takes a really long time to run. I assumed it would reduce the runtime if it didn't import the function. Although it makes use of idgloss. So that doesn't really solve anything.

I think the idgloss property should fail if the gloss is missing a dataset or lemma or lemma translation. (But that maks for awful code elsewhere.)

The entire video upload system is using this function constantly.
Plus the API for a non-zipped file is now making use of the backup system too. (So there may be new files with bak bak at the end being created if that is not solved.)

I merged the renaming of the backup files because it's in use and it's kind of high priority.

I agree completely about the 2-char function. But it's kind of in combination with idgloss. (And then that gets worse with all the right to left and pictogram characters.)

In some of the code (not reviewed) I started making the "desired relative path". In order to compare to see if something was messed up
There is a lot of inconsistency in the videofile access. Some uses of path, versus, name, versus url.
But those are not necessary available. Plus the original "upload_to" used to just be "glossvideo" without any dataset.
I don't know whether those can be "frozen" inside of old objects. (?)

susanodd pushed a commit that referenced this issue Nov 20, 2024
@susanodd
Copy link
Collaborator

I removed the three commands that had included the duplicated code.

@vanlummelhuizen
Copy link
Collaborator Author

Are they not necessary for work on the live server?

@susanodd

This comment was marked as outdated.

susanodd added a commit that referenced this issue Dec 3, 2024
Function to retrieve video content type of video file. Added here to avoid duplicating it in multiple files on multiple branches
@susanodd
Copy link
Collaborator

susanodd commented Dec 10, 2024

@vanlummelhuizen I just tried this in the zip_interface.py file. PyCharm doe not like this:

django.template.library.InvalidTemplateLibrary: Invalid template library specified. ImportError raised when trying to load 'signbank.dictionary.templatetags.annotation_idgloss_translation': cannot import name 'get_two_letter_dir' from partially initialized module 'signbank.tools' (most likely due to a circular import) (/Users/susaneven/Documents/GitHub/Global-signbank/signbank/tools.py)

The get_two_letter_dir is the ONLY function being imported from tools.

Nothing is importing signbank.dictionary.templatetags.annotation_idgloss_translation here.

The circular references are not just this 'get_two_letter_dir' function.
I added it to the zip interface because of circular references in OTHER code....

susanodd added a commit that referenced this issue Dec 10, 2024
PyCharm was flagging this as an error
@susanodd
Copy link
Collaborator

susanodd commented Dec 10, 2024

Are they not necessary for work on the live server?

I have been creating ADMIN commands to fix filenames. (See other pull requests.)

This has the advantage that it is very targeted.

@vanlummelhuizen
Copy link
Collaborator Author

(See other pull requests.)

Which pull requests exactly? Please always be specific so we don't have to search and/or guess what you mean.

@susanodd
Copy link
Collaborator

susanodd commented Dec 10, 2024

(See other pull requests.)

Which pull requests exactly? Please always be specific so we don't have to search and/or guess what you mean.

#1398 and #1412

The 'dataset manager' available page was added later. The admin is kind of obtuse sometimes.

I want to make the functionality of the two ("admin" and "manager") more similar.

The "manage" page you see the result of the command immediately in the list view. (On signbank-test. It also changes the filenames in the objects when there is no video file. On the real server, it would change the filename as well. It can only inspect whether the type of video matches the extension if there is a real video though, of course.)

(More buttons can be added, or a search functionality. This started out as a convenience since things were going wrong with the API video upload.)

@vanlummelhuizen
Copy link
Collaborator Author

Circular import of get_two_letter_dir in tools.py:

  File "/home/micha/docs/ru/signbank/global-signbank/repo-global-signbank/signbank/dictionary/templatetags/annotation_idgloss_translation.py", line 3, in <module>
    from signbank.tools import get_default_annotationidglosstranslation
  File "/home/micha/docs/ru/signbank/global-signbank/repo-global-signbank/signbank/tools.py", line 37, in <module>
    from signbank.api_interface import api_fields
  File "/home/micha/docs/ru/signbank/global-signbank/repo-global-signbank/signbank/api_interface.py", line 36, in <module>
    from signbank.zip_interface import *
  File "/home/micha/docs/ru/signbank/global-signbank/repo-global-signbank/signbank/zip_interface.py", line 26, in <module>
    from signbank.tools import get_two_letter_dir
ImportError: cannot import name 'get_two_letter_dir' from partially initialized module 'signbank.tools' (most likely due to a circular import) (/home/micha/docs/ru/signbank/global-signbank/repo-global-signbank/signbank/tools.py)

Solution (probably more like a work-around): import get_two_letter_dir in the function that uses it.

Patch for the solution
diff --git a/signbank/zip_interface.py b/signbank/zip_interface.py
index 4a739866..286e8ed2 100644
--- a/signbank/zip_interface.py
+++ b/signbank/zip_interface.py
@@ -213,16 +213,8 @@ def uploaded_zip_archives(dataset):
     return zip_archive
 
 
-def get_two_letter_dir(idgloss):
-    foldername = idgloss[:2]
-
-    if len(foldername) == 1:
-        foldername += '-'
-
-    return foldername
-
-
 def get_gloss_filepath(video_file_path, gloss):
+    from signbank.tools import get_two_letter_dir
 
     filename = os.path.basename(video_file_path)
     filename_without_extension, _ = os.path.splitext(filename)
@@ -268,6 +260,7 @@ def get_gloss_filepath(video_file_path, gloss):
 
 
 def get_gloss_filepath_glossid(video_file_path, gloss):
+    from signbank.tools import get_two_letter_dir
 
     filename = os.path.basename(video_file_path)
     filename_without_extension, _ = os.path.splitext(filename)

@susanodd
Copy link
Collaborator

In the past we had our own imposed guidelines that ll the imports should be at the top. (Rule of either you or @Woseseltops)

susanodd added a commit that referenced this issue Dec 10, 2024
#1390: Removed syntax error from very old command
@susanodd
Copy link
Collaborator

@vanlummelhuizen there are some bugs in this file:

signbank.dictionary.templatetags.annotation_idgloss_translation

some of the functions don't work if a language is missing from a gloss.

(I'm trying to browse dataset LSFB and I end up getting errors from this file. I can't actually get this dataset to be set in the signs/search form. It keeps setting it to NGT. Even though LSFB is set in the signbank banner. Somehow it's getting set back to NGT)

(Because it's being set to NGT, then the language fields don't match the form fields.)

@susanodd
Copy link
Collaborator

This might have something to do with the type of selected_datasets. When it's in the templates or session variables, then it's acronyms. But some code assumes this is dataset objects.

@vanlummelhuizen
Copy link
Collaborator Author

This issue is finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants