Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test backward compatibility #413

Closed
wants to merge 5 commits into from
Closed

Conversation

hagenw
Copy link
Member

@hagenw hagenw commented May 16, 2024

...

@hagenw hagenw marked this pull request as draft May 16, 2024 15:01
@hagenw
Copy link
Member Author

hagenw commented May 16, 2024

@ChristianGeng this is funny.
I try here to test for all the errors we have seen when loading older dependency tables from cache (ModuleNotFoundError under Windows, KeyError under Python 3.8). I created PKL files from the emodb dependency table with different Python and different audb versions and stored them under tests/assests/dependency-table/. But funnily, I'm not able to raise any of the observed errors.

@ChristianGeng
Copy link
Member

@ChristianGeng this is funny. I try here to test for all the errors we have seen when loading older dependency tables from cache (ModuleNotFoundError under Windows, KeyError under Python 3.8). I created PKL files from the emodb dependency table with different Python and different audb versions and stored them under tests/assests/dependency-table/. But funnily, I'm not able to raise any of the observed errors.

I tried to investigate the error that you are getting on windows via google.

I find this thread interesting (despite both come trhough duckdb):

duckdb/duckdb#8735
chroma-core/chroma#1069

It basically says that

  • something with importing the dtype module is going wrong (that can possibly be avoided via code)
  • pandas incompatibilities (2.1.0 "breaks something"). Accordingly a pandas downgrade is sometimes recommended.
  • The weirdest thing is that one guy reports running the same code twice, claiming it running the second time.

None of these however relate this to the windows platform.
Still, I could imagine that this is simply some weird incompatibility thing that in the windows case that we have strikes us. I also cannot find any code locations where iaudb would import dtype directly. Also when type hinting I cannot find suspicious code locations.

So I am also very startled what is going on here.

@hagenw
Copy link
Member Author

hagenw commented May 17, 2024

Regarding the code location, I'm very confident that this is the affected code:

audb/audb/core/api.py

Lines 271 to 279 in 44df511

try:
deps = Dependencies()
deps.load(cached_deps_file)
except (AttributeError, EOFError, FileNotFoundError, KeyError, ValueError):
# If loading cached file fails, load again from backend
backend_interface = utils.lookup_backend(name, version)
deps = download_dependencies(backend_interface, name, version, verbose)
# Store as pickle in cache
deps.save(cached_deps_file)

So if we simply would catch Exception we would avoid the error.
Maybe we can change the code to not load the dependency table if a user Interrupts:

        try:
            deps = Dependencies()
            deps.load(cached_deps_file)
        except KeyboardInterrupt:
            raise
        except Exception:
            # If loading cached file fails, load again from backend.
            # As loading of not compatible pickle files
            # results in a variety of possible errors,
            # we except all besides keyboard interruption
            backend_interface = utils.lookup_backend(name, version)
            deps = download_dependencies(backend_interface, name, version, verbose)
            # Store as pickle in cache
            deps.save(cached_deps_file)

@ChristianGeng
Copy link
Member

So if we simply would catch Exception we would avoid the error. Maybe we can change the code to not load the dependency table if a user Interrupts:

Probably then it will be hard to avoid a "catch-all". It is always a bit worrying ...

@hagenw
Copy link
Member Author

hagenw commented May 17, 2024

At least if there is really another depedency file related error, it should be raised by the other code lines as well.

I think, I'm more worried that audb fails loading something, and the user has no clue what to do.

@ChristianGeng
Copy link
Member

At least if there is really another depedency file related error, it should be raised by the other code lines as well.

I think, I'm more worried that audb fails loading something, and the user has no clue what to do.

Not sure whether I get it.
So you are talking about the situtation when the exception is raised? But then the other code lines themselves would raise as you mention, saying tht this should be raised by the other code lines as well.

Or is your worry about many problems after a release?

@hagenw
Copy link
Member Author

hagenw commented May 17, 2024

But then the other code lines themselves would raise as you mention

No, they would not if the exception is raised due to a broken/non-compatible cache file, as the cache file is overwritten.
The error would only be raised then, if it is related to the content of the dependency table, but not due to a broken/non-compatible cache.

@hagenw
Copy link
Member Author

hagenw commented May 17, 2024

As stated in audeering/audinterface#172 (comment), the actual problem regarding the failed Windows test might not be a backward compatibility issue, but an across platform compatibility issue when using the newest version of audb on all platforms.
We should try to extend the test here with a cross-platform test using the newest version of audb to create the cache file.

@audeerington
Copy link

I ran into both the ModuleNotFoundError and KeyError when trying to load a database that already had an existing version stored, and then found this PR.

I think it isn't a cross-platform issue, but a compatibility issue with different pandas versions.
I tried creating different versions of the db.pkl from the db.csv similar to what was added in d5402d6 , using always audb==1.7.2 but alternating the pandas version. Then I loaded the db.pkl with pd.read_pickle().

This replicated the error:

  • db.pkl created with pandas==2.0.3. Loading with pandas==2.2.2 leads to a ModuleNotFoundError
  • db.pkl created with pandas==2.2.1 or pandas==2.2.2. Loading with pandas==2.0.3 leads to a KeyError

Maybe the problem just manifested for Windows because it happened to install a different pandas version between tests, but for the other platforms the same pandas version was used.

@hagenw
Copy link
Member Author

hagenw commented May 27, 2024

Thanks for reporting this.

Would be nice if it is not a cross-platform issue. As then we can solve it by reloading the database and storing again when we experience an error.

I will try to update the tests here to replicate your findings.

@hagenw
Copy link
Member Author

hagenw commented May 28, 2024

Closing in favor of #417 and #418

@hagenw hagenw closed this May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants