Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to read CSV method #1512

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

mnjowe
Copy link
Collaborator

@mnjowe mnjowe commented Nov 12, 2024

This PR aims to address issue #1509

…ary when files argument is supplied with None
@mnjowe mnjowe self-assigned this Nov 12, 2024
@mnjowe mnjowe linked an issue Nov 12, 2024 that may be closed by this pull request
3 tasks
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
@mnjowe
Copy link
Collaborator Author

mnjowe commented Nov 18, 2024

Thanks for the review @matt-graham . Applying changes now

src/tlo/util.py Outdated Show resolved Hide resolved
files = [f_name.stem for f_name in folder.glob("*.csv")]
return_dict = True
elif isinstance(files, str):
files = [files]
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this else clause will be run whenever files is not a instance list or str and is not None. Looking at the type hint / default value I guess this is meant to cover the case when files is an instance of an int? If so it would be better to make this elif isinstance(files, int): and add a separate else: clause that raises an error saying files is not of an expected type if none of the previous clauses have run. At the moment for example if a user passes a non-list iterable data structure as the files argument - for example a tuple or generator (as returned for example by Path.glob and Path.rglob) then the value passed will be silently overwritten which is probably not what they would expect. Failing loudly if files is of an unexpected type (and/or allowing files to be of more general iterable types by for example checking if it is an instance of Iterable from collections.abc instead of list) would be better here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also avoid the duplication of the list comprehension [f_name.stem for f_name in folder.glob("*.csv")] by having something like

elif isinstance(files, int) or files is None:
    files = [f_name.stem for f_name in folder.glob("*.csv")]
    return_dict = files is None
else:
    raise TypeError(f"Value passed for files argument {files} is not one of expected types.")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought of combining those two conditions crossed my mind but I was lacking the smart way of setting return_dict to either True or False. thanks for this @matt-graham

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe using isinstance(files, collections.abc.Iterable) is not the best approach here, as it also considers strings to be iterable. If a string is included, it could alter the expected behavior of the return type when modelers specify a file name by passing a string. This would cause the method to return a dictionary instead of a DataFrame, which is likely not the intended outcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Update read cvs files method in util
3 participants