-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates to read CSV method #1512
base: master
Are you sure you want to change the base?
Conversation
…ary when files argument is supplied with None
Thanks for the review @matt-graham . Applying changes now |
… doc string, update argument expected datatypes, set one test to initialise files argument with a string
…_method_in_utils' into mnjowe/updates_to_read_csv_files_method_in_utils
files = [f_name.stem for f_name in folder.glob("*.csv")] | ||
return_dict = True | ||
elif isinstance(files, str): | ||
files = [files] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment this else
clause will be run whenever files
is not a instance list
or str
and is not None
. Looking at the type hint / default value I guess this is meant to cover the case when files
is an instance of an int
? If so it would be better to make this elif isinstance(files, int):
and add a separate else:
clause that raises an error saying files
is not of an expected type if none of the previous clauses have run. At the moment for example if a user passes a non-list iterable data structure as the files
argument - for example a tuple or generator (as returned for example by Path.glob
and Path.rglob
) then the value passed will be silently overwritten which is probably not what they would expect. Failing loudly if files
is of an unexpected type (and/or allowing files
to be of more general iterable types by for example checking if it is an instance of Iterable
from collections.abc
instead of list
) would be better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also avoid the duplication of the list comprehension [f_name.stem for f_name in folder.glob("*.csv")]
by having something like
elif isinstance(files, int) or files is None:
files = [f_name.stem for f_name in folder.glob("*.csv")]
return_dict = files is None
else:
raise TypeError(f"Value passed for files argument {files} is not one of expected types.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thought of combining those two conditions crossed my mind but I was lacking the smart way of setting return_dict
to either True or False. thanks for this @matt-graham
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe using isinstance(files, collections.abc.Iterable)
is not the best approach here, as it also considers strings to be iterable. If a string is included, it could alter the expected behavior of the return type when modelers specify a file name by passing a string. This would cause the method to return a dictionary instead of a DataFrame, which is likely not the intended outcome.
This PR aims to address issue #1509