Allow path-like objects in fnmatch.fnmatchcase
#123215
Comments
We talked about this in typeshed recently too: python/typeshed#12522 (comment) and later comments. I don't feel too strongly but it might be clearer to keep fnmatch limited to strings instead.
Unfortunately, they are not limited to strings at runtime since they are calling I'm marking the other as DO-NOT-MERGE until we decide whether to accept or not (what's important is that the module is consistent on different platforms).
Earlier comments too ;) python/typeshed#12522 (comment)
By reading the discussion, I feel we should explicitly reject path-like objects in @barneygale Let's discuss this matter here. Should we rather explicitly reject path-like objects (even on Windows platforms where In order to keep the current behaviour, possibly restricting it at runtime but making it in sync with typeshed, I think we should warn if a non-str/bytes filename or pattern is given (at least on platforms that implicitly call os.fspath due to os.path.normcase) and then raise in Python 3.16 (directly raise in 3.14?)
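To make the runtime behaviour under discussion concrete, here is a small sketch. It assumes a CPython version in which `fnmatch.fnmatch` normalizes both arguments with `os.path.normcase` and `fnmatch.fnmatchcase` performs no conversion at all; the `try_match` helper is hypothetical and exists only for this illustration.

```python
import fnmatch
import os
from pathlib import PurePosixPath

name = PurePosixPath("data.txt")

# os.path.normcase() calls os.fspath() on its argument, so a path-like
# object comes back as a plain str.
normalized = os.path.normcase(name)
print(type(normalized).__name__)

# fnmatch.fnmatch() normalizes both arguments first, so the path-like
# name is converted implicitly before matching.
print(fnmatch.fnmatch(name, "*.txt"))


def try_match(func, name, pat):
    """Hypothetical helper: return the match result, or the exception type name."""
    try:
        return func(name, pat)
    except TypeError as exc:
        return type(exc).__name__


# fnmatchcase() skips the normalization step, so the compiled regex
# receives the path-like object itself and rejects it.
print(try_match(fnmatch.fnmatchcase, name, "*.txt"))
```

The point of the sketch is only the asymmetry: the conversion is a side effect of `os.path.normcase`, not something the module promises.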
That all makes sense to me! I suppose
It seems we reached a consensus. Unfortunately, I'm conflicted about eagerly checking the types in

    for name in names:
        if not isinstance(name, (str, bytes)):  # this may be costly
            raise_or_warn()
        do_the_rest()

So, I think we could also keep the status quo by not changing anything for now? I can make the docs a bit more precise and eager so that users really know not to pass path-like objects. How does it sound?
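For readers who want to see what the warn-first approach floated earlier might look like, here is a rough sketch. It is not what the standard library does: `filter_with_warning` is a hypothetical wrapper, the warning text is invented for the example, and unlike the real `fnmatch.filter` it returns the converted strings rather than the original objects.

```python
import fnmatch
import os
import warnings
from pathlib import PurePosixPath


def filter_with_warning(names, pat):
    """Hypothetical wrapper around fnmatch.filter(); not part of the stdlib."""
    checked = []
    for name in names:
        # This isinstance() call runs once per name, which is the
        # per-item cost the comment above is worried about.
        if not isinstance(name, (str, bytes)):
            warnings.warn(
                "passing path-like objects to filter() is discouraged",
                DeprecationWarning,
                stacklevel=2,
            )
            # Keep the permissive behaviour for now by converting explicitly.
            name = os.fspath(name)
        checked.append(name)
    return fnmatch.filter(checked, pat)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = filter_with_warning(
        ["a.txt", PurePosixPath("b.txt"), "c.py"], "*.txt"
    )

print(result)
print(len(caught))
```

A deprecation path like this would still need the extra check on every element, which is why keeping the status quo is being considered instead.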
I think adding a strict type check would also require a deprecation period. (Although passing PathLike objects is bug-prone, that doesn't mean that people aren't depending on it, and the people depending on it might not be suffering from any of the possible bugs.) So I would also vote for leaving things as they are. I don't think a deprecation period would be worth it, and the standard library generally does not do strict pre-emptive type checks along these lines. I'd be happy with a docs clarification, but I also don't think it necessarily deserves that much space in the docs. I think the language in the docs where it talks about "file names" is already reasonably clear, in my opinion :-)
I'll just pass through the docs to see whether they require better clarification or not. But let's keep the status quo for now. At least, with this issue, we will have a trace of why we did not consider (yet) path-like objects.
Before opening a PR, when we say "filename string" do we mean a string in the sense of a
The overloaded signatures are:

    def fnmatch(name: str, pat: str) -> bool: ...
    def fnmatch(name: bytes, pat: bytes) -> bool: ...

but where

    @functools.lru_cache(maxsize=32768, typed=True)
    def _compile_pattern(pat):
        if isinstance(pat, bytes):
            pat_str = str(pat, 'ISO-8859-1')
            res_str = translate(pat_str)
            res = bytes(res_str, 'ISO-8859-1')
        else:
            res = translate(pat)
        return re.compile(res).match

I'm actually wondering why the pattern must be encoded in latin-1 and why we can't assume it's UTF-8 encoded (AFAIK, If this is the case, I'll file (yet again) another issue. So, here is what I suggest:
Plan 1 is the safest one but we could have people starting to wonder why we did not allow path-like objects (as we saw on typeshed). So I think plan 2 would be better (the fact that bytes patterns are accepted should be mentioned I think, this is not just an implementation detail IMO). Plan 3 would change the runtime with fewer undesirable effects than just accepting plain path-like objects but this would still be system-dependent since it relies on So my personal recommendation is plan 2. What do you think?
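On the latin-1 question raised with the `_compile_pattern` snippet, one property that may be relevant is that ISO-8859-1 round-trips every byte value, whereas UTF-8 cannot decode arbitrary bytes. The sketch below only demonstrates that property; it is an observation, not a statement of why the encoding was originally chosen.

```python
import fnmatch

all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte value 0..255 to the code point with the same
# number, so decoding and re-encoding is lossless for arbitrary bytes.
round_trip = all_bytes.decode("ISO-8859-1").encode("ISO-8859-1")
print(round_trip == all_bytes)

# UTF-8 cannot decode arbitrary byte strings: a lone 0xE9 byte followed
# by ASCII is not a valid sequence.
try:
    b"caf\xe9.txt".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)

# A bytes name that is not valid UTF-8 can still be matched against a
# bytes pattern.
print(fnmatch.fnmatchcase(b"caf\xe9.txt", b"*.txt"))
```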
Plan 2 SGTM.
I have a PR and an issue ready for plan 2. @JelleZijlstra @barneygale I'm planning to go for plan 2; do you have any objection?
No strong opinion; in general I prefer the plan that makes the fewest runtime behavior changes, since such changes could break people's code.
Following the above discussion, I'm closing this specific issue as
I vote for plan 1. Reading the discussion on GH-47437, it's not clear that the latin-1 encoding was a confident choice or a holdover from a time before
Oh, I wasn't aware of that. I thought it was something that was decided. I can also live with plan 1. Since you're the pathlib expert, it's up to you! Plan 2 was essentially here to avoid surprises as on typeshed, but maybe with this issue, people would understand it? (technically, "filename string" should be sufficient to indicate a "str" so I'm also happy with plan 1).
My opinion isn't strong, so please do whatever you think is best @picnixz. I'm no
If you want to review #123345, I'd be happy. The idea is to avoid users having surprises and/or not knowing why something works on Windows but not on POSIX.
Feature or enhancement
Proposal:
We should call os.fspath in fnmatchcase so that all functions in fnmatch have path-like support:
See #123122 (comment) and #123122 (comment)
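As a minimal sketch of what this proposal amounts to, the wrapper below converts both arguments with `os.fspath` before delegating to the existing function. The name `fnmatchcase_pathlike` is hypothetical, and note that the discussion above settled on keeping the status quo rather than making this change.

```python
import fnmatch
import os
from pathlib import PurePosixPath


def fnmatchcase_pathlike(name, pat):
    """Hypothetical variant of fnmatchcase() that accepts path-like objects."""
    # os.fspath() returns str and bytes unchanged and calls __fspath__()
    # on path-like objects; anything else raises TypeError.
    return fnmatch.fnmatchcase(os.fspath(name), os.fspath(pat))


print(fnmatchcase_pathlike(PurePosixPath("data.txt"), "*.txt"))
# Matching stays case-sensitive, as with plain fnmatchcase().
print(fnmatchcase_pathlike("Data.TXT", "*.txt"))
```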
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs
fnmatch.fnmatchcase
#123216