Context
An issue related to this one has already been submitted #1646
I've taken the liberty of opening a new issue more specifically linked to the format inference problem, for the sake of clarity. If you'd prefer me to continue with the above issue instead and close this one, let me know and I'll take care of it.
Issue
The file format is guessed from the file extension (if I am not mistaken, in this function). While this seems, as far as I understand file storage, a good strategy for guessing the format of a local file, it falls short for many remote (at least over http(s)) csv resources.
Indeed, many APIs serve csv files from URLs that have no explicit extension.
Issue reproduction
With frictionless v4.40.11:
(The problem remains in v5, tested with v5.16.1, but I could not work out how to reproduce this output there.)
Workaround
As mentioned in this comment, the workaround is to explicitly provide the format.
Proposal
The format of an http(s) response is usually declared in the response's Content-Type header.
It would seem appropriate to use this information to infer the file format.
e.g. looking at the headers returned for the above url (curl -v https://data.capatlantique.fr/api/explore/v2.1/catalog/datasets/244400610_subventions_liste/exports/csv) indeed shows:
content-type: text/csv; charset=utf-8
Some additional improvements could be made using the response headers: the encoding is also mentioned there, and we can find e.g. a more relevant filename in the Content-Disposition header.
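To illustrate the proposal, here is a minimal sketch of how the format, encoding and filename could be read from response headers using only the Python standard library. This is not frictionless code: the function name `infer_from_headers` and the `MEDIA_TYPE_TO_FORMAT` table are hypothetical, and the filename in the usage example is made up for illustration.

```python
from email.message import Message

# Hypothetical lookup table (an assumption, not frictionless's own mapping).
MEDIA_TYPE_TO_FORMAT = {
    "text/csv": "csv",
    "application/json": "json",
    "application/vnd.ms-excel": "xls",
}


def infer_from_headers(headers):
    """Guess (format, encoding, filename) from a dict of HTTP response headers.

    Any value that cannot be determined is returned as None.
    """
    # email.message.Message parses "type/subtype; param=value" style headers
    # and treats header names case-insensitively.
    msg = Message()
    for name, value in headers.items():
        msg[name] = value

    file_format = MEDIA_TYPE_TO_FORMAT.get(msg.get_content_type())
    encoding = msg.get_content_charset()  # from "charset=..." in Content-Type
    filename = msg.get_filename()         # from "filename=..." in Content-Disposition
    return file_format, encoding, filename


if __name__ == "__main__":
    example = {
        "content-type": "text/csv; charset=utf-8",
        # Made-up filename, for illustration only.
        "content-disposition": 'attachment; filename="example.csv"',
    }
    print(infer_from_headers(example))
```

When the resource is fetched with `urllib.request.urlopen`, the response's `headers` attribute is already a `Message` subclass, so the same three methods can be called on it directly. A real implementation would presumably fall back to the file extension when the Content-Type is missing or too generic.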