Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find common alignemnt with HTTP.download and Base.download #458

Open
oxinabox opened this issue Oct 14, 2019 · 4 comments
Open

Find common alignemnt with HTTP.download and Base.download #458

oxinabox opened this issue Oct 14, 2019 · 4 comments

Comments

@oxinabox
Copy link
Member

Right now
HTTP.download(url, local_path)
acts differently depending on the filesystem state with regard to local_path.

  • if it is is not provided, then it is saved in a temporary directory, with correct filename
  • if path to a directory is provided then it is saved into that directory, with correct filename
  • if path to a existing file is provided then it overwrite it.
  • otherwise the local path is uses as the full filepath (including filename) to save to.

Contrast this to Base.download(url, [local_path])
if local_path

  • if it is not provided, then it is saved in a temporary directory, with arbitary (nonclashing) filename.
  • if path to a existing directory is provided then it errors.
  • if path to a existing file is provided then it overwrite it.
  • otherwise the local path is uses as the full filepath (including filename) to save to.

So there are two differences from Base

  1. HTTP.download gets the filename correct based on the protocol.
  2. If path to existing directory passed HTTP downloads into it, where as Base.download just errors.

Point 1 is definately a good thing because filename has important metadata.

But point 2, maybe not:
I had though this was a nifty and intuitive feature.
But in JuliaLang/julia#33088
@StefanKarpinski made me rethink that.

@StefanKarpinski
Copy link
Contributor

It does seem like it would be good to bring these two APIs into agreement. Since the only contract of Base.download is that without an explicit local path, it will pick a temporary file name and use that, we can change that to preserve the download file name. I don't feel quite as strongly about the directory thing as I did when I wrote that, but it does give me pause. As long as it consistently returns the full path of the download I guess that's the most important thing.

@oxinabox oxinabox changed the title Make HTTP.download differing from Base with passing in existing directory. Find common alignemnt with HTTP.download and Base.download Oct 14, 2019
@oxinabox
Copy link
Member Author

Since the only contract of Base.download is that without an explicit local path, it will pick a temporary file name and use that, we can change that to preserve the download file name.

Right now there is the potentially problematic case where one gives HTTP.download
a directory,
that contains a file say "foo.txt.gz" which (by chance or otherwise)
is the name that the HTTP protocol says that the downloaded file should have.

I think the correct behavour is probably to do something like suffix the filename before the extension with _ + a number than increments until there is no colision.
So generally that would be foo_1.txt.gz.

This maintains the extension that FileIO needs.
Though a potential problem is that one would not be able to tell if this name avoidance was activated or not, so would not be able to fully recover the info.
Could potentially add configurable: On name colision options:
Error, Overwrite, Rename.

@quinnj
Copy link
Member

quinnj commented May 26, 2022

@oxinabox, is this issue still relevant? Anything actionable in HTTP.jl? It kind of seems like Downloads.jl has become it's own thing and HTTP.download has stayed its own thing, so 🤷 ?

@oxinabox
Copy link
Member Author

No update.
It does still remain true.

I still believe the conceptual mission of HTTP.download of "act like the user was interacting with a web browser's download command" is the correct one for Download.download to also have.
That choice decides a few options:

  1. Display lots of progress info
  2. If given directory save into it. If given any other path use it as the filename
  3. If given directory infer the filename based on Content-Disposition

I believe now that Base.Download using libcurl, point 3 and possibly 1 can be done (has 1 been done already?)
2 remains a decision though.

I guess we could move this issue to the Download std lib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants