-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metadata file fetching via the API #2741
Conversation
I think this is also needed for galaxyproject/bioblend#192 . |
@nsoranzo Yep, it is. His email to the mailing list is what made me look into this. |
Happy to take suggestions for renaming the endpoint, this is just a first stab. |
Well, since urls are forever (or something): It'd be best to just call it Asides (+/-0):
|
@carlfeberhard Good catch on valid chars -- I saw the same when I was working on this and meant to go back and refactor it. Will do. |
@@ -192,7 +191,7 @@ def download_from_genomespace_importer( username, token, json_parameter_file, ge | |||
# if using tmp file, move the file to the new file path dir to get scooped up later | |||
if using_temp_file: | |||
original_filename = filename | |||
filename = ''.join( c in VALID_CHARS and c or '-' for c in filename ) | |||
filename = ''.join( c in FILENAME_VALID_CHARS and c or '-' for c in filename ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This removes space as a valid character right? Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm. The valid chars set the genomespace tool uses is indeed different. Should this be the same set of chars, or not? And, if so, which chars?
(ping @blankenberg maybe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also adds ,^_
as valid characters (good catch, 🐦 👀 @jmchilton!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it's definitely a slightly different set of characters that this particular tool was using, yes. The question is, what's the set we actually want for exported user-downloaded files? Either way, we should pick a single set and go with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is:
- For files going into Galaxy - it does seem that Dan explicitly wanted to allow spaces here and history items can include names with spaces (and most of our flagship tools generate items with spaces) so I don't know why would exclude them here. I didn't make that decision, but it seems reasonable.
- For downloads coming from Galaxy - I'm guessing we exclude spaces because they make the files easier to work with on the command-line. I didn't make that decision, but it seems reasonable.
So I don't think we need to be consistent about whitespace handling across these two different use cases. Can you explain more why you think they should be consistent - and if so do you want spaces in downloads or do you want to exclude spaces when importing from genomespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a bigger problem than spaces, now that I think about it more. Right now we also rip out non-ascii characters like ä on egress that we don't on ingress, etc.
So, forgetting genomespace for a second, I can upload Müßiggänger.txt
, which gets entered into the history exactly like that.
But when I download it, it's Galaxy140-[M__igg_nger.txt].txt
, which is unfortunate and I think unreasonable.
That said, this was a random refactoring enhancement that got looped into this PR and I'm happy to rip out those particular Genomespace changes to move this forward if we'd all rather revisit it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here or in a new PR I'd be very happy to see any unicode alpha-numeric character added. I'm a little more +/- 0 on white listing shell relevant characters.
When I was investigating shell-safe characters for CWL stuff - I came across this library (https://pypi.python.org/pypi/regex) - which unicode-friendly extended character classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm just going to revert the genomespace changes for now since I really want to hear from @blankenberg on that and I don't want to hold this PR up any more.
Will follow up on extending the valid character set in a separate endeavor.
Cool beans @dannon - thanks! |
This will address #2560