Metadata file fetching via the API #2741

Merged: 5 commits, Aug 11, 2016

dannon
Member

@dannon dannon commented Aug 5, 2016

This will address #2560

@nsoranzo
Member

nsoranzo commented Aug 5, 2016

I think this is also needed for galaxyproject/bioblend#192 .

@dannon
Member Author

dannon commented Aug 5, 2016

@nsoranzo Yep, it is. His email to the mailing list is what made me look into this.

@dannon
Member Author

dannon commented Aug 5, 2016

Happy to take suggestions for renaming the endpoint, this is just a first stab.

@carlfeberhard
Contributor

Well, since urls are forever (or something): It'd be best to just call it metadata_files. The get is already there.

Asides (+/-0):

  • you probably want to use web._future_expose... so you can just let the exception bubble up. The decorator will convert the exception to JSON and you won't have to return a string.
  • we've also used that valid chars thing in quite a few places now. /shrugs Might be out-of-scope.
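
To illustrate the first aside: a decorator can catch a raised exception and serialize it, so the handler itself never builds an error string. The sketch below is only an illustration of that pattern. `MessageException`, `expose_api_sketch`, and `get_metadata_file` are hypothetical names, not Galaxy's actual decorator or exception classes.

```python
import functools
import json


class MessageException(Exception):
    """Hypothetical API error that carries an HTTP-style status code."""

    def __init__(self, message, status_code=400):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


def expose_api_sketch(func):
    """Hypothetical decorator: serialize results and raised errors to JSON."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return json.dumps(func(*args, **kwargs))
        except MessageException as exc:
            # The exception bubbles up to here; the handler returns no error string.
            return json.dumps({"err_msg": exc.message, "err_code": exc.status_code})
    return wrapper


@expose_api_sketch
def get_metadata_file(files, name):
    """Hypothetical handler: look up a metadata file by name or raise."""
    if name not in files:
        raise MessageException("No metadata file named %r" % name, status_code=404)
    return {"name": name, "path": files[name]}
```

With this shape, a missing file yields a JSON body with `err_msg` and `err_code` instead of a hand-built string.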

@dannon
Member Author

dannon commented Aug 5, 2016

@carlfeberhard Good catch on valid chars -- I saw the same when I was working on this and meant to go back and refactor it. Will do.

@@ -192,7 +191,7 @@ def download_from_genomespace_importer( username, token, json_parameter_file, ge
     # if using tmp file, move the file to the new file path dir to get scooped up later
     if using_temp_file:
         original_filename = filename
-        filename = ''.join( c in VALID_CHARS and c or '-' for c in filename )
+        filename = ''.join( c in FILENAME_VALID_CHARS and c or '-' for c in filename )
Member

This removes space as a valid character right? Is that intentional?
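
A minimal sketch of why the choice of set matters for spaces. Both character sets and the `sanitize` helper below are illustrative assumptions, not copied from the Galaxy source; the only point is that dropping the space from the whitelist changes the resulting filename.

```python
import string

# Illustrative sets only (assumptions, not the real constants from the diff).
OLD_VALID_CHARS = set(string.ascii_letters + string.digits + ".-()[] ")
NEW_VALID_CHARS = set(string.ascii_letters + string.digits + ".-()[]")


def sanitize(filename, valid_chars, replacement="-"):
    """Replace every character outside valid_chars with replacement."""
    # Same effect as the `c in S and c or '-'` idiom in the diff,
    # written as a conditional expression.
    return "".join(c if c in valid_chars else replacement for c in filename)


print(sanitize("my reads.fastq", OLD_VALID_CHARS))  # my reads.fastq
print(sanitize("my reads.fastq", NEW_VALID_CHARS))  # my-reads.fastq
```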

Member Author

@dannon dannon Aug 5, 2016

Hrmm. The valid chars set the genomespace tool uses is indeed different. Should this be the same set of chars, or not? And, if so, which chars?

(ping @blankenberg maybe)

Member

This also adds ,^_ as valid characters (good catch, 🐦 👀 @jmchilton!)

Member Author

@dannon dannon Aug 10, 2016

So, it's definitely a slightly different set of characters that this particular tool was using, yes. The question is, what's the set we actually want for exported user-downloaded files? Either way, we should pick a single set and go with it.

Member

My understanding is:

  • For files going into Galaxy - it does seem that Dan explicitly wanted to allow spaces here and history items can include names with spaces (and most of our flagship tools generate items with spaces) so I don't know why we would exclude them here. I didn't make that decision, but it seems reasonable.
  • For downloads coming from Galaxy - I'm guessing we exclude spaces because they make the files easier to work with on the command-line. I didn't make that decision, but it seems reasonable.

So I don't think we need to be consistent about whitespace handling across these two different use cases. Can you explain more why you think they should be consistent - and if so do you want spaces in downloads or do you want to exclude spaces when importing from genomespace?

Member Author

@dannon dannon Aug 10, 2016

I guess it's a bigger problem than spaces, now that I think about it more. Right now we also rip out non-ascii characters like ä on egress that we don't on ingress, etc.

So, forgetting genomespace for a second, I can upload Müßiggänger.txt, which gets entered into the history exactly like that.

But when I download it, it's Galaxy140-[M__igg_nger.txt].txt, which is unfortunate and I think unreasonable.
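
The mangled name above is what an ASCII-only whitelist with `_` as the substitute would produce. A minimal sketch, assuming an illustrative character set and a hypothetical helper `to_download_name` (neither is taken from the Galaxy source):

```python
import string

# Assumed ASCII-only whitelist, for illustration only.
ASCII_VALID_CHARS = set(string.ascii_letters + string.digits + ".,^_-()[]")


def to_download_name(name, replacement="_"):
    """Replace every character outside the ASCII whitelist."""
    return "".join(c if c in ASCII_VALID_CHARS else replacement for c in name)


print(to_download_name("Müßiggänger.txt"))  # M__igg_nger.txt
```

Each of ü, ß, and ä falls outside the set, so each becomes one underscore.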

That said, this was a random refactoring enhancement that got looped into this PR and I'm happy to rip out those particular Genomespace changes to move this forward if we'd all rather revisit it separately.

Member

Here or in a new PR, I'd be very happy to see any Unicode alphanumeric character added. I'm a little more +/-0 on whitelisting shell-relevant characters.

When I was investigating shell-safe characters for CWL stuff - I came across this library (https://pypi.python.org/pypi/regex) - which offers Unicode-friendly extended character classes.
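
One way to accept any Unicode alphanumeric character without a third-party dependency is Python 3's built-in `str.isalnum()`, which is Unicode-aware. The sketch below uses that instead of the `regex` library mentioned above; the helper name `sanitize_unicode` and the extra allowed punctuation are assumptions for illustration.

```python
def sanitize_unicode(name, extra_valid=".-_", replacement="_"):
    """Keep Unicode letters and digits plus a few punctuation characters."""
    return "".join(
        c if (c.isalnum() or c in extra_valid) else replacement
        for c in name
    )


print(sanitize_unicode("Müßiggänger.txt"))  # Müßiggänger.txt
print(sanitize_unicode("my file?.txt"))     # my_file_.txt
```

Shell-relevant characters such as spaces and `?` are still replaced here; whether to allow them is a separate decision.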

Member Author

Sounds good. I'm just going to revert the genomespace changes for now since I really want to hear from @blankenberg on that and I don't want to hold this PR up any more.

Will follow up on extending the valid character set in a separate endeavor.

@jmchilton jmchilton merged commit b40e993 into galaxyproject:dev Aug 11, 2016
@jmchilton
Member

Cool beans @dannon - thanks!
