Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a 'tags' endpoint to support runtime tagging #185

Closed
wants to merge 3 commits into from

Conversation

romain-intel
Copy link
Contributor

This needs #184 to function properly.

@romain-intel romain-intel requested a review from ferras May 28, 2021 19:58
Comment on lines 113 to 119
if o['operation'] == 'add':
# This is the only error we fail hard on; adding a tag that is
# in system tag
if o['tag'] in obj['system_tags']:
return DBResponse(response_code=405, body=json.dumps(
{"message": "tag %s is already a system tag and can't be added to %s"
% (o['tag'], o['id'])}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a necessary guard? the metaflow cli does not guard against adding system tags as user tags with --tag it seems. Should similar checks be done during creation as well then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an excellent point and a bug most likely. This patch was done to allow modification of tags after the fact and the idea is to make sure that system tags stay unique and not affected. They are kept separate but we were trying to make sure that system tags were not overwritten in a way although I will admit that this is maybe not very useful. Let me circle back on that.

Comment on lines 132 to 133
result = await table.update_row(filter_dict=obj_filter, update_dict={
'tags': "'%s'" % json.dumps(obj['tags'])})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth checking: Previously update_row was only used for heartbeat updating, but using it this way might not be reliable/secure, as tags contain user inputs. Preferred way for psycopg is to use the cur.execute(sql_template, values) which should take care of correctly escaping the values.

Might be worth refactoring the update_row function to use the safe execute syntax.
see https://www.psycopg.org/docs/usage.html#passing-parameters-to-sql-queries for usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will update the update_row function to use a template. That seems the better way.



class TagApi(object):
lock = asyncio.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied from other APIs but let me check if there was a purpose behind it.

services/metadata_service/api/tag.py Show resolved Hide resolved
elif o['operation'] == 'remove':
if o['tag'] in obj['tags']:
modified = True
obj['tags'].remove(o['tag'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that the Metaflow client and service do not guard against duplicate tags, so --tag test --tag test would add the test tag twice, due to tags being a simple array. remove will only get rid of one occurrence of a tag. Should this be the way the update works?

alternatively get rid of all occurences of a tag:

obj['tags'] = [tag for tag in obj['tags] if tag != o['tag']]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right again on the creation side of things; the update will guard against duplicates but not the add (I will double check on that side; I think we should just convert it to a set and set that and also check against system tags).

Comment on lines +83 to +84
for o in body:
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multiple updates are not done inside a transaction, so there is a slight peculiarity with the way things might work in case of errors. For example

  1. flow/run/step tag updates succeed
  2. for whatever reason task tag update fails
  3. API responds with an error, even though some records were updated, and some updates were never reached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a replica of how things were happening inside (in our internal version) and was a conscious choice but maybe something worth reconsidering. @ferras may have some more ideas on that.

@saikonen saikonen closed this Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants