-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize to_history_dataset_association in create_datasets_from_library_folder #18970
Merged
Merged
Changes from 3 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
ebc5dc9
Optimize to_history_dataset_association in __create_datasets_from_lib…
arash77 46cbd11
Using `stage_addition` and `add_pending_items` functions instead of `…
arash77 5e924cc
serialize after commit in create datasets from library folder
arash77 4bf029b
Rename flush in `to_history_dataset_association` to commit
arash77 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know your committing in there, so no need to do again on the next line.
Can you check where this method is used and see if not adding the dataset to a history and the flush=True ever make sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flush argument for the
add_datasets
function is always set toFalse
in the current context, which means it won't trigger a commit. Right?Also, I noticed that in other places, the datasets are committed after using this function, for example, in this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, I keep forgetting that. It's fine then. The second part still applies, can you check in which cases we do in fact want to commit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get what should look for exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing is different from committing:
session.commit()
will write to the database AND commit the current transaction, whereassession.flush()
will write to the database transaction buffer without committing. What that means is that if we flush after adding or modifying a dataset, and on a subsequent commit there's any kind of db error that causes a rollback, whatever we flushed will be rolled back. If we commit, then the rollback on a subsequent commit won't affect the changes we have committed previously.We have
flush
parameters in our code for historical reasons: pre-2.0, SQLAlchemy would commit on flush (with ourautocommit=True
setting), which is why the terms were used interchangeably. Now it's no longer the case; so my suggestion would be to use argument names consistent with what we do in the function (i.e., flush vs commit).And with that said, I am not at all sure we should commit in that method :) The code @arash77 referenced replaced the pre-2.0
session.flush
: we did that to ensure there was no behavioral change (a commit stayed a commit). Otherwise with hundreds ofsession.flush
lines no longer committing, we'd never know what broke when moving to 2.0. HOWEVER, now that we have upgraded, we can take a hard look at all those commits and consider if we really need them, and, where possible, replace commits with flushes (unless we're optimizing like here, in which case we still can consider whether it's a commit that we need and not a flush).Also, the code you referenced uses the
transaction
context manager to ensure that if there is no active transaction, we begin one before trying to commit. If there is a precedingsession.add
statement, there's guaranteed to be an active transaction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! Just to clarify, for this case, should I change
flush
tocommit
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you mean the name of the argument, if you intend to commit then yes, please. If you think flushing might be sufficient (so no
session.commit()
, then keep it as flush. In other words, as long as the argument is consistent with the action.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intend to commit here.
Thank you, then I will just rename the argument.