-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework move logic #189
base: master
Are you sure you want to change the base?
Rework move logic #189
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR! Please have a look at the comments.
afew/MailMover.py
Outdated
# Open DB R/W if are not in dry-run | ||
if not self.dry_run: | ||
self.db = notmuch.Database(self.db_path, | ||
mode=notmuch.Database.MODE.READ_WRITE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we set database mode inside __init__
, as we already know there if we want run in dry-run or not?
This would avoid opening the database a second time here…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will move this to the __init__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So about this point, I have tested several version, but if I open the database READ_WRITE
,
in the __init__
the call hang after 2 requests, and get stuck, I have to kill manually.
$ afew --move-mails -vvvvvvvv --all
INFO:root:checking mails in 'Perso/INBOX'
DEBUG:root:query: folder:Perso/INBOX AND tag:spam AND 576540000..1522686227
DEBUG:root:query: folder:Perso/INBOX AND tag:sent AND 576540000..1522686227
So maybe it comes with my own version of notmuch / Xappian but unless I find a way to make it work, I think it is better to let the DB READ_ONLY all the time and just switch to READ_WRITE when needed (like I finally did), but it would nicer to have it setup only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hanging while trying to open the database r/w a second time looks like the previous connection wasn't closed (since notmuch 0.23):
When opening a database notmuch by default will wait for another process to release a write lock, rather than returning an error.
afew/MailMover.py
Outdated
except shutil.SameFileError: | ||
logging.warn("trying to move '{}' onto itself".format(fname)) | ||
continue | ||
logging.info("Copy original mail: "+fname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use .format
instead of concatenating strings: https://pyformat.info/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I will upgrade those
afew/MailMover.py
Outdated
logging.info("Copy to: "+destination) | ||
dbpath = self.db.get_path() | ||
logging.info("Path of DB: "+dbpath) | ||
relpath=os.path.relpath(os.path.dirname(destination), dbpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo - should be realpath
and os.path.realpath
.
Is realpath calculation even necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the beginning I was trying to use a relative path as explained in the doc, but after some time stuck with that I finally test and successfully made it work with full path.
So no this is not more necessary I will remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, there's a os.path.relpath
as well - didn't know that.
But if the code now works with both relative and absolute paths, we shouldn't do this here, right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I discovered relpath
aswell really useful to generate relative path.
I agree we can get rid of that now we only use absolute path.
afew/MailMover.py
Outdated
shutil.copy2(fname, self.get_new_name( | ||
fname, | ||
destination)) | ||
dest_file_rel = dbpath+'/'+relpath+'/'+filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use os.path.join
, or even better pathlib to join paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will upgrade that
afew/MailMover.py
Outdated
# 3 REMOVE ORIGINAL | ||
logging.info("Delete the original mail in DB") | ||
# 4 REMOVE ORIGINAL FROM DB | ||
notmuch.Database.remove_message(self.db, fname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I do favor some logging, this whole block seems to be wayy to extensive.
Most of the statements are merely printing out variable values. Maybe one or two lines with loglevel debug (something like {} -> {}
.format(oldpath, newpath)`) should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree but I will rework them to be more precise
# this is ugly, but shutil does not provide more | ||
# finely individuated errors | ||
if str(e).endswith("already exists"): | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with this line? Do we already check somewhere else if source and destination are the same, which is why we skip that check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed that because this will be logged by shutil
log and I think if you try to copy over the same thing it should be an error from shutil
and/or your FS but I will add a test with a similar message for that case.
@@ -59,7 +62,7 @@ def move(self, maildir, rules): | |||
messages = notmuch.Query(self.db, main_query).search_messages() | |||
for message in messages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add below per-message code into a self-contained staticmethod? This would greatly improve readability and testability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure to fully understand, but if you talk about export the whole per-message process I agree, I think the whole class need some refactor, for example the move() method is also searching the messages which is problematic to add tests and keep simple.
So I have project to refactor the class to be more simple, and support some unit tests, but as the mail mover is a big feature I think it would be safer to do that after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, code in the loop body should go inside it's own staticmethod, for better testability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I agree the thing is I feel it would be simpler to use the function move()
to do the actual moving operation, and create an other function like getMessages()
to do the notmuch query and pass the message list to the move(), that way it could be easy tested by using mock data.
cf37c01
to
ac69283
Compare
Hey @flokli I have upgraded the branch I have implemented all your feedback maybe beside the open connection ot the database in the |
ac69283
to
d3d62af
Compare
afew/MailMover.py
Outdated
str( | ||
uuid.uuid1()) | ||
+ ':' | ||
+ os.path.basename(fname).split(':')[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use PyFormat here as well?
# construct a new filename, composed of a made-up ID
# and the flags part of the original filename
filename = '{}:{}'.format(uuid.uuid1(),
os.path.basename(fname).split(':')[-1])
return os.path.join(destination, filename)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure, I just upgraded it.
Already looks much cleaner, thanks! With that being done, I can have another look in why we need to open the database again for R/W. |
@flokli I am squashing the non moveMail() related commit. Not really sure about the "you can use %-style format strings, and pass variables to the logging.* method directly, " but I have cleanup a bit of messages in the log. |
d3d62af
to
e05a918
Compare
e05a918
to
33b5568
Compare
def moveMail(self, mailFrom, mailTo, upgradeDatabase): | ||
output = False | ||
""" We need to check that we are not moving a file to itself """ | ||
if mailFrom == mailTo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mailFrom
== mailTo
should not be an error. But mailTo
is a dir, while mailFrom
is a file. So: can this happen? If not, it should not be considered, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sure I will rebase the branch and change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
after rebasing the branch, and re-reading I aggree we need to be sure that:
- mailFrom is a file AND mailTo is a directory
If this is not I will raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see those:
try:
" mailFrom must be a file "
if not os.path.isfile(mailFrom):
logging.error("mailFrom not a file: {}".format(mailFrom))
raise SystemExit
" mailTo must be a directory "
if not os.path.isdir(mailTo):
logging.error("mailTo not a dir: {}".format(mailTo))
raise SystemExit
So those tests already exist (sorry took me a second reading to be sure),
so duplicating those is not a good idea, and moving them end in the same state.
So tell me @rpuntaie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the commits I referred to have gone. There is no mailFrom
anywhere anymore. So this seems to be obsolete.
As commented further below: tests are absolutely needed, to avoid breaking established behavior. E.g. I've seen {}/{}/cur/
is now {}/{}
, which is an interface change: The user would have to change its config.
def move(self, maildir, rules): | ||
''' | ||
Move mails in folder maildir according to the given rules. | ||
''' | ||
# identify and move messages | ||
logging.info("checking mails in '{}'".format(maildir)) | ||
to_delete_fnames = [] | ||
moved = False | ||
for query in rules.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you assure that the mailbox dirs do exist. I would like to only change the config, without having to additionally create a new directory manually:
for ncs in 'tmp new cur'.split():
destination = os.path.join(self.db_path,rules[query],ncs)
os.makedirs(destination, mode=0o700, exist_ok=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that point I am not sure that behavior should be default,
to my pov, the mailmover should move from existing mailbox to an other (existing).
I am not sure, this case may append often, but if there is a real use case behind this,
we could easily add this check, feel free to add more details @rpuntaie maybe I am missing something.
Fix multi import Cleanup flake8 error Refactor MailMover.move() logic Fix import order Cleanup class linter error
33b5568
to
25606aa
Compare
Ok I have rebased the branch on the actual master, Send reviews ! |
We really need some sort of test suite in place ensuring behaviour doesn't break before merging this in, sorry!
|
Hey @flokli , I will work on that and add a test around the MailMover, this issue is real and has to fixed. |
Following the #188 issue, we needed to change move() function,
to be able to move, and upgrade the database a bit more lightly: