-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added progress callback when save_all is used #7435
base: main
Are you sure you want to change the base?
Conversation
Tests/test_file_apng.py
Outdated
|
||
assert progress == [ | ||
("Tests/images/apng/single_frame.png", 1, 7), | ||
("Tests/images/apng/single_frame.png", 2, 7), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line could use a comment that single_frame.png
gets appended to itself in the save()
call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
come to think of it, having explicit index there would make adding a comment somewhat redundant
You no longer think this is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You no longer think this is necessary?
The comment was meant to clarify that the image is repeated because it occurs twice in the input; providing explicit indices makes pointing that out unnecessary.
Tests/test_file_apng.py
Outdated
progress.append((filename, frame_number, n_frames)) | ||
|
||
Image.new("RGB", (1, 1)).save(out, "PNG", save_all=True, progress=callback) | ||
assert progress == [(None, 1, 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe indicate index here (instead of None
)? 1
for the 1st input image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be open to the idea of having an index all of the time? Or, we could pass back the image instance, and let the user figure out the index or filename as they see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having index always is probably simpler. As for passing the image, it wouldn't be much help with identifying the index if an image is reused in the input (like in that one test… come to think of it, having explicit index there would make adding a comment somewhat redundant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to switch to an index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Why did you remove the filename though? I thought you were talking about adding an index, not removing the filename (as my own suggestion was to provide index when the filename isn't available)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that
(index, filename, frame_number, frame_total)
crosses the line from 'neat and tidy' to 'passing back everything to the user just in case something is useful'.
You can use a namedtuple
instead (i.e. define Process = namedtuple('Process', "index filename frame n_frames")
somewhere global, and pass its value to the progress callback).
And I'd say that the name of the input file being currently processed is a quite appropriate piece of information to be included, as it's the most user-friendly way of showing, well, which file is being processed (whether you're logging your progress to a terminal or displaying it in GUI on top of the progressbar).
Why make the most likely usecase inconvenient? The name of the game shouldn't be "provide as little information as possible and force the calling code to figure out the rest of it every single time", it should aim at the most likely general purpose case (which I imagine more often than not would be logging/displaying something in the lines of Converting: input/image2.png [#2] -> output.png (frame 12 of 84)
; and the only thing here that would be instantly available for the calling code without complication in any situation would be the output file name, as it's passed in as a raw string and never changes throughout the entire process).
I would have thought it was simple, given that the user just provided these images as the base image and
append_images
?
Suppose the code that wants to log progress isn't concerned with input at all – it only has a generator that supplies these images; why would it need to implement workarounds like instantiating all values from that generator at once, or wrapping it into a generator object that mucks around with the data being yielded, when all it wants to do is display progress on screen? And like I said, filename is the clearest way possible of identifying an image that was loaded from a file.
append_images
are arbitraryPIL.Image.Image
I'm pretty sure these can be obtained with Image.open()
, just like the first image.
why not pass just saved im object itself?
I suggested that, but it was pointed out that the image might be provided as an input more than once.
…which is not the case with filenames, I suppose :-)
The index can be necessary in case of duplicate files in the input, but the filename is likely to be necessary for the purpose of displaying the progress in user-friendly manner. (Now passing the entire image object, that would actually be "passing back everything to the user just in case something is useful" 😆)
I don't like passing indices since it's hard to figure out the difference between index and frame_number and it's very simple to write the code which uses index incorrectly.
Incidentally, passing progress status as a namedtuple
(or the like) would help with that.
I'm not opposed to dropping
index
altogether - I think the key benefit of this feature is being able to track how many frames have been completed out of the total number.
Filename and/or index are necessary to show which image this current frame comes from, in case when more than one is on the input – it shouldn't be a stretch to expect it to be necessary information for the vast majority of usecases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced that a filename or image index is necessary, but I've pushed a commit to restore the filename.
>>> from collections import namedtuple
>>> State = namedtuple("State", "a b")
>>> State(2, 4)[0]
2
Because namedtuples still have order, and I'm concerned that there could be different views about what the best order is, or that we might want to add more items in the future, I've used a dictionary instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keys are a bit overly verbose, but that should do the job at least.
Still, I believe having a single fixed class with predefined attributes would make for better API than using free-form dict
s in every implementation (not to mention avoiding potential issues if there's a need to make changes at some point in the future). If you're concerned with possibility of using indices to access a namedtuple
field (which I honestly find rather unlikely as attribute access is much clearer), you can use a @dataclass
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, it would be a good idea to have this getattr(im, "filename", None)
thing in one place in the codebase (and if you keep the dict
, it would still be a good idea to have one piece of code defining its creation, at least for sake of ensuring there won't be any discrepancies in key names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to move the creation of the dictionary into a common method.
Tests/test_file_webp.py
Outdated
expected = [] | ||
for i in range(42): | ||
expected.append(("Tests/images/iss634.webp", i + 1, 43)) | ||
expected.append((None, 43, 43)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, saying that the last frame comes from image #2 would be more informative than "don't know" – especially when there's multiple images involved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to switch to an index.
An idea: use the same callback for generating frames: pass current |
Sorry, I'm not sure I understand. Are you suggesting this from PIL import Image
colors_left = ["#f00", "#0f0"]
def callback():
if colors_left:
return Image.new("RGB", (1, 1), frames_left.pop(0))
im = Image.new("RGB", (1, 1))
im.save("out.png", save_all=True, progress=callback) with the intention that it allow multiframe images to be saved without needing all images to be created beforehand and so reduce memory impact? |
In general, yes. Depending on current code flow, it could be like this, or require some informations about the frames (like size and mode). |
APNG considers the modes of all frames when saving - #6610. I don't know what an elegant way of getting that particular information from the user at the beginning would look like, and I find it hard to imagine that users would expect that they have to know that at the beginning. |
Tests/test_file_apng.py
Outdated
assert progress == [ | ||
(0, 1, 7), | ||
(1, 2, 7), | ||
(2, 3, 7), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say... if the frames are counted from 1
, why count the images from 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm triggering these callbacks when frames are completed, not when they are started.
My intention is for (0, 1, 7)
to communicate "We're in the first image. We've completed the first frame, there are seven frames in total".
Then (2, 7, 7)
communicates "We're in the third image. We've completed seven frames, there are seven frames in total, so we're done"
If I started counting the frames from 0, the last value would be (2, 6, 7)
. That looks less to me like save_all is finished.
I recognise this could be a subject of debate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…The question was “Why count the images from 0
?”
I.e. why are the images numbered starting from 0
? You're numbering frames starting from 1
– it would be consistent to use the same numbering scheme for images.
(And yes, I know that due to how pointer arithmetic works, arrays in C are indexed starting from 0
, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages just because; counting things is meant to be started from 1
– that's what the number means in the first place.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to explain that my goal is not to number the frames from 1, but rather to clearly communicate when progress is completed, because the frame count will be compared with the total number of frames. The image count will not.
arrays in C are indexed starting from 0, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages
Python arrays are also indexed from 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image count will not.
And yet counting from 0
would be inconsistent with the other counting used within same data set.
Python arrays are also indexed from 0.
Which makes no sense since Python is a high-level language (and there's some consistency issues caused by this arbitrary choice to imitate C array indexing, but that's neither here nor there).
Besides, the images aren't coming from a list as far as the user is concerned (well, there's append_images
, but going by that logic you'd have to start from -1
since the first input image is not in this list in the first place… except that -1
is a valid index in Python, so it probably would be None
instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit naming each piece of information within the returned data. It is now "image_index" and "completed_frames". Hopefully that clarifies that one is the Pythonic zero-based index of the image in progress, and the other is the number of frames that have been finished.
I imagine if this is ever implemented, it should not be exposed in a workaround-ish way, and it probably should be implemented as a format-specific method if it can only be supported for specific formats (e.g. I'm also pretty sure such a feature is out of scope for this particular pull-request. |
It out of scope of your needs, but is closely related to current PR since it may or may not share the same API and it is better to discuss it here. |
We can restrict modes of generated frames only to the same as the initial image. |
You're conflating the concerns of "keeping track of progress" and "generating input data for the operation", based solely on the fact that both are related to "iteration over frames". I don't believe such features should even intersect in their implementation, let alone complicating each other. In fact, I'm pretty sure that the correct way of implementing the feature you're requesting would be by supporting a generator as a value passed in def to_gif_memheavy (frames, filename):
first, _frames = Image.open(frames[0]), frames[1:]
# append_images is a list with pre-loaded data
first.save(filename, 'GIF', save_all=True, append_images=[Image.open(s) for s in _frames])
def to_gif_memlow (frames, filename):
first, _frames = Image.open(frames[0]), frames[1:]
# append_images is a generator and loads images one-at-a-time
first.save(filename, 'GIF', save_all=True, append_images=(Image.open(s) for s in _frames)) |
Having iterators support you can track progress without any callbacks. |
Only in the sole specific case when composing an animated image from separate single-frame images that you load one-by-one. Which means, when combining animated images, you'd only get updates per image, not per frame (and the simple usecase of "saving/converting a single animated image" would get zero progress tracking support). Not to mention you're conflating concerns of "providing input" and "progress tracking" again. Why would the user need to be forced to deal with both when he only needs one of these things? And why would the user code dealing with one of these need to also deal with the other? If the user wants to track frame progress, he passes a callback that gets invoked on every frame (e.g. |
@@ -2446,6 +2446,23 @@ def save(self, fp, format=None, **params): | |||
if open_fp: | |||
fp.close() | |||
|
|||
def _save_all_progress( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with "save all"? This method just passes data into another function (supplied externally).
If anything, _report_progress
would make much more sense here. (Or even just _progress
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It indicates that the method is used in relation to im.save(out, save_all=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…Ah. Alright, it does make sense in that context.
Though it would still make more sense to name it something like _frame_progress
, I'd say; it's more generic and there's not really anything in the method itself that makes it necessarily about saving only.
Resolves #7433
The issue requests the ability to use a callback function to monitor progress when calling
save_all
. This function is triggered after processing each frame, with the filename of the current image (orNone
if the filename is not available), the number of frames processed so far, and the total number of frames.gives