In-memory statistics calculation #209
Hi @MichalOleszak! Here's an example that loads images into memory and runs cleanvision on them via a Hugging Face `Dataset`:

```python
import os

from datasets import Dataset
from PIL import Image

from cleanvision import Imagelab

if __name__ == "__main__":
    # Load the images into memory
    data_dir = "./tests/data"
    fpaths = [os.path.join(data_dir, f) for f in os.listdir(data_dir)]
    image_list = [Image.open(f) for f in fpaths]

    # Construct an in-memory dataset
    dataset = Dataset.from_dict({"image": image_list})

    # Run cleanvision on this dataset
    imagelab = Imagelab(hf_dataset=dataset, image_key="image")
    imagelab.find_issues()
    imagelab.report()
    print(imagelab.get_stats())
```
Hey @sanjanag, thanks a lot for the quick reply! The solution you suggest works well, but my quick-and-dirty experiments suggest that for a single image (the use case I'm most interested in) it's actually slower than dumping the image to a tempdir. I assume you are not planning to expose APIs in the form of …
Hi @MichalOleszak! That does look like a good use case. We already have code that computes these stats in bulk, but not per image; adding per-image computation should not be difficult. You can find the related code in `image_property.py`. In each of the implemented `ImageProperty` classes, the `calculate()` method computes the raw value of the statistic, and the `get_scores()` method converts it into a score between 0 and 1.
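A minimal standalone sketch of that two-step split (raw value first, then a score in [0, 1]) applied to a single in-memory image. The class `BrightnessProperty` and its method signatures are hypothetical and do not reproduce cleanvision's actual `ImageProperty` interface; brightness here is simply the mean pixel intensity of an 8-bit image, divided by 255.

```python
import numpy as np


class BrightnessProperty:
    """Hypothetical per-image property; NOT cleanvision's real ImageProperty API."""

    def calculate(self, image) -> float:
        # Accepts a numpy array or anything np.asarray understands (e.g. a PIL Image).
        arr = np.asarray(image, dtype=np.float64)
        if arr.ndim == 3:
            # Assumption: channels-last RGB(A); average the first three channels.
            arr = arr[..., :3].mean(axis=-1)
        # Raw statistic: mean intensity, in [0, 255] for 8-bit images.
        return float(arr.mean())

    def get_scores(self, raw_value: float) -> float:
        # Map the raw value into [0, 1]; clipping guards against non-8-bit inputs.
        return float(np.clip(raw_value / 255.0, 0.0, 1.0))


if __name__ == "__main__":
    img = np.full((4, 4, 3), 51, dtype=np.uint8)
    prop = BrightnessProperty()
    raw = prop.calculate(img)
    print(raw, prop.get_scores(raw))  # 51.0 0.2
```

Keeping the raw value separate from the score means the same raw statistic can be rescaled or thresholded differently without recomputing it.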
See also: #210
Hello,
Do you support in-memory computation of statistics, or are you planning to add such a feature?
Details
I'm missing a way to obtain statistics like the ones returned by `imagelab.get_stats()` for an image that is not stored on a filesystem but kept in memory.

Let's say I have a vision model deployed that receives images for inference via a REST API. Each image is a numpy array or a PIL Image. I'd like to obtain its statistics before passing it to the model for inference. A working solution I came up with is saving the image to a tempdir and calling cleanvision on it, but, unsurprisingly, this is very slow.
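As a rough illustration of computing statistics directly from an in-memory numpy array or PIL Image, with no filesystem round-trip, here is a minimal sketch. The helper `in_memory_stats` and its statistic names are hypothetical and only approximate the kind of values `imagelab.get_stats()` reports. The blur proxy is the variance of a 4-neighbour Laplacian, a common sharpness heuristic, which may differ from how cleanvision measures blurriness.

```python
import numpy as np


def in_memory_stats(image) -> dict:
    """Hypothetical helper: simple image statistics computed without writing to disk.

    The statistic names and formulas are illustrative, not cleanvision's definitions.
    """
    # np.asarray works for both numpy arrays and PIL Images.
    arr = np.asarray(image, dtype=np.float64)
    # Assumption: channels-last color images; collapse to one intensity channel.
    gray = arr[..., :3].mean(axis=-1) if arr.ndim == 3 else arr
    height, width = gray.shape
    # 4-neighbour discrete Laplacian over interior pixels; low variance suggests blur.
    lap = (
        gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
        - 4.0 * gray[1:-1, 1:-1]
    )
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "mean_brightness": float(gray.mean()),
        # Images smaller than 3x3 have no interior pixels.
        "laplacian_variance": float(lap.var()) if lap.size else 0.0,
    }


if __name__ == "__main__":
    flat = np.full((8, 10), 128, dtype=np.uint8)
    print(in_memory_stats(flat))
```

Because `np.asarray` accepts a PIL Image directly, the same call covers both input types the REST endpoint might receive, so no temporary file is needed.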
If you are not planning to develop such a feature, could you please suggest a faster workaround than using a tempdir? Thanks!