Being able to edit images #29

MrCsabaToth · 2024-08-13T23:49:43Z

Once the app can generate images (#24), so that it can receive an image and not just text and can display the result, we should be able to edit images as well. Maybe this feature won't even need any extra coding?

MrCsabaToth · 2024-08-21T22:09:22Z

Even though some demos showed potential image or audio outputs, Gemini's multi-modality is currently input-only: https://www.linkedin.com/posts/netskink_how-to-make-an-audio-podcast-as-demonstrated-activity-7230943255578697729-hGBI

I've seen other assistant projects that also use STT and TTS, like mine.
We can consider Imagen3 for image-related generation or edits, but that would be a separate interaction mode, since the prompt would need to be passed to Imagen3 and not Gemini.
Similarly, music or audio generation would need a dedicated interaction mode; maybe it could be an extension of the Shazam mode (#38)?
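
A rough sketch of what the separate interaction modes could look like, in Python just for illustration (not the app's actual code). The mode names, the backend strings, and `pick_backend` are all hypothetical; the only point is that the prompt gets routed by mode, and that audio has no backend decided yet.

```python
from enum import Enum


class InteractionMode(Enum):
    CHAT = "chat"    # text and multi-modal input, answered by Gemini
    IMAGE = "image"  # image generation or edits, sent to Imagen3
    AUDIO = "audio"  # music or audio generation, backend not decided yet


# Hypothetical backend labels, not real API identifiers.
BACKEND_FOR_MODE = {
    InteractionMode.CHAT: "gemini",
    InteractionMode.IMAGE: "imagen3",
}


def pick_backend(mode: InteractionMode) -> str:
    """Return the label of the backend that should receive the prompt."""
    try:
        return BACKEND_FOR_MODE[mode]
    except KeyError:
        # No backend is mapped for this mode (currently only AUDIO).
        raise NotImplementedError(f"no backend wired up for {mode.value} mode yet")
```

With this shape, adding an audio backend later would only mean adding one entry to the mapping.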

MrCsabaToth · 2024-08-28T17:50:37Z

Gemini Advanced itself relies on Imagen3. This is the way: https://www.theverge.com/2024/8/28/24230445/google-gemini-create-ai-generated-people-imagen-3

MrCsabaToth added the enhancement New feature or request label Aug 13, 2024

MrCsabaToth changed the title ~~Beign able to edit images~~ Being able to edit images Aug 17, 2024

MrCsabaToth added the multi modal Multi Modality related label Aug 21, 2024
