Being able to edit images #29

Open
MrCsabaToth opened this issue Aug 13, 2024 · 2 comments
Labels
enhancement New feature or request multi modal Multi Modality related

Comments

@MrCsabaToth
Member

Once the app can generate images (#24), meaning it can receive an image and not just text, and can display the result, we should be able to edit images as well. Maybe this feature won't even need any extra coding?

@MrCsabaToth MrCsabaToth added the enhancement New feature or request label Aug 13, 2024
@MrCsabaToth MrCsabaToth changed the title Beign able to edit images Being able to edit images Aug 17, 2024
@MrCsabaToth
Member Author

Even though some demos showed potential image or audio outputs, Gemini's multi-modality is currently input-only: https://www.linkedin.com/posts/netskink_how-to-make-an-audio-podcast-as-demonstrated-activity-7230943255578697729-hGBI

I've seen other assistant projects that also use STT and TTS, as I do.
We can consider Imagen3 for image-related generation or edits, but that would be a separate interaction mode, since the prompt would need to be passed to Imagen3 and not Gemini.
Similarly, for music or audio generation we'd need a dedicated interaction. Maybe it could be an extension of the Shazam mode (#38)?
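
A minimal sketch of how such separate interaction modes could be routed, assuming each mode maps to exactly one backend. The names `InteractionMode`, `BACKENDS`, and `route_prompt`, as well as the backend labels, are hypothetical and not taken from the app's code. Real client calls would replace the plain strings.

```python
from enum import Enum


class InteractionMode(Enum):
    CHAT = "chat"    # multi-modal input, text output
    IMAGE = "image"  # image generation or edits
    AUDIO = "audio"  # music or audio generation


# Hypothetical backend labels (assumption): stand-ins for real model clients.
BACKENDS = {
    InteractionMode.CHAT: "gemini",
    InteractionMode.IMAGE: "imagen3",
    InteractionMode.AUDIO: "audio-generator",
}


def route_prompt(prompt: str, mode: InteractionMode) -> tuple[str, str]:
    """Return (backend, prompt) so the caller knows which model receives the prompt."""
    return BACKENDS[mode], prompt
```

With this shape, an image edit request would be sent to the Imagen3 entry rather than to Gemini, and an audio mode could later be added by extending the mapping.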

@MrCsabaToth MrCsabaToth added the multi modal Multi Modality related label Aug 21, 2024
@MrCsabaToth
Member Author

Gemini Advanced itself relies on Imagen3. This is the way: https://www.theverge.com/2024/8/28/24230445/google-gemini-create-ai-generated-people-imagen-3
