Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LLM-based image description to PptxConverter #218

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

keenranger
Copy link

This pull request introduces the use of a language model to generate descriptive alt text for images in the markitdown module. The key changes include the addition of a new method to generate descriptions using a language model and the integration of this method into the existing image processing workflow.

Integration of language model for image descriptions:

  • src/markitdown/_markitdown.py: Added logic to convert method to use a language model client and model for generating image descriptions if provided in kwargs.

New method for generating descriptions:

  • src/markitdown/_markitdown.py: Added _get_llm_description method to generate image descriptions using a language model. This method constructs a data URI for the image and sends a prompt to the language model client to generate the description.

Limitation

BE AWARE BEFORE CONVERTING BIG PPTX FILE
It may take a long time.

Suggestion

  • Add a parameter as kwargs, such as “use_llm,” inside the convert method to allow not only PPTX but also other converters to selectively use LLM for dispatching image descriptions.
  • Refactor the file handling logic in _get_llm_description into a separate function to make _get_llm_description reusable across all converters.
  • Add asynchronous logic for generating image descriptions.

@keenranger
Copy link
Author

@microsoft-github-policy-service agree company="Asleep Inc."

@l-lumin
Copy link
Contributor

l-lumin commented Dec 26, 2024

hi, could you add tests

@keenranger
Copy link
Author

@l-lumin Simple LLM test added using the current test.pptx file. I included one page with some basic images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants