Add LLM-based image description to PptxConverter #218
+58
−0
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request introduces the use of a language model to generate descriptive alt text for images in the
markitdown
module. The key changes include the addition of a new method to generate descriptions using a language model and the integration of this method into the existing image processing workflow.Integration of language model for image descriptions:
src/markitdown/_markitdown.py
: Added logic toconvert
method to use a language model client and model for generating image descriptions if provided inkwargs
.New method for generating descriptions:
src/markitdown/_markitdown.py
: Added_get_llm_description
method to generate image descriptions using a language model. This method constructs a data URI for the image and sends a prompt to the language model client to generate the description.Limitation
BE AWARE BEFORE CONVERTING BIG PPTX FILE
It may take a long time.
Suggestion
_get_llm_description
into a separate function to make_get_llm_description
reusable across all converters.