Add LLM-based image description to PptxConverter #218

keenranger · 2024-12-26T08:37:28Z

This pull request introduces the use of a language model to generate descriptive alt text for images in the markitdown module. The key changes include the addition of a new method to generate descriptions using a language model and the integration of this method into the existing image processing workflow.

Integration of language model for image descriptions:

src/markitdown/_markitdown.py: Added logic to convert method to use a language model client and model for generating image descriptions if provided in kwargs.

New method for generating descriptions:

src/markitdown/_markitdown.py: Added _get_llm_description method to generate image descriptions using a language model. This method constructs a data URI for the image and sends a prompt to the language model client to generate the description.

Limitation

BE AWARE BEFORE CONVERTING BIG PPTX FILE
It may take a long time.

Suggestion

Add a parameter as kwargs, such as “use_llm,” inside the convert method to allow not only PPTX but also other converters to selectively use LLM for dispatching image descriptions.
Refactor the file handling logic in _get_llm_description into a separate function to make _get_llm_description reusable across all converters.
Add asynchronous logic for generating image descriptions.

Signed-off-by: Hankyeol Kyung <[email protected]>

keenranger · 2024-12-26T08:38:31Z

@microsoft-github-policy-service agree company="Asleep Inc."

l-lumin · 2024-12-26T10:16:19Z

hi, could you add tests

src/markitdown/_markitdown.py

…tent type Signed-off-by: Hankyeol Kyung <[email protected]>

Signed-off-by: Hankyeol Kyung <[email protected]>

keenranger · 2024-12-27T07:12:11Z

@l-lumin Simple LLM test added using the current test.pptx file. I included one page with some basic images.

Add LLM-based image description to PptxConverter

7fe3207

Signed-off-by: Hankyeol Kyung <[email protected]>

Viddesh1 reviewed Dec 26, 2024

View reviewed changes

src/markitdown/_markitdown.py Outdated Show resolved Hide resolved

keenranger added 2 commits December 27, 2024 15:28

Update LLM description method to accept image object and validate con…

9449d5b

…tent type Signed-off-by: Hankyeol Kyung <[email protected]>

Test for PPTX conversion with OpenAI client

06ccea5

Signed-off-by: Hankyeol Kyung <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LLM-based image description to PptxConverter #218

Add LLM-based image description to PptxConverter #218

keenranger commented Dec 26, 2024

keenranger commented Dec 26, 2024

l-lumin commented Dec 26, 2024

keenranger commented Dec 27, 2024

Add LLM-based image description to PptxConverter #218

Are you sure you want to change the base?

Add LLM-based image description to PptxConverter #218

Conversation

keenranger commented Dec 26, 2024

Limitation

Suggestion

keenranger commented Dec 26, 2024

l-lumin commented Dec 26, 2024

keenranger commented Dec 27, 2024