Image support inside complex types #1767

isaacbmiller · 2024-11-06T17:03:48Z

Currently, you can only pass a single image per field in a signature.

For example, this works:

import dspy

class ImageSignature(dspy.Signature):
    image1: dspy.Image = dspy.InputField()
    image2: dspy.Image = dspy.InputField()

But more complex types that contain images won't:

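from typing import Dict, List

import dspy

# Case 1: a list of images
class ImageSignature(dspy.Signature):
    images: List[dspy.Image] = dspy.InputField()

# Case 2: images keyed by label
class ImageSignature(dspy.Signature):
    labeled_images: Dict[str, dspy.Image] = dspy.InputField()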

This is because of how images are compiled into OpenAI-compatible messages: inside chat_adapter.py, we build one large list of content blocks and give fields that contain an image_url special treatment:

{
    "content": [
        {
            "type": "text",
            "text": "...",
        },
        {
            "type": "image_url",
            "image_url": {"url": "..."},  # url is either an actual URL or the base64 data
        },
    ]
}
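When the url carries base64 data, it is a data URI of the form `data:<mime>;base64,<payload>` (RFC 2397). The stdlib-only sketch below shows one way to build such an `image_url` block from raw image bytes. `image_block` is a hypothetical helper, the `image/png` default MIME type is an assumption, and the bytes used are only the PNG signature, not a real image.

```python
import base64

def image_block(data: bytes, mime: str = "image/png") -> dict:
    """Hypothetical helper: wrap raw image bytes in an OpenAI-style image_url block."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        # Placeholder bytes (just the PNG file signature), not a decodable image.
        image_block(b"\x89PNG\r\n\x1a\n"),
    ],
}
```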

I do some fairly naive parsing inside ChatAdapter, and there is definitely a more elegant solution here.
#1763 addresses the List case, but I want a more generalized solution.
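To make the failure mode concrete, here is a simplified model of per-field special-casing. It is not the actual chat_adapter.py code: `Image` and `naive_content` are hypothetical stand-ins. Only a field whose value is itself an image gets an `image_url` block, so a `List[Image]` or `Dict[str, Image]` falls through to the text branch and the images are lost in a string repr.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Image:
    """Hypothetical stand-in for dspy.Image: holds a URL or data URI."""
    url: str

def naive_content(fields: dict[str, Any]) -> list[dict]:
    """Only top-level Image values become image_url blocks; everything else is text."""
    content = []
    for name, value in fields.items():
        if isinstance(value, Image):
            content.append({"type": "text", "text": f"{name}:"})
            content.append({"type": "image_url", "image_url": {"url": value.url}})
        else:
            # A list or dict of images lands here and is flattened to its repr.
            content.append({"type": "text", "text": f"{name}: {value}"})
    return content

flat = naive_content({"image1": Image("https://example.com/a.png"),
                      "image2": Image("https://example.com/b.png")})
nested = naive_content({"images": [Image("https://example.com/a.png")]})
```

With two separate image fields, `flat` contains two `image_url` blocks; with a list, `nested` contains only a text block holding `Image(url='...')`.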

cc @okhat


thomasahle · 2024-11-06T17:43:08Z

This is how I did it in fewshot:

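import json
from typing import Any

from pydantic import BaseModel

# gpt_format_image and map_images are fewshot helpers that are not shown here.

def format_input_simple(pydantic_object: BaseModel, img_formatter=None) -> dict[str, Any]:
    if img_formatter is None:
        img_formatter = gpt_format_image  # fewshot helper: formats base64 data as an image content block

    image_map = {}  # placeholder ID -> base64 image data

    def replace_image_with_id(obj: Any) -> Any:
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = obj.base64()
        return image_id

    # Swap every image in the object for its placeholder ID, then serialize to JSON.
    dict_obj = map_images(pydantic_object, replace_image_with_id)
    processed = json.dumps(dict_obj)

    content = [{"type": "text", "text": processed}]
    # After the JSON text, append each (ID, image) pair.
    for image_id, image in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append(img_formatter(image))

    return {"role": "user", "content": content}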

Basically, when I serialize the input object to JSON, I replace every image with an ID.
Then, at the end of the message, I send the list of (ID, image) pairs.

Works reasonably well.
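Below is a self-contained sketch of the same placeholder-ID idea, with a generic recursive traversal standing in for fewshot's `map_images` (whose implementation is not shown above). `Image`, `map_images`, and `format_with_image_ids` here are hypothetical illustrations, not DSPy or fewshot APIs. Because the traversal recurses through dicts and lists, it handles both the `List[dspy.Image]` and `Dict[str, dspy.Image]` cases from the issue.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Image:
    """Hypothetical stand-in for an image type; url may be a URL or a data URI."""
    url: str

def map_images(obj: Any, fn: Callable[[Image], Any]) -> Any:
    """Rebuild obj, replacing every Image at any depth with fn(image).
    Lists and tuples both come back as lists (JSON arrays)."""
    if isinstance(obj, Image):
        return fn(obj)
    if isinstance(obj, dict):
        return {k: map_images(v, fn) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [map_images(v, fn) for v in obj]
    return obj

def format_with_image_ids(inputs: dict[str, Any]) -> dict:
    image_map: dict[str, Image] = {}

    def replace_image_with_id(img: Image) -> str:
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = img
        return image_id

    # One JSON string in which each image is replaced by its placeholder ID.
    text = json.dumps(map_images(inputs, replace_image_with_id))
    content = [{"type": "text", "text": text}]
    # Append the (ID, image) pairs after the JSON.
    for image_id, img in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append({"type": "image_url", "image_url": {"url": img.url}})
    return {"role": "user", "content": content}

msg = format_with_image_ids({
    "images": [Image("https://example.com/a.png"), Image("https://example.com/b.png")],
    "labeled_images": {"cat": Image("https://example.com/c.png")},
})
```

The first content block is `{"images": ["[image 1]", "[image 2]"], "labeled_images": {"cat": "[image 3]"}}`, followed by three label/image pairs.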

rzr2kor · 2024-11-08T10:52:20Z


Hey, I'm trying to do VQA with an LLM, using DSPy for optimized prompting, but I can't pass a base64 image to the LLM through DSPy. Could you let me know how you did it? I tried dspy.Image, but I get an error saying "No module called dspy.Image". Thanks!

okhat · 2024-11-12T13:48:47Z

@rzr2kor Are you on the latest version of DSPy? pip install -U dspy

isaacbmiller · 2024-11-12T19:14:51Z

> Then at the end of the message I send the list of (ID, img) pairs.

@thomasahle Did you find that this worked better than interleaving the {"type": "image_url", "image_url": ...} blocks into your actual text content, or was it just a design decision?
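For comparison, interleaving could look roughly like the sketch below: split the text at each placeholder ID and put the matching `image_url` block at the point where it is referenced. `interleave` and the `[image N]` placeholder pattern are hypothetical choices for illustration. Note the trade-off: if the text is one JSON string, the JSON ends up split across several text blocks, so the model never sees it as one contiguous string. Whether that helps or hurts model quality compared with appended pairs is not established here.

```python
import re
from typing import Dict, List

PLACEHOLDER = re.compile(r"\[image \d+\]")

def interleave(text: str, images: Dict[str, str]) -> List[dict]:
    """Split text at placeholder IDs, inserting each image block at its reference.
    `images` maps placeholder ID -> URL or data URI (hypothetical input shape)."""
    content: List[dict] = []
    pos = 0
    for m in PLACEHOLDER.finditer(text):
        if m.start() > pos:
            content.append({"type": "text", "text": text[pos:m.start()]})
        content.append({"type": "image_url", "image_url": {"url": images[m.group(0)]}})
        pos = m.end()
    if pos < len(text):
        content.append({"type": "text", "text": text[pos:]})
    return content

blocks = interleave(
    '{"images": ["[image 1]", "[image 2]"]}',
    {"[image 1]": "https://example.com/a.png", "[image 2]": "https://example.com/b.png"},
)
```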

glesperance · 2024-11-13T03:49:07Z

With image support in complex types, it seems like we could unlock MIPROv2 with few-shot-aware proposals enabled, since DescribeProgram / DescribeModule could then be modified to receive a program_example that contains images.

thomasahle · 2024-11-14T08:20:49Z

> > Then at the end of the message I send the list of (ID, img) pairs.
>
> @thomasahle Did you find that this worked better than interweaving the {"type": "image_url", "image_url": ...}) into your actual text content, or just a design decision

I couldn't put it in "the actual context", since that was just one big JSON string.

isaacbmiller self-assigned this Nov 6, 2024

isaacbmiller linked a pull request on Nov 15, 2024 that will close this issue: [DNM] Complex image type handling #1801 (Draft)
