Created a TypedPredictorSignature class that builds a signature from Pydantic models - this signature is optimized for use with TypedPredictor and TypedChainOfThought #1655

drawal1 · 2024-10-20T21:22:02Z

📝 Changes Description

This MR/PR contains the following changes:

Added a TypedPredictorSignature class with a single method, create, which takes the Pydantic classes that define the input and output fields and builds a signature optimized for use with dspy.TypedPredictor to extract structured information from the input.

The advantages of this implementation are:

It significantly reduces the chances of the dreaded "retries..." exception
If a default value is specified, the predictor now returns it when the information cannot be extracted
For fields with constraints, an invalid value can be specified along with a validator function (mode="wrap"); the predictor returns the invalid value if the extracted information does not satisfy the constraints
For fields defined as Optional, the predictor returns null if the information cannot be extracted
Field descriptions and example values, if specified in the Pydantic model, are used to construct the signature for better prediction
Prefix text can optionally be specified, and it is then used to construct the prompt
Using enum fields in pydantic models will no longer generate parse errors
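
The default, invalid-value, and Optional behaviours listed above rest on standard Pydantic v2 features, so they can be checked without an LM. The following is a minimal sketch, assuming Pydantic v2. The Person model and its field names are hypothetical and not part of this PR, and it shows only the Pydantic side, not how the predictor prompts the model.

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class Person(BaseModel):
    # Hypothetical model for illustration only; not part of the PR.
    name: Annotated[str, Field(max_length=15)] = "NOT_FOUND"
    age: Annotated[int, Field(gt=0, lt=150)] = -999
    nickname: Optional[str] = None

    @field_validator("age", mode="wrap")
    @classmethod
    def validate_age(cls, value, handler):
        # Run the normal validation; fall back to a sentinel if it fails.
        try:
            return handler(value)
        except ValidationError:
            return -8888


found = Person(name="John Doe", age=25)      # value satisfies the constraints
missing = Person()                           # nothing extracted: defaults apply
invalid = Person(name="John Doe", age=200)   # violates lt=150: sentinel applies

print(found.age, missing.age, invalid.age, missing.nickname)
```

Because the value returned by a wrap validator is not re-checked against the field constraints, the sentinel -8888 can sit outside the gt/lt range. Defaults are likewise not validated unless validate_default is enabled, which is why -999 is accepted.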

Here is an example usage:

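    from typing import Annotated, Optional

    import dspy
    from pydantic import BaseModel, Field, ValidationError, field_validator

    # TypedPredictorSignature is the class added in this PR; import it from
    # wherever it lives in your checkout.

    class CommandExtractionInput(BaseModel):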
        command: str

    class OutputParamsSchema(BaseModel):
        @field_validator('name', mode='wrap')
        @staticmethod
        def validate_name(name, handler):
            try:
                return handler(name)
            except ValidationError:
                return 'INVALID_NAME'

        @field_validator('age1', 'age2', 'age3', 'age4', 'age5', 'age6', mode='wrap')
        @staticmethod
        def validate_age(age, handler):
            try:
                return handler(age)
            except ValidationError:
                return -8888

        @field_validator('email1', 'email2', mode='wrap')
        @staticmethod
        def validate_email(email, handler):
            try:
                return handler(email)
            except ValidationError:
                return 'INVALID_EMAIL'

        name: Annotated[str,
                        Field(default='NOT_FOUND', max_length=15,
                            title='Name', description='The name of the person',
                            examples=['John Doe', 'Jane Doe'], json_schema_extra={'invalid_value': 'INVALID_NAME'})
                    ]
        age1: Annotated[int, 
                       Field(gt=0, lt=150, default=-999, json_schema_extra={'invalid_value': '-8888'})]
        age2: Annotated[int, 
                       Field(gt=0, lt=150, json_schema_extra={'invalid_value': '-8888'})] = -999
        age3: Optional[Annotated[int, 
                       Field(gt=0, lt=150, json_schema_extra={'invalid_value': '-8888'})]]

        age4: Annotated[int, 
                       Field(gt=0, lt=150, default=-999, json_schema_extra={'invalid_value': '-8888'})]
        age5: Annotated[int, 
                       Field(gt=0, lt=150, json_schema_extra={'invalid_value': '-8888'})] = -999
        age6: Optional[Annotated[int, 
                       Field(gt=0, lt=150, json_schema_extra={'invalid_value': '-8888'})]]

        email1: Annotated[str, 
                         Field(default='NOT_FOUND', 
                            pattern=r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$',
                            json_schema_extra={'invalid_value': 'INVALID_EMAIL'})
                    ]
        email2: Annotated[str, 
                         Field(default='NOT_FOUND', 
                            pattern=r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$',
                            json_schema_extra={'invalid_value': 'INVALID_EMAIL'})
                    ]

    dspy_signature_class = TypedPredictorSignature.create(
        CommandExtractionInput, OutputParamsSchema)

    lm = dspy.LM('openai/gpt-3.5-turbo')
    with dspy.context(lm=lm):
        extract_cmd_params = dspy.TypedChainOfThought(
            dspy_signature_class)

        input_for_parameter_extraction = CommandExtractionInput(
            # command = "A random command."
            # command = "My name is kjhd and I am 200 years old. My email is 9236"
            command = "Hello, my name is John Doe and I am 25 years old. My email is [email protected]."
        )
        prediction = extract_cmd_params(**input_for_parameter_extraction.model_dump())

        dspy.inspect_history(n=1)
        print(prediction)
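
The metadata declared in the example (description, examples, default, invalid_value, enum members) is available on the model's fields at runtime. Below is a rough sketch of how such metadata could be turned into prompt text, assuming Pydantic v2. The names describe_field, Color, and Schema are hypothetical, and this is not necessarily how TypedPredictorSignature.create builds its signature.

```python
from enum import Enum
from typing import Annotated

from pydantic import BaseModel, Field


class Color(str, Enum):
    # Hypothetical enum, used only to show how allowed values can be listed.
    RED = "red"
    GREEN = "green"


class Schema(BaseModel):
    # Hypothetical output model mirroring the 'name' field from the example.
    name: Annotated[
        str,
        Field(
            max_length=15,
            description="The name of the person",
            examples=["John Doe", "Jane Doe"],
            json_schema_extra={"invalid_value": "INVALID_NAME"},
        ),
    ] = "NOT_FOUND"
    color: Color


def describe_field(field_name, info):
    """Build a one-line, human-readable description from a FieldInfo."""
    parts = [field_name]
    if info.description:
        parts.append(info.description)
    if info.examples:
        parts.append("examples: " + ", ".join(map(str, info.examples)))
    if not info.is_required():
        parts.append(f"if not found, return {info.default!r}")
    extra = info.json_schema_extra
    if isinstance(extra, dict) and "invalid_value" in extra:
        parts.append(f"if invalid, return {extra['invalid_value']!r}")
    annotation = info.annotation
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        parts.append("one of: " + ", ".join(m.value for m in annotation))
    return "; ".join(parts)


descriptions = {
    field_name: describe_field(field_name, info)
    for field_name, info in Schema.model_fields.items()
}
for line in descriptions.values():
    print(line)
```

Listing the enum's member values explicitly is one plausible way to steer the model toward parseable output; whether the PR does exactly this is not stated in the description.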

...

✅ Contributor Checklist

Pre-Commit checks are passing (locally and remotely)
Title of your PR / MR corresponds to the required format
[] Commit message follows required format {label}(dspy): {message}

⚠️ Warnings

Anything we should be aware of?

…utput classes. It takes examples, constraints, defaults and invalid value specifications into account when constructing the signature

…g invalid value. Specifying a default of null for optional age field

…ing field.default in such cases if it's not specified

okhat · 2024-10-20T23:49:39Z

Amazing. Having some discussions on Discord at https://discord.com/channels/1161519468141355160/1294140517764042794/1297707179054207028