Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created a TypedPredictorSignature class that builds a signature from Pydantic models - this signature is optimized for use with TypedPredictor and TypedChainOfThought #1655
base: main
Are you sure you want to change the base?
Created a TypedPredictorSignature class that builds a signature from Pydantic models - this signature is optimized for use with TypedPredictor and TypedChainOfThought #1655
Changes from all commits
26a30e9
9ac2244
0a5c4be
980b544
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting to see special support for optional fields hmm, do we need optional fields? Or are Optional[.] types enough? I see that you have this for input fields here and for output fields below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydantic allows multiple ways to specify a field schema, including Optional property. It could be on the field spec inside the annotation or outside. IMO, we should not let users guess what pydantic field metadata we do and don't support. If we can support it, we should
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting to see examples and metadata per field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are definitely valuable signals. We should leverage Pydantic field metadata instead of duplicating in DSPy signature fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting concept here, requiring a default value for every required output field, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users may provide input without passing the value of a required parameter. So how to indicate that the required value was missing in the input, without throwing an exception? And without forcing the LM to hallucinate a required missing value?
Luckily, you can specify that Pydantic default value should not be subject to validation. This means that regardless of the type (str, int, float, ...), you can specify a default value of "NOT_FOUND" and this signature will correctly detect and return it without hallucinating.
No wrap validators necessary, which in any case don't distinguish between invalid and missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unique nature of agentic applications is that we cannot assume user input is being validated in the UI for required input. Combine this with LM ability to hallucinate when forced to do so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting choice of words. "When extracting" and "return". Something to keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work. Any pitfalls here? Are you thinking if the field name was "extracted_name" or field.default was "return"? Not quite sure how to get around that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naive quotes will fail on complex values? same with the naive join below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only issue I have seen is the LM sometimes returning string values enclosed in quotes. For example, returning "'John Doe'", instead of "John Doe". But this is easy to strip. It correctly extracts numbers without putting them in quotes but I guess the prompt could be enhanced to detect if the example values should be quoted. Any other suggestions on how this could be improved?
Also, re. complex types - isn't that what I am testing with complex Input and Output BaseModels? So may be the code for pulling apart complex types and building a custom signature is unavoidable. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting, asking the model to signal bad/hard fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detecting missing and invalid values is a "must-have" requirement for production-quality apps. Here, I am basically trying to avoid the multiple retries that the framework makes when the extracted value does not match the constraints specified in the Pydantic BaseModel.
The LM may be incorrectly extracting the parameter or the user has specified an invalid value. In both cases, the system should not hallucinate the closest value that matches the actual input. Rather, it should flag the invalid field and let the application handle the error with a proper error-correction workflow ("Provided value was invalid. It must be... Here are some examples... Did you mean...?")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use Annotated instead of assignment = Field(...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Pydantic docs - "In case you use field constraints with compound types, an error can happen in some cases. To avoid potential issues, you can use Annotated:"
See https://docs.pydantic.dev/latest/concepts/fields/#validation-alias