Add a TranscriptProcessor and new frames #860
Conversation
Left a few comments.
CHANGELOG.md (Outdated)

```
format.
- New examples: `28a-transcription-processor-openai.py`,
  `28a-transcription-processor-openai.py`, and
  `28a-transcription-processor-openai.py`.
```
Should be ...-anthropic.py and -gemini.py
Fixed.
```python
async def on_first_participant_joined(transport, participant):
    await transport.capture_participant_transcription(participant["id"])
    # Kick off the conversation.
    await task.queue_frames([LLMMessagesFrame(messages)])
```
I think we're trying to move away from sending an `LLMMessagesFrame` to start the conversation. I've been trying to do this instead:

```python
await task.queue_frames([context_aggregator.user().get_context_frame()])
```

The reason is that some LLM operations require that the llm service have a persistent context object. Function calling is an example. If we just send an `LLMMessagesFrame`, the llm service creates a context to use temporarily. But then when the context_aggregator pushes the next frame, that context gets overwritten.

I've actually been wondering if we should formally deprecate starting a pipeline with an `LLMMessagesFrame` and throw a warning. (It's fine to use an `LLMMessagesFrame` later, when there's already a context.)
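For illustration, here's a minimal sketch of that startup pattern, assuming the usual `OpenAILLMContext` setup from the Pipecat examples (`llm`, `task`, and `transport` are placeholders from the surrounding example code):

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

messages = [{"role": "system", "content": "You are a helpful assistant."}]
context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)

@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
    await transport.capture_participant_transcription(participant["id"])
    # Queue a context frame so the LLM service keeps a persistent context
    # (needed, e.g., for function calling) instead of an LLMMessagesFrame.
    await task.queue_frames([context_aggregator.user().get_context_frame()])
```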
Good to know. If we want to move in this direction (and it seems we do), then we'll want to:
- document it
- (maybe) emit an error if it's used to initialize the conversation
Fixed.
```
This frame is emitted when new messages are added to the conversation history,
containing only the newly added messages rather than the full transcript.
Messages have normalized roles (user/assistant) regardless of the LLM service used.
```
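For context, a hedged sketch of consuming this frame through the processor's event handler (assuming the `TranscriptProcessor` factory and the `on_transcript_update` event added in this PR):

```python
transcript = TranscriptProcessor()

@transcript.event_handler("on_transcript_update")
async def handle_transcript_update(processor, frame):
    # frame.messages contains only the newly added messages.
    for msg in frame.messages:
        print(f"[{msg.timestamp}] {msg.role}: {msg.content}")
```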
Maybe explicitly say here that messages are always in the OpenAI standard messages format?
Updated.
```diff
@@ -112,11 +112,59 @@ def get_messages_for_logging(self) -> str:
         msgs.append(msg)
         return json.dumps(msgs)

-    def from_standard_message(self, message):
+    def from_standard_message(self, message) -> dict:
```
I think this change isn't necessary. The OpenAI format allows `content` to be a string or a list. The list elements can be of the form `{ "type": "text", "text": text }`.

If we do make a change like this, we have to handle non-text content parts properly. (Image, audio.)
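For reference, both of these shapes are valid OpenAI-format messages (purely illustrative; the image URL is made up):

```python
# Old-style shortcut: content is a plain string.
msg_shortcut = {"role": "user", "content": "Hello!"}

# Newer form: content is a list of content parts, which may be non-text.
msg_parts = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Hello!"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```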
Reverted + added docstring documenting behavior.
```python
def to_standard_messages(self, obj) -> list:
    """Convert OpenAI message to standard structured format.
```
Same thing here. I don't think we want this change.

The approach I was taking with this is that everywhere in the Pipecat codebase where we deal with messages, we should always accept both:
- the old-style OpenAI shortcut `"content": text`
- the newer, more flexible `"content": List[ContentPart]`

This is kind of a pain. But there are too many corner cases in terms of where in code new messages might get defined and injected to do anything else, I think.
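A hypothetical helper showing what "accept both" looks like in practice (`get_text_content` is a made-up name for illustration):

```python
def get_text_content(message: dict) -> str:
    """Extract text from a message whose content is a string or a part list."""
    content = message.get("content", "")
    if isinstance(content, str):
        # Old-style OpenAI shortcut: "content": text
        return content
    # Newer form: "content": List[ContentPart]; keep only the text parts.
    return " ".join(part["text"] for part in content if part.get("type") == "text")
```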
If we want this, then the standard message format is both:
- the old-style OpenAI shortcut `"content": text`
- the newer, more flexible `"content": List[ContentPart]`

That means code needs to handle both formats, which is suboptimal. But if it's not possible to avoid, then so be it. I'll revert this change and the `from_standard_message` change. That will require handling both old and new formats in the application code.
Reverted + added docstring documenting behavior.
```python
role: Literal["user", "assistant"]
content: str
timestamp: str | None = None
```
This should go in `transcript_processor` since it's not a frame.
Oh, nevermind! 🤦
```python
        frame: Input frame to process
        direction: Frame processing direction
    """
    await super().process_frame(frame, direction)
```
We probably don't need this method since it doesn't do anything; the parent `FrameProcessor` one will be used.
```python
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class BaseTranscriptProcessor(FrameProcessor, ABC):
```
We don't need `ABC`.
```python
        self._assistant_processor = AssistantTranscriptProcessor(**kwargs)
        self._event_handlers = {}

    def user(self) -> UserTranscriptProcessor:
```
We probably want `**kwargs` here and create the `UserTranscriptProcessor` here instead.
````python
    ```
    """

    def __init__(self, **kwargs):
````
We probably don't want to pass `kwargs` here. Both processors might have different arguments and we would not be able to pass them like this.
```python
def __init__(self, **kwargs):
    """Initialize factory with user and assistant processors."""
    self._user_processor = UserTranscriptProcessor(**kwargs)
    self._assistant_processor = AssistantTranscriptProcessor(**kwargs)
```
Initialize to `None`.
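Putting these suggestions together, a hedged sketch of the factory with per-processor kwargs and lazy creation (assuming the `UserTranscriptProcessor` and `AssistantTranscriptProcessor` classes from this PR; exact signatures may differ):

```python
class TranscriptProcessor:
    """Factory that creates user/assistant transcript processors on first use."""

    def __init__(self):
        # Initialize to None; each processor is created with its own kwargs.
        self._user_processor = None
        self._assistant_processor = None
        self._event_handlers = {}

    def user(self, **kwargs) -> "UserTranscriptProcessor":
        if self._user_processor is None:
            self._user_processor = UserTranscriptProcessor(**kwargs)
        return self._user_processor

    def assistant(self, **kwargs) -> "AssistantTranscriptProcessor":
        if self._assistant_processor is None:
            self._assistant_processor = AssistantTranscriptProcessor(**kwargs)
        return self._assistant_processor
```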
This looks great! Just added a few comments.
@aconchillo thanks for the feedback! This is ready for review.
LGTM!
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.

Open issues to resolve:

Potential issues:
- `user` messages must be added to the context to direct the LLM. This will appear as transcribed speech. I don't know a good way around this. This is really an Anthropic problem.

Notable:
- Uses the `to_standard_message` function from each LLM. This provides access to the context in the same message format. I had to update OpenAI to support the "standard message" format so I didn't have to add special-case handling. @kwindla and I talked about this, so this might change depending on what we decide.

UPDATE (12/16):
- Reverted the `openai_llm_context` changes for `to_standard_messages` and `from_standard_message`. Instead, I'll just handle both simple and list content in the TranscriptProcessor.
- `assistant` messages are collected from the `OpenAILLMContextFrame`s.