v0.0.50
Added
-
Added
GeminiMultimodalLiveLLMService
. This is an integration for Google's Gemini Multimodal Live API, supporting:- Real-time audio and video input processing
- Streaming text responses with TTS
- Audio transcription for both user and bot speech
- Function calling
- System instructions and context management
- Dynamic parameter updates (temperature, top_p, etc.)
-
Added
AudioTranscriber
utility class for handling audio transcription with Gemini models. -
Added new context classes for Gemini:
GeminiMultimodalLiveContext
GeminiMultimodalLiveUserContextAggregator
GeminiMultimodalLiveAssistantContextAggregator
GeminiMultimodalLiveContextAggregatorPair
-
Added new foundational examples for
GeminiMultimodalLiveLLMService
:26-gemini-multimodal-live.py
26a-gemini-multimodal-live-transcription.py
26b-gemini-multimodal-live-video.py
26c-gemini-multimodal-live-video.py
-
Added
SimliVideoService
. This is an integration for Simli AI avatars.
(see https://www.simli.com) -
Added NVIDIA Riva's
FastPitchTTSService
andParakeetSTTService
.
(see https://www.nvidia.com/en-us/ai-data-science/products/riva/) -
Added
IdentityFilter
. This is the simplest frame filter that lets through all incoming frames. -
New
STTMuteStrategy
calledFUNCTION_CALL
which mutes the STT service during LLM function calls. -
DeepgramSTTService
now exposes two event handlerson_speech_started
andon_utterance_end
that could be used to implement interruptions. See new exampleexamples/foundational/07c-interruptible-deepgram-vad.py
. -
Added
GroqLLMService
,GrokLLMService
, andNimLLMService
for Groq, Grok, and NVIDIA NIM API integration, with an OpenAI-compatible interface. -
New examples demonstrating function calling with Groq, Grok, Azure OpenAI, Fireworks, and NVIDIA NIM:
14f-function-calling-groq.py
,14g-function-calling-grok.py
,14h-function-calling-azure.py
,14i-function-calling-fireworks.py
, and14j-function-calling-nvidia.py
. -
In order to obtain the audio stored by the
AudioBufferProcessor
you can now also register anon_audio_data
event handler. Theon_audio_data
handler will be called every timebuffer_size
(a new constructor argument) is reached. Ifbuffer_size
is 0 (default) you need to manually get the audio as before usingAudioBufferProcessor.merge_audio_buffers()
.
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(processor, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels)
- Added a new RTVI message called
disconnect-bot
, which when handled pushes anEndFrame
to trigger the pipeline to stop.
Changed
-
STTMuteFilter
now supports multiple simultaneous muting strategies. -
XTTSService
language now defaults toLanguage.EN
. -
SoundfileMixer
doesn't resample input files anymore to avoid startup delays. The sample rate of the provided sound files now need to match the sample rate of the output transport. -
Input frames (audio, image and transport messages) are now system frames. This means they are processed immediately by all processors instead of being queued internally.
-
Expanded the transcriptions.language module to support a superset of languages.
-
Updated STT and TTS services with language options that match the supported languages for each service.
-
Updated the
AzureLLMService
to use theOpenAILLMService
. Updated theapi_version
to2024-09-01-preview
. -
Updated the
FireworksLLMService
to use theOpenAILLMService
. Updated the default model toaccounts/fireworks/models/firefunction-v2
. -
Updated the
simple-chatbot
example to include a Javascript and React client example, using RTVI JS and React.
Removed
- Removed
AppFrame
. This was used as a special user custom frame, but there's actually no use case for that.
Fixed
-
Fixed a
ParallelPipeline
issue that would cause system frames to be queued. -
Fixed
FastAPIWebsocketTransport
so it can work with binary data (e.g. using the protobuf serializer). -
Fixed an issue in
CartesiaTTSService
that could cause previous audio to be received after an interruption. -
Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket reconnection. Before, if an error occurred no reconnection was happening.
-
Fixed a
BaseOutputTransport
issue that was causing audio to be discarded after anEndFrame
was received. -
Fixed an issue in
WebsocketServerTransport
andFastAPIWebsocketTransport
that would cause a busy loop when using audio mixer. -
Fixed a
DailyTransport
andLiveKitTransport
issue where connections were being closed in the input transport prematurely. This was causing frames queued inside the pipeline being discarded. -
Fixed an issue in
DailyTransport
that would cause some internal callbacks to not be executed. -
Fixed an issue where other frames were being processed while a
CancelFrame
was being pushed down the pipeline. -
AudioBufferProcessor
now handles interruptions properly. -
Fixed a
WebsocketServerTransport
issue that would prevent interruptions withTwilioSerializer
from working. -
DailyTransport.capture_participant_video
now allows capturing user's screen share by simply passingvideo_source="screenVideo"
. -
Fixed Google Gemini message handling to properly convert appended messages to Gemini's required format.
-
Fixed an issue with
FireworksLLMService
where chat completions were failing by removing thestream_options
from the chat completion options.