Confirm this is a feature request for the .NET library and not the underlying OpenAI API
This is a feature request for the .NET library
Describe the feature or improvement you are requesting
I'm experimenting with the new C# bindings for the Realtime voice API. So far so good, except that sending audio is not great. My audio, understandably, comes from a microphone: I get a batch of samples at a time. Right now the only way to get these to the C# binding is SendAudioAsync(Stream), but people in this common scenario have no stream. The API forces them to put the samples in a MemoryStream, which is inefficient, as it will just grow and grow.
The other overload, SendAudioAsync(BinaryData), looked promising, but it only supports sending a complete recording, which is a very rare scenario for a realtime API.
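To illustrate the awkward pattern being described, here is a minimal sketch of what callers are pushed toward today. The callback name `OnMicrophoneSamples` and the surrounding plumbing are hypothetical; only the `SendAudioAsync(Stream)` overload comes from the library:

```csharp
using System.IO;

// Microphone callbacks deliver small raw PCM chunks, but the Stream-based
// overload wants a single Stream, so everything gets copied into a
// MemoryStream whose backing array only ever grows.
MemoryStream buffer = new MemoryStream();

void OnMicrophoneSamples(byte[] pcmChunk)
{
    // Each chunk is appended; the underlying array reallocates as it grows.
    buffer.Write(pcmChunk, 0, pcmChunk.Length);
}

// Elsewhere, the whole accumulated stream is handed to the session:
// await session.SendAudioAsync(buffer);
```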
Additional context
No response
Hello, @lucasmeijer, and thanks for diving into the Realtime API!
The BinaryData-based overload you came across is the intended way to accomplish event-driven audio like you're describing. It maps directly to the WebSocket protocol's underlying input_audio_buffer.append command and can be called repeatedly with bite-sized chunks of input; it doesn't need to be a whole recording all at once!
With the microphone input you're using, what kind of samples are you working with and what would make things easier to integrate? Input audio integration is one of the areas where we'd like to facilitate as much as we can -- although sending those BinaryData blocks individually should be able to make it work, it's not as pleasant or idiomatic as it'd ideally be.
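Concretely, the intended chunk-by-chunk usage would look roughly like this. The session variable and chunk source are assumed here; the key point from the reply above is that each call produces one input_audio_buffer.append event:

```csharp
using OpenAI.RealtimeConversation;

// Each call sends one input_audio_buffer.append event over the WebSocket.
// pcmChunk holds a small slice of PCM audio straight from the microphone.
async Task OnMicrophoneSamplesAsync(RealtimeConversationSession session, byte[] pcmChunk)
{
    await session.SendAudioAsync(BinaryData.FromBytes(pcmChunk));
}
```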
I misinterpreted the BinaryData overload because it's broken. In:
public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)
you're forgetting to reset _sendingAudio to false like you do in the Stream version.
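For reference, a sketch of the suspected shape of the bug; the guard field `_sendingAudio` is quoted from the report above, but the rest of the method body is a guess rather than the actual library source:

```csharp
public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)
{
    // Guard against overlapping audio sends.
    if (_sendingAudio)
        throw new InvalidOperationException("Already sending audio.");
    _sendingAudio = true;
    try
    {
        // ... serialize and send the input_audio_buffer.append command ...
    }
    finally
    {
        // This reset is what the Stream overload does but this overload
        // misses: without it, _sendingAudio stays true forever and every
        // subsequent call is rejected.
        _sendingAudio = false;
    }
}
```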
To your other question:
I'm talking to my microphone directly from C#. There's no great support for this in C#, so I'm P/Invoking into libbass. It's a bit annoying, but not the end of the world. I suspect "add great media support to .NET" is a bit out of scope for the OpenAI library :). Most people probably get their audio from somewhere else, so it should be fine.
I really appreciate that you've kept SendCommandAsync public, so I can work around this bug for now by sending my own JSON payload.
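For anyone hitting the same issue before the fix ships, the workaround looks roughly like this. The input_audio_buffer.append payload shape comes from the Realtime API wire protocol; the exact SendCommandAsync signature in the preview library may differ from what's shown:

```csharp
using System;
using System.Text.Json;

// Build the raw Realtime API event by hand: the audio field is
// base64-encoded PCM, matching what input_audio_buffer.append expects.
string json = JsonSerializer.Serialize(new
{
    type = "input_audio_buffer.append",
    audio = Convert.ToBase64String(pcmChunk),
});

// Bypass the broken overload by sending the command directly.
await session.SendCommandAsync(BinaryData.FromString(json), cancellationToken);
```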
Ah, thanks @lucasmeijer -- you're absolutely right, that overload will block forever after the first send. I've corrected the behavior (and added tests) in a development branch and we'll get that fixed in the next preview release.