Confirm this is a feature request for the .NET library and not the underlying OpenAI API
This is a feature request for the .NET library
Describe the feature or improvement you are requesting
I'm experimenting with the new C# bindings for the Realtime voice API. So far so good, except that sending audio is not great. My audio, understandably, comes from a microphone: I get a batch of samples at a time. Right now the only way to get these to the C# binding is SendAudioAsync(Stream), but people in this common scenario have no stream. The API forces them to put the samples in a MemoryStream, which is inefficient, as it will just grow and grow.
The other overload, SendAudioAsync(BinaryData), looked promising, but it only supports sending a complete recording, which is a very rare scenario for a realtime API.
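To illustrate the awkward pattern being described, here is a minimal sketch of what callers are pushed toward today. The callback name `OnMicrophoneSamples` and the surrounding plumbing are hypothetical; only the `SendAudioAsync(Stream)` overload comes from the library:

```csharp
using System.IO;

// Microphone callbacks deliver small raw PCM chunks, but the Stream-based
// overload wants a single Stream, so everything gets copied into a
// MemoryStream whose backing array only ever grows.
MemoryStream buffer = new MemoryStream();

void OnMicrophoneSamples(byte[] pcmChunk)
{
    // Each chunk is appended; the underlying array reallocates as it grows.
    buffer.Write(pcmChunk, 0, pcmChunk.Length);
}

// Elsewhere, the whole accumulated stream is handed to the session:
// await session.SendAudioAsync(buffer);
```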
Additional context
No response
Hello, @lucasmeijer, and thanks for diving into the Realtime API!
The BinaryData-based overload you came across is the intended way to accomplish event-driven audio like you're describing. It maps directly to the WebSocket protocol's underlying input_audio_buffer.append command and can be called repeatedly with bite-sized chunks of input; it doesn't need to be a whole recording all at once!
With the microphone input you're using, what kind of samples are you working with and what would make things easier to integrate? Input audio integration is one of the areas where we'd like to facilitate as much as we can -- although sending those BinaryData blocks individually should be able to make it work, it's not as pleasant or idiomatic as it'd ideally be.
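Concretely, the intended chunk-by-chunk usage would look roughly like this. The session variable and chunk source are assumed here; the key point from the reply above is that each call produces one input_audio_buffer.append event:

```csharp
using OpenAI.RealtimeConversation;

// Each call sends one input_audio_buffer.append event over the WebSocket.
// pcmChunk holds a small slice of PCM audio straight from the microphone.
async Task OnMicrophoneSamplesAsync(RealtimeConversationSession session, byte[] pcmChunk)
{
    await session.SendAudioAsync(BinaryData.FromBytes(pcmChunk));
}
```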
I misinterpreted the BinaryData overload because it's broken. In:
public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)
you're forgetting to reset _sendingAudio to false like you do in the Stream version.
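For reference, a sketch of the suspected shape of the bug; the guard field `_sendingAudio` is quoted from the report above, but the rest of the method body is a guess rather than the actual library source:

```csharp
public async Task SendAudioAsync(BinaryData audio, CancellationToken cancellationToken = default)
{
    // Guard against overlapping audio sends.
    if (_sendingAudio)
        throw new InvalidOperationException("Already sending audio.");
    _sendingAudio = true;
    try
    {
        // ... serialize and send the input_audio_buffer.append command ...
    }
    finally
    {
        // This reset is what the Stream overload does but this overload
        // misses: without it, _sendingAudio stays true forever and every
        // subsequent call is rejected.
        _sendingAudio = false;
    }
}
```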
To your other question:
I'm talking to my microphone directly from C#. There's no great support for this in C#, so I'm P/Invoking into libbass. It's a bit annoying, but not the end of the world. I suspect "add great media support to .NET" is a bit out of scope for the OpenAI library :). Most people probably get their audio from somewhere else, so it should be fine.
I really appreciate that you've kept SendCommandAsync public, so I can work around this bug for now by sending my own JSON payload.
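For anyone hitting the same issue before the fix ships, the workaround looks roughly like this. The input_audio_buffer.append payload shape comes from the Realtime API wire protocol; the exact SendCommandAsync signature in the preview library may differ from what's shown:

```csharp
using System;
using System.Text.Json;

// Build the raw Realtime API event by hand: the audio field is
// base64-encoded PCM, matching what input_audio_buffer.append expects.
string json = JsonSerializer.Serialize(new
{
    type = "input_audio_buffer.append",
    audio = Convert.ToBase64String(pcmChunk),
});

// Bypass the broken overload by sending the command directly.
await session.SendCommandAsync(BinaryData.FromString(json), cancellationToken);
```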
Ah, thanks @lucasmeijer -- you're absolutely right, that overload will block forever after the first send. I've corrected the behavior (and added tests) in a development branch and we'll get that fixed in the next preview release.