nexmo-se/voice-to-ai-engines

Reference application using Vonage Voice API to connect Voice Calls to AI Engines

// Following to be updated

Set up

Set up the sample basic middleware server - Host server public hostname and port

First, set up the basic middleware server from one of the following repositories:
https://github.com/nexmo-se/openai-realtime-connector,
https://github.com/nexmo-se/dg-oai-l11-connector,
https://github.com/nexmo-se/websocket-server-variant-3.

The default local (not public!) port of that middleware server is 6000.

If you plan to test with a local deployment using ngrok (an Internet tunneling service) for both the sample middleware server application and this sample Voice API application, you may set up multiple ngrok tunnels.

For the next steps, you will need:

  • The middleware server's public hostname and, if necessary, public port (as PROCESSOR_SERVER),
    e.g. xxxxxxxx.ngrok.io, xxxxxxxx.herokuapp.com, myserver.mycompany.com:32000.
    No port is necessary when the public hostname is provided by ngrok or Heroku.

Set up your Vonage Voice API application credentials and phone number

Log in to your Vonage APIs account, or sign up for one.

Go to Your applications, access an existing application or + Create a new application.

Under the Capabilities section (click on [Edit] if you do not see this section):

Enable Voice

  • Under Answer URL, leave HTTP GET, and enter https://<host>:<port>/answer (replace <host> and <port> with the public host name and if necessary public port of the server where this sample application is running)
  • Under Event URL, select HTTP POST, and enter https://<host>:<port>/event (replace <host> and <port> with the public host name and if necessary public port of the server where this sample application is running)
    Note: If you are using ngrok for this sample application, the answer URL and event URL look like:
    https://yyyyyyyy.ngrok.io/answer
    https://yyyyyyyy.ngrok.io/event
  • Click on [Generate public and private key] if you did not yet create or want new ones, save the private key file in this application folder as .private.key (leading dot in the file name).
    IMPORTANT: Do not forget to click on [Save changes] at the bottom of the screen if you have created a new key set.
  • Link a phone number to this application if none has been linked to the application.

Please take note of your application ID and the linked phone number (as they are needed in the very next section).

For the next steps, you will need:

  • Your Vonage API key (as API_KEY)
  • Your Vonage API secret, not the signature secret (as API_SECRET)
  • Your application ID (as APP_ID)
  • The phone number linked to your application (as SERVICE_PHONE_NUMBER); your phone will call that number
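
As a minimal sketch of how these values might be checked once they are in the .env file: the variable names below are the ones listed in this README, while the helper name missingSettings is hypothetical and not part of the sample application. The .env-example file in the repository remains the reference for the actual parameter list.

```javascript
// Settings named in this README; the repository's .env-example may define more.
const REQUIRED_VARS = [
  'API_KEY',
  'API_SECRET',
  'APP_ID',
  'SERVICE_PHONE_NUMBER',
  'PROCESSOR_SERVER',
];

// Hypothetical helper: returns the names of settings that are missing or blank.
function missingSettings(env) {
  return REQUIRED_VARS.filter(
    (name) => !env[name] || String(env[name]).trim() === ''
  );
}

// Warn (without exiting) if the current environment looks incomplete.
const missing = missingSettings(process.env);
if (missing.length > 0) {
  console.warn(`Missing settings: ${missing.join(', ')}`);
}
```

Running such a check at startup surfaces a forgotten parameter before the first test call rather than in the middle of it.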

Local setup

Copy or rename .env-example to .env.
Update the parameters in the .env file.
Have Node.js installed on your system; this application has been tested with Node.js version 18.19.1.

Install node modules with the command:

npm install

Launch the application:

node pstn-websocket-app

The default local (not public!) port of this application server is 8000.

Overview of how this application establishes PSTN and WebSocket calls

See the corresponding diagram, call-flow.png.

Step 1 - Establish the WebSocket 1 call; once answered, drop that leg into a uniquely named conference (NCCO with action conversation).

Step 2 - Place the outbound PSTN 1 call; once answered, drop that leg into the same named conference (NCCO with action conversation).

Step 4 - Establish the WebSocket 2 call; once answered, drop that leg into the same named conference (NCCO with action conversation).

Step 6 - Place the outbound PSTN 2 call; once answered, drop that leg into the same named conference (NCCO with action conversation).

Steps 3, 5, 7, and 8 do not add legs; they update the audio controls of legs already in the conference, as described under Additional info.
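
The "NCCO with action conversation" used in each of these steps can be sketched as below. This is an illustration only: the function names and the conf_ naming scheme are assumptions, not taken from the sample application; the action, name, and startOnEnter fields are standard NCCO conversation options.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical naming scheme: one uniquely named conference per test session.
function newConferenceName() {
  return `conf_${randomUUID()}`;
}

// NCCO that drops the answered leg into the named conference.
function joinConferenceNcco(conferenceName) {
  return [
    {
      action: 'conversation',
      name: conferenceName,
      startOnEnter: true, // the conference starts as soon as this leg joins
    },
  ];
}

// Every leg of one session reuses the same name so all legs share a conference.
const conferenceName = newConferenceName();
console.log(JSON.stringify(joinConferenceNcco(conferenceName), null, 2));
```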

Additional info

In step 1, regarding WebSocket 1 leg, there are no specific audio controls yet.

In step 2, regarding PSTN 1 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists WebSocket 1 leg UUID,
meaning PSTN 1 sends audio only to WebSocket 1 leg;
the array parameter canHear stays empty for now, so PSTN 1 does not hear any other leg yet.

In step 3, regarding WebSocket 1 leg,
the NCCO with action conversation includes the array parameter canHear that lists PSTN 1 leg UUID,
meaning WebSocket 1 receives audio only from PSTN 1 leg;
the array parameter canSpeak stays empty for now, so WebSocket 1 is not heard by any other leg yet.

In step 4, regarding WebSocket 2 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists PSTN 1 leg UUID,
meaning WebSocket 2 sends audio only to PSTN 1 leg;
the array parameter canHear stays empty for now, so WebSocket 2 does not hear any other leg yet.

In step 5, regarding PSTN 1 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists WebSocket 1 leg UUID,
meaning PSTN 1 sends audio only to WebSocket 1 leg,
and the array parameter canHear that lists WebSocket 2 leg UUID,
meaning PSTN 1 receives audio only from WebSocket 2 leg.

In step 6, regarding PSTN 2 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists WebSocket 2 leg UUID,
meaning PSTN 2 sends audio only to WebSocket 2 leg,
and the array parameter canHear that lists WebSocket 1 leg UUID,
meaning PSTN 2 receives audio only from WebSocket 1 leg.

In step 7, regarding WebSocket 1 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists PSTN 2 leg UUID,
meaning WebSocket 1 sends audio only to PSTN 2 leg,
and the array parameter canHear that lists PSTN 1 leg UUID,
meaning WebSocket 1 receives audio only from PSTN 1 leg.

In step 8, regarding WebSocket 2 leg,
the NCCO with action conversation includes the array parameter canSpeak that lists PSTN 1 leg UUID,
meaning WebSocket 2 sends audio only to PSTN 1 leg,
and the array parameter canHear that lists PSTN 2 leg UUID,
meaning WebSocket 2 receives audio only from PSTN 2 leg.
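
Once steps 5 to 8 have run, the audio controls described above form a closed loop: PSTN 1 speaks to WebSocket 1, which speaks to PSTN 2, which speaks to WebSocket 2, which speaks back to PSTN 1. The sketch below encodes that final state and checks that every canSpeak entry is mirrored by a matching canHear entry. The function names and the leg UUID placeholders are hypothetical; only the canSpeak and canHear lists come from the description above.

```javascript
// Final canSpeak/canHear lists per leg, as described for steps 5 to 8.
function finalAudioControls({ ws1, pstn1, ws2, pstn2 }) {
  return {
    pstn1: { canSpeak: [ws1], canHear: [ws2] },
    pstn2: { canSpeak: [ws2], canHear: [ws1] },
    ws1: { canSpeak: [pstn2], canHear: [pstn1] },
    ws2: { canSpeak: [pstn1], canHear: [pstn2] },
  };
}

// Conversation NCCO for one leg with its audio controls applied.
function conversationNcco(conferenceName, controls) {
  return [{ action: 'conversation', name: conferenceName, ...controls }];
}

// Sanity check: if leg A can speak to leg B, then B must list A in canHear.
function isConsistent(uuids, table) {
  const legs = Object.keys(table);
  return legs.every((a) =>
    table[a].canSpeak.every((targetUuid) => {
      const b = legs.find((leg) => uuids[leg] === targetUuid);
      return b !== undefined && table[b].canHear.includes(uuids[a]);
    })
  );
}

// Placeholder UUIDs; real ones come from the Voice API when each leg is created.
const uuids = { ws1: 'uuid-ws1', pstn1: 'uuid-pstn1', ws2: 'uuid-ws2', pstn2: 'uuid-pstn2' };
const table = finalAudioControls(uuids);
console.log(isConsistent(uuids, table));
console.log(JSON.stringify(conversationNcco('conf_example', table.pstn1)));
```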

In steps 5 and 6, both NCCOs with action conversation include the endOnExit true flag, because if either the PSTN 1 or the PSTN 2 remote party ends the call, all legs attached to the same conference should be terminated.

In step 2, the NCCO with action conversation does not include the endOnExit true flag, because it may automatically terminate all legs, which is undesired behavior.
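
The difference between the step 2 NCCO and the step 5 NCCO for PSTN 1 leg can be illustrated as follows. The helper name pstnConversationNcco and its options object are hypothetical; endOnExit, canSpeak, and canHear are the NCCO conversation parameters named above.

```javascript
// Builds the conversation NCCO for a PSTN leg; endOnExit is added only on request.
function pstnConversationNcco(conferenceName, { canSpeak, canHear, endOnExit = false }) {
  const action = { action: 'conversation', name: conferenceName, canSpeak, canHear };
  if (endOnExit) {
    action.endOnExit = true; // when this leg leaves, the whole conference ends
  }
  return [action];
}

// Step 2: PSTN 1 joins without endOnExit.
const step2 = pstnConversationNcco('conf_example', {
  canSpeak: ['uuid-ws1'],
  canHear: [],
});

// Step 5: PSTN 1 is updated with endOnExit set to true.
const step5 = pstnConversationNcco('conf_example', {
  canSpeak: ['uuid-ws1'],
  canHear: ['uuid-ws2'],
  endOnExit: true,
});

console.log(JSON.stringify(step2), JSON.stringify(step5));
```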

The application automatically terminates a PSTN 2 leg whose call setup is still in progress (e.g. in ringing state), or that has just been answered, if the PSTN 1 remote party hangs up in the meantime.
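
A rough sketch of that decision, assuming the application tracks the latest status of each leg from the event webhook: the function name and the grouping of statuses are illustrative and may differ from what the sample code actually does. The status strings are values the Voice API reports for call events.

```javascript
// Statuses meaning a leg is over.
const ENDED = new Set([
  'completed', 'busy', 'cancelled', 'failed', 'rejected', 'timeout', 'unanswered',
]);

// Statuses in which PSTN 2 is still being set up or has just been answered.
const IN_SETUP = new Set(['started', 'ringing', 'answered']);

// Hypothetical helper: should the application hang up PSTN 2 now?
function shouldHangUpPstn2(pstn1Status, pstn2Status) {
  return ENDED.has(pstn1Status) && IN_SETUP.has(pstn2Status);
}

// PSTN 1 hung up while PSTN 2 was still ringing: PSTN 2 should be terminated.
// (The hang-up itself would be a Voice API call update with action "hangup".)
console.log(shouldHangUpPstn2('completed', 'ringing'));
```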

When establishing WebSockets, any custom metadata that should be transmitted to the middleware server is passed as query parameters in the WebSocket URI itself.
This sample code passes sample metadata parameters; you may define whichever ones your application logic needs.
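
As an illustration of passing metadata in the WebSocket URI, the sketch below builds such a URI from the PROCESSOR_SERVER value and wraps it in a Voice API WebSocket endpoint object. The /socket path, the parameter names peer_uuid and language, and the function names are assumptions made for this example; the path and parameters expected by your middleware server may differ.

```javascript
// Builds the WebSocket URI, carrying metadata as query parameters.
function buildWebSocketUri(processorServer, meta) {
  const uri = new URL(`wss://${processorServer}/socket`); // hypothetical path
  for (const [key, value] of Object.entries(meta)) {
    uri.searchParams.set(key, String(value));
  }
  return uri.toString();
}

// "to" endpoint of a Voice API call that connects to a WebSocket server.
function webSocketEndpoint(uri) {
  return {
    type: 'websocket',
    uri,
    'content-type': 'audio/l16;rate=16000', // 16 kHz linear PCM
  };
}

const wsUri = buildWebSocketUri('xxxxxxxx.ngrok.io', {
  peer_uuid: 'uuid-pstn1', // hypothetical metadata
  language: 'en-US',
});
console.log(webSocketEndpoint(wsUri));
```

Using URL and searchParams rather than string concatenation keeps values with special characters correctly encoded.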

Try the application

From a web browser trigger test calls with the web address:

https://<server-address>/startcall

or

https://<server-address>/startcall?pstn1=12995551212&pstn2=12995551313&param1=en-US&param2=es-MX
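
If you prefer to build the test address in a script rather than typing it, a small sketch follows. The function name startCallUrl is hypothetical; the query parameter names are the ones shown in the example address above.

```javascript
// Builds the /startcall address, with optional query parameters.
function startCallUrl(serverAddress, params = {}) {
  const url = new URL(`https://${serverAddress}/startcall`);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Replace the host with the public address of this application.
console.log(
  startCallUrl('yyyyyyyy.ngrok.io', {
    pstn1: '12995551212',
    pstn2: '12995551313',
    param1: 'en-US',
    param2: 'es-MX',
  })
);
```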
