Originally a project submitted for TikTok TechJam 2024, I have continued developing it and added more features.
The original project is here; it was built on top of WhisperLive, a nearly-live implementation of OpenAI's Whisper.
demo.mp4
This video calling web application aids sales representatives by providing real-time, accurate data about their services and leveraging AI to optimize sales strategies. It integrates live video conferencing, emotion detection, and automatic information retrieval to enhance the efficiency and effectiveness of sales interactions.
- 📹 Live Video Conferencing
  - Implemented using getstream.io
- 😊 Emotion Detection
  - Trained a Random Forest classifier from scratch on a synthesized facial dataset. Facial landmarks were generated using Mediapipe and fed into the model. For inference, a frame of the video is sent to the model at a fixed interval for emotion detection.
- 📝 Real-Time Transcription
  - Implemented using WhisperLive
- 🔍 Live Automatic Information Retrieval
  - Implemented using ChromaDB; more details in the implementation section
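The emotion-detection pipeline above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the training data, label names, feature count, and `predict_emotion` helper are hypothetical stand-ins, and in the real pipeline each feature vector would be flattened Mediapipe face-mesh landmarks rather than random numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: in the real pipeline, each row would be the
# flattened (x, y, z) coordinates of Mediapipe face-mesh landmarks
# (468 landmarks -> 1404 features) extracted from a synthesized dataset.
N_LANDMARK_FEATURES = 1404
X_train = rng.random((200, N_LANDMARK_FEATURES))
y_train = rng.choice(["happy", "neutral", "sad"], size=200)  # example labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def predict_emotion(landmarks: np.ndarray) -> str:
    """Classify one frame's landmark vector into an emotion label."""
    return clf.predict(landmarks.reshape(1, -1))[0]

# At inference time, a frame is grabbed at a fixed interval, run through
# Mediapipe to produce a landmark vector, and classified:
frame_landmarks = rng.random(N_LANDMARK_FEATURES)
predicted = predict_emotion(frame_landmarks)
```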
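At a high level, the live information retrieval embeds documents about the service and, for each transcript snippet, fetches the nearest ones. A plain-NumPy sketch of that nearest-neighbour idea is below; the documents, the toy `embed` function, and the `retrieve` helper are illustrative assumptions only (the project itself uses ChromaDB with a real embedding model, not this code).

```python
import numpy as np

# Illustrative document store; in the project these would be service/product
# facts embedded with a real embedding model and stored in ChromaDB.
docs = [
    "Plan A costs $10/month and includes 5 GB of storage.",
    "Plan B costs $25/month and includes unlimited storage.",
    "Refunds are available within 30 days of purchase.",
]

def embed(text: str) -> np.ndarray:
    """Toy bag-of-letters embedding; a stand-in for a real embedding model."""
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

doc_vecs = np.array([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are closest (cosine) to the query."""
    sims = doc_vecs @ embed(query)  # cosine similarity (vectors are unit length)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

# Each live transcript snippet would be used as a query like this:
results = retrieve("how much does plan a cost")
```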
The project was created with:

- Frontend
  - React
  - Vite
  - Tailwind CSS
  - getstream.io
  - socket.io
- Application Server
  - Flask
  - WebSockets
  - ChromaDB
  - OpenAI
  - OpenCV
Run the following commands to run this application on your local machine.

Clone the repository:

```shell
git clone https://github.com/martinng01/sales-helper.git
```

Set up the Anaconda environment with Python 3.11.9:

```shell
conda create -n sales-helper python==3.11.9
conda activate sales-helper
pip install -r requirements.txt
```

Install npm dependencies:

```shell
cd react-video-call
npm install
```
Run all 3 code blocks in different terminal windows from the base directory:

- Frontend

  ```shell
  cd react-video-call
  npm run dev
  ```

- Application Server

  ```shell
  python middleware/middleware.py
  ```

- Transcription Server

  ```shell
  python WhisperLive/run_server.py
  ```
Visit the localhost URL shown in the frontend terminal window.

- Enable audio and video to be received from the other client. (Currently, audio and video are received from the user's own side as a proof of concept.)