From 53d7e5c9e50a32b8725ded7f733ddf938a0cbb2c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=86=8A=E9=91=AB=E4=BC=9F=20Xinwei=20Xiong?=
 <3293172751NSS@gmail.com>
Date: Mon, 4 Nov 2024 13:59:30 +0800
Subject: [PATCH] Update README.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: 熊鑫伟 Xinwei Xiong <3293172751NSS@gmail.com>
---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 3209f6f..3d16a5b 100644
--- a/README.md
+++ b/README.md
@@ -207,6 +207,16 @@ logging:
 	•	WebSocket 通信：前端通过 WebSocket 与服务器通信，发送音频数据，接收处理结果。
 	•	音频播放：接收到服务器返回的音频 URL 后，使用 HTML5 Audio 播放。
 
+### TODO
+
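++ Process speech end to end with a single neural network and model. Until now, voice has been handled by a three-stage cascade: one simple model transcribes the audio to text, GPT-3.5 or GPT-4 takes that text in and produces text out, and a third simple model converts the resulting text back to audio, i.e. an `ASR -> LM -> TTS` pipeline. This works, but it has several drawbacks: latency is high, details of the original audio are lost, and the LLM has no idea what emotion the user is expressing.
++ Support connecting to and calling your own in-house AI platform (AI middle platform).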
+
+### References
+
++ [https://openai.com/index/hello-gpt-4o/](https://openai.com/index/hello-gpt-4o/)
++ [https://medium.com/@artificial--intelligence/the-differences-between-asr-and-tts-c85a08269c98](https://medium.com/@artificial--intelligence/the-differences-between-asr-and-tts-c85a08269c98#:~:text=We%20are%20familiar%20with%20the,analogous%20to%20the%20human%20mouth.)
+
 ### 🤝 参与贡献
 
 我们欢迎任何形式的贡献！请阅读 CONTRIBUTING.md 了解更多信息。