Gemini's speech recognition always produces inaccurate timestamps. #618

Open
abhijeet12s opened this issue Nov 16, 2024 · 0 comments

abhijeet12s commented Nov 16, 2024

Gemini's speech recognition always produces inaccurate timestamps. Instead of relying on them, can we use cv2 image detection to find scene changes in the video? We could then cut the video into clips at those scene changes and give each clip to Gemini for transcription. Finally, we would generate an SRT file, taking each subtitle's timing from its clip's start and end time and its text from the transcription.

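To make the proposal concrete, here is a minimal sketch of the scene-cut and SRT steps. It is only an illustration under assumptions: the function names (`frame_change_scores`, `scene_spans`, `srt_time`, `build_srt`), the frame-difference heuristic, and the `threshold` and `min_len_s` defaults are all made up for this example and are not part of the project. Cutting the clips and calling Gemini are left out, so the transcribed texts are simply passed in as a list.

```python
def frame_change_scores(video_path):
    """Return (fps, scores); scores[i] is the mean absolute gray-level
    difference between frame i+1 and frame i. Needs opencv-python."""
    import cv2  # lazy import so the pure helpers below work without OpenCV

    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # assumed fallback if fps is reported as 0
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            scores.append(float(cv2.absdiff(gray, prev).mean()))
        prev = gray
    cap.release()
    return fps, scores


def scene_spans(scores, fps, threshold=30.0, min_len_s=1.0):
    """Turn per-frame change scores into (start_s, end_s) scene spans.

    A cut is placed at frame i when the change from frame i-1 exceeds
    `threshold` and the current scene is at least `min_len_s` long.
    Both values are illustrative and would need tuning per video.
    """
    n_frames = len(scores) + 1
    cuts = [0]
    for i, score in enumerate(scores, start=1):
        if score > threshold and (i - cuts[-1]) / fps >= min_len_s:
            cuts.append(i)
    cuts.append(n_frames)
    return [(a / fps, b / fps) for a, b in zip(cuts, cuts[1:])]


def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def build_srt(spans, texts):
    """Pair each scene span with its transcribed text and emit SRT blocks."""
    blocks = []
    for idx, ((start, end), text) in enumerate(zip(spans, texts), start=1):
        blocks.append(f"{idx}\n{srt_time(start)} --> {srt_time(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

Usage would look roughly like `fps, scores = frame_change_scores("input.mp4")`, then `spans = scene_spans(scores, fps)`, then `build_srt(spans, texts)` once each clip has been transcribed. One thing to keep in mind: scene cuts do not necessarily line up with sentence boundaries, so a subtitle timed this way covers the whole scene rather than the exact moment the words are spoken.
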
abhijeet12s changed the title on Nov 16, 2024
from: "Gemini's speech recognition often produces inaccurate timestamps. Instead of relying on it, can we use cv2 image detection to detect scene changes in the video? After detecting these scene changes, we can cut the video into clips based on these scenes. These video clips can then be provided to Gemini for transcription. Finally, we will generate an SRT file and insert the transcribed audio text into it."
to: Gemini's speech recognition always produces inaccurate timestamps.