Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make continuous reading work when using SAPI5 voices without bookmark support #17523

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

gexgd0419
Copy link
Contributor

@gexgd0419 gexgd0419 commented Dec 14, 2024

Link to issue number:

Fixes #16691

Summary of the issue:

Some SAPI5 synthesizers did not work correctly with continuous reading, stopping speech at the end of a sentence, because they don't properly support bookmark events.

Description of user facing changes

Some of the problematic SAPI5 synthesizers will be able to work correctly with continuous reading.

Description of development approach

NVDA relies on bookmark events, which have to be implemented by the TTS engine itself. But no matter what events the engine supports, StartStream and EndStream events are always generated by the SAPI5 framework.

This fix does the following.

SAPI5 fix

When requested to speak a new sequence, store the bookmarks in a list (deque), and put the bookmark list into a dict _streamBookmarksNew with the stream number as its key. This dict stores bookmark lists before the stream actually starts, because the previous stream (which may use the same stream number) may not have been ended when self.tts.Speak returns.

When StartStream event is raised, move the bookmark list of this stream from _streamBookmarksNew to _streamBookmarks, which stores bookmark lists of currently speaking streams.

When EndStream event is raised, bookmarks stored for the current stream will be triggered (through synthIndexReached) before synthDoneSpeaking. This way NVDA will continue to the next line instead of waiting indefinitely.

As most synthesizers do support bookmarks, to prevent triggering the same bookmark twice, the bookmark stored in the list will be removed when the Bookmark event for it is raised.

Thread-safety: self.tts.Speak and TTS events are handled in the same thread, so there should be no multithread-related problems.

SAPI4 fix

Similar to the SAPI5 fix, which stores bookmarks and replay them when AudioStop is raised.

However, SAPI4 does not provide "stream numbers" to distinguish each request, yet SAPI4 supports queueing speech requests. The fix assumes that speech requests are always processed in a first-in, first-out manner, and uses a deque _bookmarkLists to store bookmark lists from different requests.

When cancel is called, the fix assumes that all queued speech requests are cancelled immediately, and clears the bookmark lists.

Testing strategy:

Tested with the v0.1 version of NaturalVoiceSAPIAdapter. I don't have access to those proprietary voices, so further testing is needed.

Known issues with pull request:

None yet

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

@zstanecic
Copy link
Contributor

@gexgd0419,
This will probably also fix legacy Ivona tts voices, sold by Harpo Sp.Zo.o or with all other old synthesizers.

@AppVeyorBot
Copy link

See test results for failed build of commit 75c580bcf6

@gexgd0419 gexgd0419 marked this pull request as ready for review December 14, 2024 14:51
@gexgd0419 gexgd0419 requested a review from a team as a code owner December 14, 2024 14:51
@gexgd0419 gexgd0419 requested a review from seanbudd December 14, 2024 14:51
seanbudd
seanbudd previously approved these changes Dec 15, 2024
Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix @gexgd0419 ! Can you please add a changelog entry

@gexgd0419 gexgd0419 marked this pull request as draft December 16, 2024 01:40
@gexgd0419 gexgd0419 marked this pull request as ready for review December 16, 2024 02:43
@gexgd0419 gexgd0419 marked this pull request as draft December 16, 2024 03:26
@gexgd0419
Copy link
Contributor Author

Just found that even when SAPI5 framework reuses the previous stream number, self.tts.Speak can still return before the previous stream ended. For example:

  • Speak returned stream number 1
  • StartStream event of stream number 1
  • Another Speak returned stream number 1
  • EndStream event of stream number 1 (old stream)
  • StartStream event of stream number 1 (new stream)
  • EndStream event of stream number 1

My code assumes that when Speak returned, the previous stream has ended, so the bookmark list is assigned to the stream number. But if the above happened, bookmarks in the new stream will be triggered when the old stream ends. So this needs to be fixed.

@gexgd0419 gexgd0419 marked this pull request as ready for review December 16, 2024 08:54
@seanbudd seanbudd dismissed their stale review December 16, 2024 23:59

change of approach required

@SaschaCowley
Copy link
Member

We would prefer this be handled for all synthesisers in the speech framework.

@SaschaCowley SaschaCowley marked this pull request as draft December 16, 2024 23:59
@gexgd0419
Copy link
Contributor Author

We would prefer this be handled for all synthesisers in the speech framework.

What does this mean?

This fix affects all SAPI5 and MSSP synthesizers, even those that support bookmark events. But synthesizers that support bookmarks will have "consumed" all stored bookmarks before the end of stream is reached.

If you mean SAPI4 synthesizers, I don't know much about that framework. If SAPI4 has some way to tell the client that it has done speaking, then similar approach may be possible. But in the current SAPI4 implementation, synthDoneSpeaking actually relies on bookmarks.

@gexgd0419
Copy link
Contributor Author

After some studying, I think that applying similar fix to SAPI4 synthesizers is possible. We will need to introduce another sink class which implements ITTSNotifySinkW to get its AudioStart and AudioStop notifications, so that we can replay the missing bookmarks when AudioStop is received.

As for the other two speech frameworks that NVDA supports: eSpeak and OneCore, I don't think that they need to be fixed, as currently there's no third-party OneCore voice that I know of.

Without a unified audio processing system, this fix can only be applied per framework, and only if the framework supports end of audio notifications.

@gexgd0419 gexgd0419 marked this pull request as ready for review December 17, 2024 10:03
@gexgd0419 gexgd0419 requested a review from seanbudd December 17, 2024 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Continuous reading problem on Sapi 5 synthesizers still unsolved
5 participants