This repository has been archived by the owner on May 6, 2020. It is now read-only.

Feature suggestion: allow for different phrasing of numbers #58

Closed
kendonB opened this issue Feb 14, 2017 · 14 comments

@kendonB

kendonB commented Feb 14, 2017

For example, I am unable to say "numb one thirty five", and have to say "numb one hundred and thirty five" when using caster. For six digit numbers, it is much simpler to dictate the individual digits.

ref https://github.com/synkarius/caster/issues/174

@nihlaeth

Isn't it trivial to create a grammar rule that allows this?

@chilimangoes

If there is a trivial way to write a grammar for this, I haven't been able to find it. I think a big part of the problem is the ambiguity in how (at least for English speakers) we say long numbers.

For example, say I have the number 304027. The "correct" way of saying this would be "three hundred and four thousand and twenty seven". It also seems to be the only way to say 304027 and have it be recognized as a single number in a dragonfly grammar. But, in English, we might say something like "three oh four oh two seven" or "thirty forty twenty seven". In dictation mode, Dragon NaturallySpeaking does its best to try to figure out what you meant, using some kind of heuristics. But those same heuristics don't seem to be applied to numbers used in dragonfly grammars.

I spent a fair amount of time at one point last year trying to modify my copy of the numbers.py grammar in Caster to support this. I created a rule definition that was something like "numb [] [] [] []" which worked most of the time but also produced the wrong results fairly often. A couple of the failure modes that I remember were, for example, if you said "thirty forty twenty seven", it might be interpreted as either "30 40 27" or "30 40 20 7" and if you said "two hundred fifteen", it might be interpreted as "215", "200 15", "2 115", or "2 100 15". I remember I tried a variety of things to get around these ambiguity issues, and I was able to resolve some of them, but in the end it was more trouble than it was worth.
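
The ambiguity described above can be reproduced outside of Dragon. The following is a minimal pure-Python sketch, not the Caster grammar itself: `parse_group` and `interpretations` are hypothetical helper names, only numbers below 1000 are handled, and a bare "hundred" is assumed to mean 100. It enumerates every way a word sequence can be split into valid spoken numbers.

```python
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def parse_below_100(words):
    """Return the value of a 1- or 2-word number below 100, else None."""
    if len(words) == 1:
        for table in (UNITS, TEENS, TENS):
            if words[0] in table:
                return table[words[0]]
    elif len(words) == 2 and words[0] in TENS and words[1] in UNITS:
        return TENS[words[0]] + UNITS[words[1]]
    return None


def parse_group(words):
    """Return the value of one spoken number below 1000, else None."""
    if "hundred" not in words:
        return parse_below_100(words)
    i = words.index("hundred")
    head, tail = words[:i], words[i + 1:]
    if not head:
        hundreds = 1  # assumption: a bare "hundred" means 100
    elif len(head) == 1 and head[0] in UNITS:
        hundreds = UNITS[head[0]]
    else:
        return None
    if not tail:
        return hundreds * 100
    rest = parse_below_100(tail)
    return None if rest is None else hundreds * 100 + rest


def interpretations(phrase):
    """List every split of the phrase into consecutive valid numbers."""
    words = phrase.lower().split()
    results = []

    def extend(start, groups):
        if start == len(words):
            results.append(groups)
            return
        for end in range(start + 1, len(words) + 1):
            value = parse_group(words[start:end])
            if value is not None:
                extend(end, groups + [value])

    extend(0, [])
    return results


for reading in interpretations("two hundred fifteen"):
    print(reading)
```

Under these assumptions "two hundred fifteen" has four readings and "thirty forty twenty seven" has two, which matches the failure modes listed above: the grammar alone cannot tell which reading the speaker meant.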

@kendonB I think what I might do is just create a rule that allows you to quickly string together a list of single digits 0-9, like the "three oh four oh two seven" example. It's not as flexible, and won't allow you to say things like "thirty forty twenty seven", but it should at least resolve the problem of ambiguity.
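
For illustration, the digit-only idea boils down to a one-to-one word lookup, which is why it has no ambiguity. This is a rough sketch in plain Python rather than a dragonfly rule; `DIGIT_WORDS` and `digits_from_words` are hypothetical names, and accepting "oh" for zero is an assumption taken from the example above.

```python
# Each spoken word maps to exactly one digit, so there is only one reading.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def digits_from_words(phrase):
    """Convert a sequence of spoken single digits into a digit string."""
    digits = []
    for word in phrase.lower().split():
        if word not in DIGIT_WORDS:
            raise ValueError("not a single digit: %r" % word)
        digits.append(DIGIT_WORDS[word])
    return "".join(digits)


print(digits_from_words("three oh four oh two seven"))
```

The result is kept as a string so that leading zeros survive.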

@nihlaeth

nihlaeth commented Feb 16, 2017

Let DNS handle number complexity internally; it does that quite well. You can just use an IntegerRef with an alternative digit sequence: <number> | (<digit>)+

Edit: just realised I only covered the last case mentioned by OP.

@synkarius
Copy link

synkarius commented Feb 16, 2017

Where did that + syntax come from? Does that actually work? I've never seen it before.

Edit: I have seen it before. It was discussed here: #15
@t4ngo said it couldn't be done due to limitations in DNS.

@nihlaeth

It came from the natlink source code. It's the same as using a Repetition element, and I used it as a lazy way to indicate that you should. I don't know if dragonfly can handle it, but natlink can in any case.

@synkarius

synkarius commented Feb 17, 2017

@nihlaeth I'm looking at the NatLinkTalk PowerPoint (which, along with the VoiceCoders PowerPoint and natlink.txt, are the best NatLink docs that I am aware of). On page 29 of the NatLinkTalk PowerPoint, I do see that + syntax you're referring to, but as I'm reading the docs, it can only be used to design a spec, not to influence the execution of an action.

IOW, I can create a spec that accepts either
one two three
or
one two two two two two three
but both specs will cause the same action to execute. Am I reading this correctly? If not, would you happen to have an example in which the execution of the action is influenced by the repetition?

@nihlaeth

You could have the rule introspect the raw dictation data, and act differently according to the number of twos in there. But you'd have to do the parsing yourself; it's a lot easier to use Repetition for that.
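
As a rough illustration of that do-it-yourself parsing, assuming the recognized words are available to the rule as a plain list of strings (how they are obtained from NatLink or dragonfly is not shown here), the repetitions can be counted like this. `run_lengths` is a hypothetical helper name.

```python
from itertools import groupby


def run_lengths(words):
    """Collapse consecutive repeats into (word, count) pairs."""
    return [(word, len(list(group))) for word, group in groupby(words)]


# Both utterances match the same spec, but the counts differ,
# so the action can branch on how many times "two" was said.
print(run_lengths("one two three".split()))
print(run_lengths("one two two two two two three".split()))
```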

@nihlaeth

I have a fork of natlink online (I plan to rewrite parts of it), you might try reading the Python code, instead of relying on incomplete PowerPoints. There's lots of interesting stuff in there: https://github.com/nihlaeth/dragon_whisperer/blob/master/MacroSystem/core/gramparser.py

@synkarius

synkarius commented Feb 17, 2017

Ah! Yes, introspecting the raw dictation data would work. I hadn't thought of that. So really, all you need are two commands, one that lets you repeat numbers however many times, and one that inspects the dictation data. You make them chainable (via either CCR in Dragonfly or dropping into raw NatLink for access to that + operator), and then parse it yourself. The parsing would be tedious for more complicated use cases, but for numbers it wouldn't be terrible. That's a good idea.

I have periodically skimmed the NatLink source, but never examined it in detail. I'm very glad to see you're looking to update it to 3.6!

@synkarius

@kendonB my discussion with @nihlaeth is more academic than practical. I suggest you take @chilimangoes's suggestion and just make the 0-9 command. You'd get a lot of mileage out of that, for effort comparable to parsing the raw dictation data, even if the latter would cover most if not all of @chilimangoes's aforementioned examples.

@nihlaeth

nihlaeth commented Feb 17, 2017

I whipped up a quick example. I didn't test it, but it should get the point across.

from dragonfly import CompoundRule, Repetition, IntegerRef, Text, Choice

class NumberRule(CompoundRule):

    """Numbers in all possible forms."""

    def __init__(self, *args, **kwargs):
        # Say "number", then either a whole number or a run of single digits.
        self.spec = "number (<number>|<digits>)"
        self.extras = [
            IntegerRef(name="number", min=0, max=100000),
            Repetition(
                name='digits',
                child=Choice(name='digit', choices={
                    'zero': 0,
                    'one': 1,
                    'two': 2,
                    'three': 3,
                    'four': 4,
                    'five': 5,
                    'six': 6,
                    'seven': 7,
                    'eight': 8,
                    'nine': 9}),
                min=1,
                max=12)
            ]
        CompoundRule.__init__(self, *args, **kwargs)

    def _process_recognition(self, node, extras):
        # extras holds the values of the extras that were actually spoken.
        if 'number' in extras:
            text = str(extras['number'])
        else:
            # Repetition yields a list of ints; Text needs a string.
            text = ''.join(str(d) for d in extras['digits'])
        Text(text).execute()
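
The text-building step of such a rule can be checked without Dragon or dragonfly installed. This is a small sketch under one assumption: the recognized values arrive as a dict holding either a "number" entry (an int) or a "digits" entry (a list of ints). `text_from_extras` is a hypothetical helper, not part of dragonfly.

```python
def text_from_extras(extras):
    """Build the string to type from whichever alternative was spoken."""
    if "number" in extras:
        return str(extras["number"])
    # The digit values are ints, so convert each one before joining.
    return "".join(str(digit) for digit in extras["digits"])


print(text_from_extras({"number": 135}))
print(text_from_extras({"digits": [3, 0, 4, 0, 2, 7]}))
```

Converting to `str` matters in both branches: joining a list of ints directly raises a `TypeError`, and the typed text has to be a string either way.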

@t4ngo
Owner

t4ngo commented Feb 20, 2017

Take a look at the dragonfly.language.Number and NumberRef classes:
https://github.com/t4ngo/dragonfly/blob/master/dragonfly/language/base/number.py

That Number class wraps some integer building blocks, allowing a long number to be spoken as "twelve thirty-four five sixty-seven" giving 1234567. The underlying parsing tries to contract the numbers as far as possible, e.g. "thirty four" will always result in 34 instead of 304.
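
To make the "contract as far as possible" behaviour concrete, here is a plain-Python illustration. It is not dragonfly's implementation: `contract` is a hypothetical function, it only knows English words below one hundred, and it simply prefers the two-word reading ("thirty four" as 34) whenever one is available.

```python
SMALL = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
         "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def contract(phrase):
    """Concatenate spoken chunks, merging tens and units where possible."""
    words = phrase.lower().replace("-", " ").split()
    pieces = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if word in TENS:
            # Contract "thirty four" into 34 rather than 30 followed by 4.
            if following in SMALL and 1 <= SMALL[following] <= 9:
                pieces.append(str(TENS[word] + SMALL[following]))
                i += 2
                continue
            pieces.append(str(TENS[word]))
        elif word in SMALL:
            pieces.append(str(SMALL[word]))
        else:
            raise ValueError("unsupported number word: %r" % word)
        i += 1
    return "".join(pieces)


print(contract("twelve thirty-four five sixty-seven"))
```

The function returns a string so that a leading "oh" is not lost; wrap the result in `int()` if a numeric value is needed.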

Dragonfly's built-in Number is similar in concept to @nihlaeth's NumberRule above, but it's an element instead of a rule (so easier to reuse) and it is language agnostic (i.e. automatically works for English, German, Dutch, etc. because it uses underlying Integer elements without hard coding words).

In this vein, you might also be interested in the dragonfly.language.en.calendar.Date class which lets you say things like "October 12, 2016", "21 February 2017", "14 days ago", "tomorrow", and "next week Friday". It's the same approach as the Number class, wrapping various human language constructs into an easy-to-use element. Useful for manipulating calendars and such.

@t4ngo t4ngo closed this as completed Feb 20, 2017
@kendonB
Author

kendonB commented Feb 26, 2017

Is someone able to provide a noob-friendly implementation of this? Specifically, what should I change in numbers.py in caster? Either or both of the solutions by @nihlaeth or dragonfly.language.Number would be helpful.

@chilimangoes

I won't be in front of a computer for a while to be able to test this, but I think it might be as simple as adding from dragonfly.language import Number, NumberRef in numbers.py and then changing the line IntegerRefST("wnKK", 0, 1000000) to use either Number or NumberRef instead of IntegerRefST. If that doesn't work, post your results and any errors you get back on the original issue you opened in the Caster repo and I'll try to help you there.

daanzu pushed a commit to daanzu/dragonfly-old that referenced this issue Mar 23, 2019
Make various changes to the SAPI 5 engine backend and test suites