DataTemple Release Notes
There is now spell-checking in DataTemple! Install Hunspell (see below) and use the "-z" flag to turn it on. Spell-checking is applied to all word matches. If a sentence still fails to match because of how it was parsed, the entire sentence is rewritten with correctly spelled words and the match is attempted again.
The spell-checking relies on Hunspell, a powerful, open-source spell-checker based on MySpell. You can download it from http://hunspell.sourceforge.net/
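The retry step described above can be sketched roughly as follows. This is an illustrative Python sketch, not DataTemple's actual code: `matcher` and `correct_word` are hypothetical stand-ins for the real template matcher and for a Hunspell lookup, and the sketch covers only the whole-sentence retry, not the per-word matching.

```python
def match_with_spelling(sentence, matcher, correct_word):
    """Try the sentence as written; on failure, retry with corrected words.

    matcher and correct_word are hypothetical callables supplied by the caller.
    Returns the text that matched, or None if neither attempt matched.
    """
    if matcher(sentence):
        return sentence
    # Rewrite the whole sentence with corrected words, then try once more.
    corrected = " ".join(correct_word(word) for word in sentence.split())
    if corrected != sentence and matcher(corrected):
        return corrected
    return None


# Toy stand-ins, for illustration only.
CORRECTIONS = {"Destory": "Destroy"}


def toy_correct(word):
    return CORRECTIONS.get(word, word)


def toy_matcher(text):
    return text.startswith("Destroy every")


print(match_with_spelling("Destory every robot", toy_matcher, toy_correct))
```

In this toy setup the misspelled sentence fails the first attempt and matches only after the rewrite.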
I've added a "serial mode". Passing "-s" on the command line makes the system step through the sentences one at a time and, for each sentence, try each template in order, stopping after the first match.
Templates need to be defined in the order they are to be matched, and all need to start with either %sentence or %sentences.
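The control flow of serial mode can be sketched like this. It is a minimal Python illustration under stated assumptions, not the tool's implementation: `toy_matcher` is a hypothetical stand-in that only checks the literal text before a `*` wildcard, whereas the real matcher works on parsed sentences.

```python
def serial_match(sentences, templates, match_template):
    """For each sentence, try templates in order and stop at the first match.

    Returns a list of (sentence, index of the matching template or None).
    """
    results = []
    for sentence in sentences:
        hit = None
        for index, template in enumerate(templates):
            if match_template(template, sentence):
                hit = index
                break  # later templates are not tried for this sentence
        results.append((sentence, hit))
    return results


def toy_matcher(template, sentence):
    # Hypothetical matcher: drop the %sentence marker, then require the
    # literal text before '*' to start the sentence.
    body = template.replace("%sentence", "", 1)
    prefix = body.split("*")[0].strip()
    return sentence.startswith(prefix)


templates = ["%sentence Look at *", "%sentence Look *"]
for sentence, hit in serial_match(["Look at the sky.", "Look out!", "Run."],
                                  templates, toy_matcher):
    print(sentence, "->", hit)
```

Because both toy templates would accept "Look at the sky.", the example shows why template order matters: only the first one is reported.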
The DataTemple tool now takes templates from files! For consistency, I also changed some of the command-line arguments, so pay careful attention.
The old system of providing preprocessor definitions, a template, and a command on the command line still works, but its arguments are now capitalized: -P for preprocessors, -T for a template, -O for a command. For consistency, input given on the command line is also specified with a capital -I.
To read any of these from a file, use the lower-case flag instead. The preprocessor file (specified with -p) should be a list of commands (like @defwc) with one command per line. The template file (specified with -t) should consist of pairs: a template followed by its associated command. Blank lines and lines that begin with '#' are ignored, like this:
# Template 1:
%sentence Look at *
@print *
# Template 2:
%sentence Destroy every %noun
@print %noun
Input should be specified after any templates, and will be matched against all preceding templates.
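The template file layout shown above can be read with a few lines of code. The following Python sketch is only an illustration of the format, not DataTemple's parser. It assumes each template and each command fits on a single line, and treating an unpaired template as an error is my assumption.

```python
def parse_template_file(text):
    """Return (template, command) pairs, skipping blank lines and '#' lines."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if len(lines) % 2 != 0:
        # Assumption: a template with no command is reported as an error.
        raise ValueError("template file has a template without a command")
    # Even positions hold templates, odd positions hold their commands.
    return list(zip(lines[0::2], lines[1::2]))


EXAMPLE = """\
# Template 1:
%sentence Look at *
@print *

# Template 2:
%sentence Destroy every %noun
@print %noun
"""

for template, command in parse_template_file(EXAMPLE):
    print(template, "=>", command)
```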
The big improvement is declinations. Declinations are modifiers on variables, which can be used for matching or for producing results. For example, %noun:capital used in a template only matches capitalized nouns, but used in an output it capitalizes the corresponding %noun.
The following are the currently supported declinations:
- :lower - matches or produces lower case words
- :capital - matches or produces words whose first letter is capitalized (ignoring 'of', 'and', 'a', and 'the')
- :s - matches or produces a verb in the present tense
- :ed - matches or produces a verb in the past tense
- :ing - matches or produces a present participle
- :en - matches or produces a past participle
- :x - matches or produces a verb in its infinitive form
Multiple declinations may be used. For example, you might have "%verb:ing:lower".
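To make the match/produce distinction concrete, here is a small Python sketch of :lower and :capital only. It is not the tool's code: the function names are hypothetical, applying chained declinations left to right is an assumption, and verb conjugation (:s, :ed, :ing, :en, :x) is left out.

```python
SMALL_WORDS = {"of", "and", "a", "the"}  # words the notes say :capital ignores


def produce_capital(text):
    """Capitalize the first letter of each word except the ignored ones."""
    words = []
    for word in text.split():
        if word.lower() in SMALL_WORDS:
            words.append(word)
        else:
            words.append(word[:1].upper() + word[1:])
    return " ".join(words)


def matches_capital(text):
    """True if every word that is not ignored starts with a capital letter."""
    significant = [w for w in text.split() if w.lower() not in SMALL_WORDS]
    return bool(significant) and all(w[0].isupper() for w in significant)


PRODUCERS = {"lower": str.lower, "capital": produce_capital}


def produce(variable, value):
    """Apply the declinations in a variable such as '%noun:lower:capital'."""
    _name, *declinations = variable.lstrip("%").split(":")
    for declination in declinations:  # assumption: applied left to right
        value = PRODUCERS[declination](value)
    return value


print(produce("%noun:capital", "lord of the rings"))
print(matches_capital("Lord of the Rings"), matches_capital("lord of the rings"))
```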
Other changes include:
- A bug in conjugating three-letter verbs to the present participle is now fixed.
- %noun now matches nouns identified as plural and/or proper.
- Improved user error messages, with a logging and notification system, so user-generated errors don't always display a whole stack trace.
- I've added a "%pronoun" variable, which matches the Penn Treebank tags PRP (personal pronouns) and WP (Wh-pronouns), but not possessive pronouns.
- DataTemple.exe now accepts a "-tag" command-line argument, which makes it output the part-of-speech tags, and a "-parse" argument, which outputs the grammatical tree structure.
- You can now refer to matched text with numbered suffixes. For example, %pronoun2 in the output command produces the second matched pronoun. However, only the text as written in the input is used; no transformations are applied.
- I rearranged the release package to have data/ and bin/ directories, and made the paths in config.xml relative to that file. So you can now use a command like this, without any changes to the config file:
mono DataTemple.exe -c ../config.xml -i "He fed it." -t "%sentence %pronoun %verb %pronoun" -o "@print %pronoun is %verbing %pronoun2"
- I added better error messages for missing paths to the config file and to plugins.
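The numbered-suffix lookup mentioned above (for example %pronoun2) can be pictured with this short Python sketch. It is illustrative only and not how DataTemple is implemented: `fill_output` and the `matches` dictionary are hypothetical, and treating an unnumbered variable as the first match is an assumption.

```python
import re


def fill_output(command, matches):
    """Replace %name and %nameN with the matched text, exactly as written.

    matches maps a variable name to its matched texts in input order.
    """
    def replace(found):
        name, number = found.group(1), found.group(2)
        # Assumption: no suffix means the first match; suffixes are 1-based.
        index = int(number) - 1 if number else 0
        return matches[name][index]

    return re.sub(r"%([a-z]+)(\d*)", replace, command)


# Matches as they might look for the input "He fed it."
matches = {"pronoun": ["He", "it"]}
print(fill_output("@print %pronoun2 was fed by %pronoun", matches))
```

The matched text is inserted unchanged, which mirrors the "no transformations" restriction on numbered references.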