[feature] Add Γ-expressions
bluebear94 committed May 25, 2018
1 parent ce54016 commit c03c8e3
Showing 9 changed files with 233 additions and 4 deletions.
79 changes: 77 additions & 2 deletions README.md
@@ -252,10 +252,85 @@ the static verifier considers `$(C:1)` seen at the `b` (assuming left-to-right
checking), but when the rule is actually run, `$(C:1)` could be captured
earlier.

#### Lua scripting

Lua code blocks are delimited by `$$` (two dollar signs) on each side. Funky
things will happen if your Lua code itself happens to contain that string.
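To see why a `$$` inside Lua code causes trouble, here is a minimal C++ sketch of a delimiter-based scan. It is a hypothetical illustration (the function `extractLuaBlocks` is invented here), not ztš's actual lexer: everything between one `$$` and the next is taken as a Lua block, so an embedded `$$` ends the block early.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: collect the text between successive pairs of "$$".
// Not ztš's lexer; it only shows why "$$" cannot appear inside Lua code.
std::vector<std::string> extractLuaBlocks(const std::string& src) {
  std::vector<std::string> blocks;
  std::size_t pos = 0;
  while (true) {
    std::size_t open = src.find("$$", pos);
    if (open == std::string::npos) break;
    std::size_t close = src.find("$$", open + 2);
    if (close == std::string::npos) break;  // unterminated block: stop
    blocks.push_back(src.substr(open + 2, close - open - 2));
    pos = close + 2;  // resume scanning after the closing delimiter
  }
  return blocks;
}
```

With input `$$ s = '$$' $$`, the block ends at the `$$` inside the string literal, so the scan yields ` s = '` and the remaining `' $$` is left dangling.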

*Global* Lua code blocks are run once during the invocation of `ztš`. The
syntax is `executeOnce <lua_code>`; for instance, if you want to print a
string once in a program, insert the following:

    executeOnce $$
      print("soonoyun i lua!")
    $$

Of course, the real magic comes when you make rules fire only when a certain
condition is met. This is the infamous Γ from UDN. Just pop a Lua code block
right after the environment (if you have one):

    a+ -> "OKITA-SAN DAISHOURI!" (~ _ ~) $$ isPrime(M.n) $$;

Given a string with *n* `a`s (and nothing else), this rule replaces it with
`OKITA-SAN DAISHOURI!` if *n* is prime (given a suitable definition of
`isPrime`, of course). Otherwise, the substitution is not done:

    aaaaaaa -> OKITA-SAN DAISHOURI!
    aaaaaaaa -> aaaaaaaa
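The effect of a Γ can be modeled in plain C++: the rule matches as usual, and the replacement happens only if a predicate over the match succeeds. This is a hypothetical sketch (`applyGatedRule` and its signature are invented for illustration) that hard-codes the whole-word `a+` match from the example above.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical model of `a+ -> X (~ _ ~) $$ pred(M.n) $$`:
// the whole word must be one or more 'a's, and the predicate receives n.
std::string applyGatedRule(const std::string& word, const std::string& repl,
                           const std::function<bool(std::size_t)>& pred) {
  bool allA = !word.empty() &&
              word.find_first_not_of('a') == std::string::npos;
  if (!allA) return word;               // the pattern/environment fails
  if (!pred(word.size())) return word;  // Γ is false: leave the word alone
  return repl;
}

// Trial-division primality test, standing in for the Lua isPrime.
bool isPrime(std::size_t n) {
  if (n < 2) return false;
  for (std::size_t i = 2; i * i <= n; ++i)
    if (n % i == 0) return false;
  return true;
}
```

Calling `applyGatedRule("aaaaaaa", "OKITA-SAN DAISHOURI!", isPrime)` returns the replacement, while eight `a`s come back unchanged, matching the two lines above.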

The following variables are available inside Γ-expressions:

* `W`: the word as it is before the rule is applied: a list of *phoneme spec*
  objects, one-indexed. (I personally dislike one-indexing, but that's what
  Lua does.)
* `M`: a table with the following entries:
  * `s`: the index of the first character matched (from one).
  * `e`: the index right after the last character matched (from one).
    That is, `e` can range from `1` to `#W + 1`.
  * `n`: the number of characters matched – `e - s`.
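`SimpleRule::evaluate` in this commit builds `M` from the matcher's zero-based, half-open range `[mstart, mend)`. The arithmetic is small enough to model directly; in this C++ sketch, `MatchInfo` and `makeMatchInfo` are hypothetical stand-ins for the Lua table.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Lua table M exposed to a Γ.
struct MatchInfo {
  std::size_t s;  // 1-based index of the first matched character
  std::size_t e;  // 1-based index just past the last matched character
  std::size_t n;  // number of matched characters, e - s
};

// Convert a 0-based half-open match [mstart, mend) to M's 1-based fields,
// mirroring the arithmetic in SimpleRule::evaluate.
MatchInfo makeMatchInfo(std::size_t mstart, std::size_t mend) {
  return MatchInfo{mstart + 1, mend + 1, mend - mstart};
}
```

For a five-phoneme word matched in full, `[0, 5)` becomes `s = 1`, `e = 6` (that is, `#W + 1`) and `n = 5`.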

`sca` is available pretty much everywhere and refers to the current SCA
object.

##### `ztš.SCA`

    sca:getPhoneme(name)          -- name is a string, returns a phoneme spec object
    sca:getFeature(name)          -- name is a string, returns a pair (index, feature);
                                  -- index is from zero; feature is a feature object
    sca:getClass(name)            -- similar, but returns a char class object as the 2nd elem
    sca:getFeatureByIndex(index)  -- similar to the two above, but takes in the
    sca:getClassByIndex(index)    -- index and returns only the relevant object

##### `ztš.SCA.PhonemeSpec`

    ps:getName()                  -- returns the name of this phoneme spec
    ps:getCharClass(sca)          -- takes in an SCA object, returns a char class
    ps:getCharClassIndex()        -- just returns the index
                                  -- (you can feed it into sca:getClassByIndex)
    ps:getFeatureValue(fid, sca)
      -- fid is a feature index (get from sca:getFeature)
      -- sca is an SCA object
      -- returns a 1-based index into the
      -- feature:getInstanceNames() table
    ps:getFeatureName(fid, sca)
      -- similar, but actually returns the name
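Note the mixed conventions: `sca:getFeature` yields a zero-based feature index, while `ps:getFeatureValue` yields a one-based index into `getInstanceNames()`. The C++ sketch below models that lookup under the assumption that `getFeatureName` is equivalent to indexing the instance names with the value (the text above only says "similar"); `FeatureModel`, `PhonemeModel` and `featureName` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a feature: its instance names, in order.
struct FeatureModel {
  std::vector<std::string> instanceNames;
};

// Hypothetical model of a phoneme spec: one 1-based value per feature,
// indexed by the 0-based feature index.
struct PhonemeModel {
  std::vector<std::size_t> values;
};

// Assumed equivalent of ps:getFeatureName(fid, sca): fid is 0-based
// (as from sca:getFeature); the stored value is 1-based.
std::string featureName(const PhonemeModel& ps, std::size_t fid,
                        const std::vector<FeatureModel>& features) {
  std::size_t value = ps.values.at(fid);                // 1-based
  return features.at(fid).instanceNames.at(value - 1);  // convert to 0-based
}
```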

##### `ztš.SCA.Feature`

    feature:getName()             -- returns the name
    feature:getInstanceNames()    -- returns a table of instance names for
                                  -- this feature
    feature:getDefault()          -- returns the default instance as an index
    feature:isCore()              -- true if core feature
    feature:isOrdered()           -- true if ordered feature

##### `ztš.SCA.CharClass`

    cc:getName()                  -- returns the name

#### Unimplemented features

-* The `[<Γ>]` you love from UDN is not supported yet. I'll probably embed
-  a scripting language to support this in the future.
* Heck, why not add looping rules and such?
* Disjunction in constraints is not yet supported in general (e.g. it's not
yet possible to match phonemes with, say, `pa=lb` or `ma=pl`). This can
7 changes: 7 additions & 0 deletions include/Rule.h
@@ -6,6 +6,8 @@
#include <variant>
#include <vector>

#include <lua.hpp>

#include "PHash.h"
#include "PUnique.h"
#include "errors.h"
@@ -142,7 +144,12 @@ namespace sca {
const SoundChange& sc) const override;
MString alpha, omega;
std::vector<std::pair<MString, MString>> envs;
int gammaref = LUA_NOREF;
bool inv;
bool setGamma(lua_State* luaState, const std::string_view& s);
private:
bool evaluate(lua_State* luaState,
const WString& word, size_t mstart, size_t mend) const;
};
struct CompoundRule : public Rule {
std::optional<size_t> tryReplaceLTR(
3 changes: 2 additions & 1 deletion include/SCA.h
@@ -108,6 +108,7 @@ namespace sca {
void addGlobalLuaCode(const LuaCode& lc);
std::string executeGlobalLuaCode();
std::string wStringToString(const WString& ws) const;
lua_State* getLuaState() const { return luaState.get(); }
private:
std::vector<CharClass> charClasses;
std::vector<Feature> features;
@@ -117,7 +118,7 @@ namespace sca {
std::vector<SoundChange> rules;
std::unordered_multimap<
PhonemeSpec, std::string, PSHash, PSEqual> phonemesReverse;
-std::unique_ptr<lua_State, decltype(&lua_close)>
+mutable std::unique_ptr<lua_State, decltype(&lua_close)>
luaState;
std::string globalLuaCode;
};
2 changes: 1 addition & 1 deletion include/sca_lua.h
@@ -10,6 +10,6 @@ namespace sca {
int init(lua_State* l);
// Push a pointer to an SCA on the Lua stack
int pushSCA(lua_State* l, SCA& sca);
-int pushPhoneme(lua_State* l, PhonemeSpec& phoneme);
+int pushPhonemeSpec(lua_State* l, PhonemeSpec& phoneme);
}
}
9 changes: 9 additions & 0 deletions src/Parser.cpp
@@ -425,6 +425,15 @@ namespace sca {
r->envs.clear();
r->inv = false;
}
const Token& gamma = peekToken();
if (gamma.is<LuaCode>()) {
getToken();
bool res = r->setGamma(sca->getLuaState(), gamma.as<LuaCode>().code);
if (!res) {
std::cerr << lua_tostring(sca->getLuaState(), -1) << "\n";
return std::nullopt;
}
}
return std::move(r);
}
std::optional<std::unique_ptr<CompoundRule>> Parser::parseCompoundRule() {
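The parser change makes the Γ optional: after the environment, it peeks at the next token and consumes it only if it is a Lua code block. Below is a self-contained sketch of that peek-then-consume pattern; `TokenStream`, `LuaCodeTok`, `OtherTok` and `parseOptionalGamma` are hypothetical stand-ins for ztš's real `Token`, `peekToken` and `getToken`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical token types; ztš's real Token class differs.
struct LuaCodeTok { std::string code; };
struct OtherTok { std::string text; };
using Tok = std::variant<LuaCodeTok, OtherTok>;

struct TokenStream {
  std::vector<Tok> toks;
  std::size_t pos = 0;
  const Tok* peek() const { return pos < toks.size() ? &toks[pos] : nullptr; }
  void advance() { ++pos; }
};

// Consume an optional trailing Γ: return its code if the next token is a
// Lua block, otherwise leave the stream untouched.
std::optional<std::string> parseOptionalGamma(TokenStream& ts) {
  const Tok* t = ts.peek();
  if (t == nullptr) return std::nullopt;
  if (const auto* lua = std::get_if<LuaCodeTok>(t)) {
    std::string code = lua->code;  // copy the code out before advancing
    ts.advance();
    return code;
  }
  return std::nullopt;
}
```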
49 changes: 49 additions & 0 deletions src/Rule.cpp
@@ -1,13 +1,15 @@
#include "Rule.h"

#include <assert.h>
#include <string.h>

#include <algorithm>
#include <iostream>

#include "SCA.h"
#include "iterutils.h"
#include "matching.h"
#include "sca_lua.h"

/*
For the implementation of the SimpleRule::verify and CompoundRule::verify
@@ -239,6 +241,8 @@ namespace sca {
auto end = *match;
assert(end >= istart);
size_t s = (size_t) (end - istart);
bool gammaMatches = evaluate(sca.getLuaState(), str, start, start + s);
if (!gammaMatches) return std::nullopt;
// Now replace subrange
WString omegaApp;
for (const MChar& oc : omega)
@@ -263,6 +267,12 @@ namespace sca {
auto end = *match;
assert(end >= istart);
size_t s = (size_t) (end - istart);
size_t eifwd = str.size() - 1 - start;
bool gammaMatches = evaluate(
sca.getLuaState(), str,
eifwd - s,
eifwd);
if (!gammaMatches) return std::nullopt;
// Now replace subrange
WString omegaApp;
for (const MChar& oc : omega)
@@ -287,4 +297,43 @@ namespace sca {
}
return std::nullopt;
}
  bool SimpleRule::setGamma(lua_State* luaState, const std::string_view& s) {
    // Compile the Γ as an expression chunk: "return " followed by its text
    char* buffer = new char[s.length() + 7];
    memcpy(buffer, "return ", 7);
    memcpy(buffer + 7, s.data(), s.length());
    int stat = luaL_loadbuffer(luaState, buffer, s.length() + 7, "<Γ>");
    delete[] buffer;
    // On failure, the error message is left on the stack for the caller
    if (stat != LUA_OK) return false;
    // Pop the compiled chunk into the registry and remember its reference
    gammaref = luaL_ref(luaState, LUA_REGISTRYINDEX);
    return true;
  }
  bool SimpleRule::evaluate(lua_State* luaState,
      const WString& word, size_t mstart, size_t mend) const {
    // No Γ attached: the rule always applies
    if (gammaref == LUA_NOREF) return true;
    // Create M from the 0-based half-open match [mstart, mend)
    lua_newtable(luaState);
    lua_pushinteger(luaState, mstart + 1);
    lua_setfield(luaState, -2, "s");
    lua_pushinteger(luaState, mend + 1);
    lua_setfield(luaState, -2, "e");
    lua_pushinteger(luaState, mend - mstart);
    lua_setfield(luaState, -2, "n");
    lua_setglobal(luaState, "M");
    // Create W (1-indexed list of phoneme specs)
    lua_newtable(luaState);
    for (size_t i = 0; i < word.size(); ++i) {
      // pray that no one modifies the word
      sca::lua::pushPhonemeSpec(luaState, (PhonemeSpec&) *(word[i]));
      lua_seti(luaState, -2, i + 1);
    }
    lua_setglobal(luaState, "W");
    // Fetch the compiled Γ from the registry and run it
    lua_geti(luaState, LUA_REGISTRYINDEX, gammaref);
    int stat = lua_pcall(luaState, 0, 1, 0);
    if (stat != LUA_OK) {
      std::cerr << "Fatal error when evaluating a Γ:\n";
      std::cerr << lua_tostring(luaState, -1) << "\n";
      abort();
    }
    bool result = lua_toboolean(luaState, -1);
    // Pop the result so repeated evaluations don't grow the Lua stack
    lua_pop(luaState, 1);
    return result;
  }
}
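Because `setGamma` prepends `return ` before compiling, the text between the `$$` must be a Lua expression: a statement such as `x = 1` would become `return x = 1`, which does not compile. The same chunk can be built with `std::string` instead of a manually managed buffer; a minimal sketch, where `makeGammaChunk` is a hypothetical helper:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: build the Lua chunk that setGamma compiles.
// Prepending "return " makes the Γ's value the chunk's return value.
std::string makeGammaChunk(std::string_view gamma) {
  std::string chunk = "return ";
  chunk.append(gamma.data(), gamma.size());
  return chunk;
}
```

The result is read with `lua_toboolean`, for which only `nil` and `false` are false. A Γ such as `$$ M.n $$` would therefore always fire, since `0` is truthy in Lua.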
28 changes: 28 additions & 0 deletions test/auto/cases/16-lua-gamma.zt
@@ -0,0 +1,28 @@
class X = a;

executeOnce $$
function isPrime(n)
  if n == 0 or n == 1 then return false end
  for i = 2, math.sqrt(n) do
    if n % i == 0 then return false end
  end
  return true
end
function isSquare(n)
  local i = 0
  while i * i <= n do
    if i * i == n then return true end
    i = i + 1
  end
  return false
end
$$

# W: the whole word before the rule is applied (list of phoneme specs)
# M.s: the start index of the match (1-idx)
# M.e: the end index of the match (1-idx, makes half-open interval)
# M.n: the number of chars matched

a+ -> "OKITA-SAN DAISHOURI!" (~ _ ~) $$ isPrime(M.n) $$;
a+ -> "nobunobu" (~ _ ~) $$ isSquare(M.n) $$;
a+ -> "._." (~ _ ~);
30 changes: 30 additions & 0 deletions test/auto/cases/expected-16-lua-gamma.txt
@@ -0,0 +1,30 @@
a -> nobunobu
aa -> OKITA-SAN DAISHOURI!
aaa -> OKITA-SAN DAISHOURI!
aaaa -> nobunobu
aaaaa -> OKITA-SAN DAISHOURI!
aaaaaa -> ._.
aaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaa -> ._.
aaaaaaaaa -> nobunobu
aaaaaaaaaa -> ._.
aaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaa -> ._.
aaaaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaa -> nobunobu
aaaaaaaaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaaaaa -> nobunobu
aaaaaaaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaaaaaaaa -> ._.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa -> OKITA-SAN DAISHOURI!
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -> ._.
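The expected output follows from trying the three rules of `16-lua-gamma.zt` in order. None of the replacement strings contains a lowercase `a`, so at most one rule fires per word: primes first, then perfect squares, and everything else falls through to `._.`. This C++ sketch recomputes the expected line for a word of *n* `a`s; `expectedFor` is a hypothetical helper written only to cross-check the table, with `isPrime` and `isSquare` mirroring the Lua definitions.

```cpp
#include <string>

// Mirrors the Lua isPrime from the test (false for 0 and 1).
bool isPrime(int n) {
  if (n == 0 || n == 1) return false;
  for (int i = 2; i * i <= n; ++i)
    if (n % i == 0) return false;
  return true;
}

// Mirrors the Lua isSquare from the test.
bool isSquare(int n) {
  for (int i = 0; i * i <= n; ++i)
    if (i * i == n) return true;
  return false;
}

// Hypothetical helper: expected output for a word of n 'a's,
// trying the three rules in order.
std::string expectedFor(int n) {
  if (isPrime(n)) return "OKITA-SAN DAISHOURI!";
  if (isSquare(n)) return "nobunobu";
  return "._.";
}
```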
30 changes: 30 additions & 0 deletions test/auto/cases/words-16-lua-gamma.txt
@@ -0,0 +1,30 @@
a
aa
aaa
aaaa
aaaaa
aaaaaa
aaaaaaa
aaaaaaaa
aaaaaaaaa
aaaaaaaaaa
aaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
