Releases: eubinecto/politely
Skipbigram (`sbg`) is now the default scorer
What has been changed?
- `gpt2` is now an optional scorer: `transformers` and `torch` are no longer required dependencies. If you want to use this scorer, install the optional dependencies with `pip3 install "politely[gpt2]"`.
- Instead, a new, lightweight, and equally accurate LM scorer has been added: `Styler(model="sbg")`. This is now the default scorer for `Styler`. (The implementation was very simple, thanks to @bab2min's hard work on the `kiwipiepy` library.)
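Since `transformers` and `torch` are now optional, calling code may want to check for them before asking for the GPT-2 scorer. The following is a minimal sketch using only the standard library. `pick_scorer_name` and `is_importable` are hypothetical helpers, and `"gpt2"` as a `model` value is an assumption (these notes only show `model="sbg"`).

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_scorer_name(preferred="gpt2", is_installed=is_importable):
    """Hypothetical helper: return "gpt2" only when it is requested and the
    optional dependencies are available; otherwise return "sbg"."""
    optional_deps = ("transformers", "torch")
    if preferred == "gpt2" and all(is_installed(dep) for dep in optional_deps):
        return "gpt2"  # assumed model name, not confirmed by these notes
    return "sbg"  # the default scorer named in this release


print(pick_scorer_name())
```

The result could then be passed on as `Styler(model=pick_scorer_name())`, assuming the `model` argument accepts both names.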
Beyond heuristics: added `GPT2Scorer`
As-is: `HeuristicScorer` does not take context into account
from politely import Styler
from politely.modeling_heuristic_scorer import HeuristicScorer
styler = Styler(scorer=HeuristicScorer())
print("##### Without an LM: context is NOT considered ######")
print(styler("내일 저랑 같이 점심 먹어요.", 0))
##### Without an LM: context is NOT considered ######
내일 나랑 같이 점심 먹어.
To-be: `GPT2Scorer` takes context into account
from politely import Styler
from politely.modeling_gpt2_scorer import GPT2Scorer
styler = Styler(scorer=GPT2Scorer()) # uses GPT2Scorer by default
print("##### With an LM: context IS considered ######")
print(styler("내일 저랑 같이 점심 먹어요.", 0))
##### With an LM: context IS considered ######
내일 나랑 같이 점심 먹자. # correct, because the sentence is a proposal ("let's ..."), not a recommendation
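The difference between the two outputs can be illustrated with a toy sketch. This is not how either scorer is implemented: the pair counts below are made up for illustration, and both helper functions are hypothetical. It only shows why a scorer that looks at a neighbouring word can prefer 먹자 over 먹어 after 같이, while a context-free choice cannot.

```python
# Made-up counts of (context word, ending) pairs, for illustration only.
TOY_PAIR_COUNTS = {
    ("같이", "먹자"): 5,
    ("같이", "먹어"): 1,
}


def pick_without_context(candidates):
    """Context-free choice: always take the first candidate in sorted order."""
    return sorted(candidates)[0]


def pick_with_context(context_word, candidates):
    """Context-aware choice: take the candidate seen most often with the context word."""
    return max(
        sorted(candidates),
        key=lambda cand: TOY_PAIR_COUNTS.get((context_word, cand), 0),
    )


candidates = {"먹어", "먹자"}
print(pick_without_context(candidates))
print(pick_with_context("같이", candidates))
```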
`add_rules` of your own
You can add your own rules with the `add_rules` method:
styler.add_rules(
    {
        "이🏷VCP🔗(?P<MASK>다🏷EF)": (
            {"다🏷EF"},
            {"에요🏷EF"},  # 에요.
            {"습니다🏷EF"},
        )
    }
)
sent = "한글은 한국의 글자이다."
print(styler(sent, 1))
한글은 한국의 글자에요.
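To make the shape of a rule concrete, here is a minimal sketch of how such a rule could be applied with the standard `re` module. It is an illustration under assumptions, not the library's implementation: `apply_rule` is a hypothetical helper, the morpheme string is an assumed analysis of 글자이다., and the tuple index is taken to be the politeness level passed to `styler`.

```python
import re


def apply_rule(morphemes, pattern, candidates, politeness):
    """Replace the span captured by the MASK group with every candidate
    listed for the given politeness level (0, 1 or 2)."""
    match = re.search(pattern, morphemes)
    if match is None:
        return [morphemes]  # the rule does not apply; keep the input as-is
    start, end = match.span("MASK")
    return sorted(
        morphemes[:start] + cand + morphemes[end:]
        for cand in candidates[politeness]
    )


pattern = r"이🏷VCP🔗(?P<MASK>다🏷EF)"
candidates = ({"다🏷EF"}, {"에요🏷EF"}, {"습니다🏷EF"})
# Assumed morpheme analysis, joined with the 🔗 separator seen in the rules.
morphemes = "글자🏷NNG🔗이🏷VCP🔗다🏷EF🔗.🏷SF"

print(apply_rule(morphemes, pattern, candidates, 1))
```

Only the text inside the `MASK` group is replaced; the 이🏷VCP part before it serves as context that must be present for the rule to fire.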
You can also add multiple rules at once. Use `politely.SELF` to refer to the original word.
from politely import SELF
styler.add_rules(
    {
        r"(?P<MASK>(아빠|아버지|아버님)🏷NNG)": (
            {"아빠🏷NNG"},
            {"아버지🏷NNG", "아버님🏷NNG"},
            {"아버지🏷NNG", "아버님🏷NNG"},
        ),
        r"(아빠|아버지|아버님)🏷NNG🔗(?P<MASK>\S+?🏷JKS)": (
            {SELF},  # no change, replace with the original
            {"께서🏷JKS"},
            {"께서🏷JKS"},
        ),
        r"(?P<MASK>ᆫ다🏷EF)": (
            {SELF},  # no change, replace with the original
            {"시🏷EP🔗어요🏷EF"},
            {"시🏷EP🔗습니다🏷EF"},
        ),
    }
)
sent = "아빠가 정실에 들어간다."
print(styler(sent, 1))
from pprint import pprint
pprint(styler.logs['guess']['out']) # you can look up the candidates from here
아버지께서 정실에 들어가셔요.
[(['아버지🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
0.0125),
(['아버님🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
0.0125)]
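The role of `SELF` can be sketched as a sentinel that is swapped for whatever the `MASK` group matched. The snippet below is a hedged illustration only: the `SELF` value here is a hypothetical stand-in (the real value of `politely.SELF` is not shown in these notes), `expand` is a hypothetical helper, and the morpheme string is an assumed analysis of 아빠가.

```python
import re

SELF = "<SELF>"  # hypothetical stand-in for politely.SELF


def expand(original, options):
    """Resolve SELF to the text that the MASK group actually matched."""
    return sorted({original if opt == SELF else opt for opt in options})


rule = r"(아빠|아버지|아버님)🏷NNG🔗(?P<MASK>\S+?🏷JKS)"
levels = ({SELF}, {"께서🏷JKS"}, {"께서🏷JKS"})

morphemes = "아빠🏷NNG🔗가🏷JKS"  # assumed morpheme analysis
matched = re.search(rule, morphemes).group("MASK")

print(expand(matched, levels[0]))  # level 0 keeps the original particle
print(expand(matched, levels[1]))  # level 1 swaps in the honorific particle
```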
Writing rules programmatically
as-is
나${SEP}NP:
  1: 나${SEP}NP
  2: 저${SEP}NP
  3: 저${SEP}NP
내${SEP}NP:
  1: 내${SEP}NP
  2: 제${SEP}NP
  3: 제${SEP}NP
제${SEP}NP:
  1: 내${SEP}NP
  2: 제${SEP}NP
  3: 제${SEP}NP
너${SEP}NP:
  1: 너${SEP}NP
  2: 당신${SEP}NP
  3: 당신${SEP}NP
to-be
# --- 나/저 --- #
RULES.update(
    {
        rf"(?P<{MASK}>(나|저){TAG}NP)": (
            {f"나{TAG}NP"},
            {f"저{TAG}NP"},
            {f"저{TAG}NP"},
        )
    }
)
# --- 너/당신 --- #
RULES.update(
    {
        rf"(?P<{MASK}>(너|당신){TAG}NP)": (
            {f"너{TAG}NP"},
            {f"당신{TAG}NP"},
            {f"당신{TAG}NP"},
        )
    }
)
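Because rules are now plain Python, repeated patterns like the ones above can be generated in a loop. The following is a sketch under assumptions: `pronoun_rule` is a hypothetical helper, and the values given to `MASK` and `TAG` are inferred from the examples in these notes rather than taken from the library.

```python
import re

MASK = "MASK"  # assumed: name of the capture group used in the rules
TAG = "🏷"     # assumed: separator between a morpheme and its POS tag


def pronoun_rule(casual, polite, pos="NP"):
    """Build one rule mapping a casual/polite pronoun pair to three levels."""
    pattern = rf"(?P<{MASK}>({casual}|{polite}){TAG}{pos})"
    return {
        pattern: (
            {f"{casual}{TAG}{pos}"},
            {f"{polite}{TAG}{pos}"},
            {f"{polite}{TAG}{pos}"},
        )
    }


RULES = {}
for casual, polite in [("나", "저"), ("내", "제"), ("너", "당신")]:
    RULES.update(pronoun_rule(casual, polite))

for pattern in RULES:
    print(pattern)
```

One rule per pair replaces the separate per-word entries of the YAML version, since the regex alternation matches either form.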
Full Changelog: v3.2.1...v3.2.2
First stable version of politely
What's Changed
- Tested `Styler` against noisy data.
- `__call__` now accepts a list of sentences rather than a text; this delegates sentence splitting to the user (e.g. use `kss` or `kiwi`).
- `Styler` does not raise any custom exceptions by default; it raises them only if the user sets `debug=True` on initialisation.
- Issue 86 by @eubinecto in #87
- [#88] project management with `hatch`. Makefile added. by @eubinecto in #89
- [#91] Translate step added. Conjugate step added. Using verbs rather … by @eubinecto in #92
- [#93] `debug` parameter added to `__call__`. Default behaviour of `_… by @eubinecto in #94
- Issue 95 by @eubinecto in #96
- [#97] refactoring `preprocess` - now it supports multiple sentences by @eubinecto in #98
- Issue 90 by @eubinecto in #99
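Since sentence splitting is now left to the caller, the input has to be split before it reaches `Styler`. The sketch below uses a naive regex splitter as a stand-in for a proper splitter such as `kss` or `kiwi`; `split_sentences` is a hypothetical helper and will mis-split abbreviations, quotes, and text without final punctuation.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


text = "안녕하세요. 내일 저랑 같이 점심 먹어요! 괜찮아요?"
sentences = split_sentences(text)
print(sentences)
# The resulting list, rather than the raw text, is what would be passed on,
# e.g. styler(sentences, 1), assuming a Styler instance named styler.
```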
Full Changelog: v2.6.2...v3.1.0