05 Apr 07:43

eubinecto

Skipbigram (`sbg`) is now the default scorer

Latest

What has been changed?

`gpt2` is now an optional scorer, so `transformers` and `torch` are no longer required dependencies. To use this scorer, install the optional dependencies with `pip3 install "politely[gpt2]"`.
In its place, a new lightweight LM scorer that is still as accurate has been added: `Styler(model="sbg")`. It is now the default scorer for `Styler`. (The implementation was very simple, thanks to the hard work @bab2min has put into the kiwipiepy library.)
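
Because `transformers` and `torch` are now optional, calling code may want to check for them before asking for the GPT-2 scorer. The following is a minimal sketch using only the standard library. `pick_scorer_name` is a hypothetical helper, not part of politely, and the strings it returns are just labels for this example (`"sbg"` is taken from the note above, `"gpt2"` is assumed).

```python
from importlib.util import find_spec


def pick_scorer_name() -> str:
    """Return "gpt2" only if both optional dependencies are importable.

    Hypothetical helper for illustration; politely does not ship this function.
    """
    # find_spec returns None when a top-level package is not installed
    has_gpt2_deps = all(
        find_spec(name) is not None for name in ("transformers", "torch")
    )
    return "gpt2" if has_gpt2_deps else "sbg"


print(pick_scorer_name())
```

On a fresh install without the `gpt2` extra, this falls back to the lightweight default.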

Contributors

bab2min


20 May 23:49

eubinecto

Beyond heuristics: added `GPT2Scorer`

As-is: `HeuristicScorer` does not take contexts into account

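from politely import Styler
from politely.modeling_heuristic_scorer import HeuristicScorer
styler = Styler(scorer=HeuristicScorer())
print("##### without an LM: context is NOT considered ######")
print(styler("내일 저랑 같이 점심 먹어요.", 0))  # "Let's have lunch together tomorrow." (polite)

##### without an LM: context is NOT considered ######
내일 나랑 같이 점심 먹어.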

To-be: `GPT2Scorer` takes context into account

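from politely import Styler
from politely.modeling_gpt2_scorer import GPT2Scorer
styler = Styler(scorer=GPT2Scorer())  # GPT2Scorer is also what Styler uses by default
print("##### with an LM: context IS considered ######")
print(styler("내일 저랑 같이 점심 먹어요.", 0))

##### with an LM: context IS considered ######
내일 나랑 같이 점심 먹자.  # correct: the sentence is a proposal to eat together (청유), not a suggestion to the listener (권유)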


10 Mar 08:25

eubinecto

`add_rules` of your own

4️⃣ `add_rules` of your own

You can add your own rules with the `add_rules` method:

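from politely import Styler

styler = Styler()
styler.add_rules(
    {"이🏷VCP🔗(?P<MASK>다🏷EF)": (
        {"다🏷EF"},      # candidates for politeness level 0
        {"에요🏷EF"},    # candidates for politeness level 1
        {"습니다🏷EF"},  # candidates for politeness level 2
    )
    })
sent = "한글은 한국의 글자이다."  # "Hangul is the script of Korea."
print(styler(sent, 1))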

한글은 한국의 글자에요.
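
To make the rule format above more concrete, here is a rough sketch of how a pattern with a `MASK` group could be applied using plain `re`. It is not politely's implementation. `apply_rule` is a hypothetical helper, and the morpheme string is hand-written for this example (the real one comes from the morphological analyser inside `Styler`).

```python
import re

TAG = "🏷"  # separates a morpheme from its part-of-speech tag
SEP = "🔗"  # joins consecutive morphemes

# Rule from the example above: pattern -> one candidate set per politeness level
pattern = f"이{TAG}VCP{SEP}(?P<MASK>다{TAG}EF)"
candidates = ({f"다{TAG}EF"}, {f"에요{TAG}EF"}, {f"습니다{TAG}EF"})

# Hand-written analysis of "한글은 한국의 글자이다." (an assumption for this sketch)
joined = SEP.join(
    f"{morph}{TAG}{pos}"
    for morph, pos in [
        ("한글", "NNG"), ("은", "JX"), ("한국", "NNP"), ("의", "JKG"),
        ("글자", "NNG"), ("이", "VCP"), ("다", "EF"), (".", "SF"),
    ]
)


def apply_rule(text: str, politeness: int) -> list[str]:
    """Replace only the MASK group with each candidate for the given level."""
    match = re.search(pattern, text)
    if match is None:
        return [text]  # rule does not apply; keep the text unchanged
    start, end = match.span("MASK")
    return [text[:start] + cand + text[end:] for cand in sorted(candidates[politeness])]


print(apply_rule(joined, 1))
```

Only the span captured by `MASK` is swapped out; the `이🏷VCP🔗` prefix serves as context that must be present for the rule to fire.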

You can also add multiple rules at once. Use `politely.SELF` to refer to the original word.

from pprint import pprint
from politely import SELF
styler.add_rules(
    {
        r"(?P<MASK>(아빠|아버지|아버님)🏷NNG)": (
            {"아빠🏷NNG"},
            {"아버지🏷NNG", "아버님🏷NNG"},
            {"아버지🏷NNG", "아버님🏷NNG"}
        ),
        r"(아빠|아버지|아버님)🏷NNG🔗(?P<MASK>\S+?🏷JKS)": (
            {SELF},  # no change, replace with the original
            {"께서🏷JKS"},
            {"께서🏷JKS"}
        ),
        r"(?P<MASK>ᆫ다🏷EF)": (
            {SELF},  # no change, replace with the original
            {"시🏷EP🔗어요🏷EF"},
            {"시🏷EP🔗습니다🏷EF"},
        )
    }
)
sent = "아빠가 정실에 들어간다."
print(styler(sent, 1))
pprint(styler.logs['guess']['out'])  # you can look up the candidates from here

아버지께서 정실에 들어가셔요.
[(['아버지🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
  0.0125),
 (['아버님🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
  0.0125)]
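
The logged candidates are plain `(morphemes, score)` pairs, so they can be inspected with ordinary Python. The sketch below copies the two candidates from the output above and picks the highest-scoring one. `surface_forms` is a hypothetical helper, and this is not necessarily how `Styler` makes its final choice.

```python
# Candidates copied from the pprint output above: (morphemes, score) pairs
candidates = [
    (['아버지🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
     0.0125),
    (['아버님🏷NNG', '께서🏷JKS', '정실🏷NNG', '에🏷JKB', '들어가🏷VV', '시🏷EP', '어요🏷EF', '.🏷SF'],
     0.0125),
]


def surface_forms(morphemes: list[str]) -> list[str]:
    """Drop the part-of-speech tag and keep the text before the 🏷 marker."""
    return [m.partition("🏷")[0] for m in morphemes]


# max() returns the first item on ties, so the first candidate wins here
best_morphemes, best_score = max(candidates, key=lambda pair: pair[1])
print(surface_forms(best_morphemes), best_score)
```

Both candidates have the same score in this log, so the tie is broken by order alone.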


09 Mar 11:28

eubinecto

Writing rules programmatically

as-is

나${SEP}NP:
  1: 나${SEP}NP
  2: 저${SEP}NP
  3: 저${SEP}NP
내${SEP}NP:
  1: 내${SEP}NP
  2: 제${SEP}NP
  3: 제${SEP}NP
제${SEP}NP:
  1: 내${SEP}NP
  2: 제${SEP}NP
  3: 제${SEP}NP
너${SEP}NP:
  1: 너${SEP}NP
  2: 당신${SEP}NP
  3: 당신${SEP}NP

to-be

# --- 나/저 --- #
RULES.update(
    {
        rf"(?P<{MASK}>(나|저){TAG}NP)": (
            {f"나{TAG}NP"},
            {f"저{TAG}NP"},
            {f"저{TAG}NP"}
        )
    }
)


# --- 너/당신 --- #
RULES.update(
    {
        rf"(?P<{MASK}>(너|당신){TAG}NP)": (
            {f"너{TAG}NP"},
            {f"당신{TAG}NP"},
            {f"당신{TAG}NP"}
        )
    }
)
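
One benefit of writing rules in Python rather than YAML is that repeated patterns can be generated. The sketch below is self-contained and illustrates the idea. `pronoun_rule` is a hypothetical helper, and the values of `TAG` and `MASK` are assumptions based on how the `add_rules` examples in these notes are written (the real constants live inside politely).

```python
import re

# Assumed values for illustration; the real constants are defined in politely
TAG = "🏷"
MASK = "MASK"
RULES: dict = {}


def pronoun_rule(variants, casual, honorific, pos="NP"):
    """Build one rule: any variant maps to the casual form (first element)
    or to the honorific form (second and third elements)."""
    pattern = rf"(?P<{MASK}>({'|'.join(variants)}){TAG}{pos})"
    return {
        pattern: (
            {f"{casual}{TAG}{pos}"},
            {f"{honorific}{TAG}{pos}"},
            {f"{honorific}{TAG}{pos}"},
        )
    }


# Same two rules as above, generated from a small table
for variants, casual, honorific in [(("나", "저"), "나", "저"), (("너", "당신"), "너", "당신")]:
    RULES.update(pronoun_rule(variants, casual, honorific))

for pattern in RULES:
    print(pattern)
```

Each generated key has the same shape as the hand-written ones, so further pronoun pairs only need one more table row.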

Full Changelog: v3.2.1...v3.2.2


07 Jul 15:10

eubinecto

First stable version of politely

What's Changed

Tested `Styler` against noisy data. `__call__` now accepts a list of sentences rather than a single text; this delegates sentence splitting to the user (e.g. with kss or kiwi). `Styler` does not raise any custom exceptions by default; it raises them only if the user sets `debug=True` on initialisation.
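
Since sentence splitting is now the caller's job, the text has to be split before it reaches the styler. The sketch below uses a naive regex splitter as a stand-in for kss or kiwi. `naive_split` is a hypothetical helper, and a real splitter handles many cases this one does not (quotes, abbreviations, missing punctuation).

```python
import re


def naive_split(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace.

    A stand-in for a proper sentence splitter such as kss or kiwi.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sentences = naive_split("한글은 한국의 글자이다. 내일 저랑 같이 점심 먹어요.")
print(sentences)
# The resulting list, not the raw text, is what would be passed to the styler.
```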

Issue 86 by @eubinecto in #87
[#88] project management with hatch. Makefile added. by @eubinecto in #89
[#91] Translate step added. Conjugate step added. Using verbs rather … by @eubinecto in #92
[#93] debug parameter added to __call__. Default behaviour of `_… by @eubinecto in #94
Issue 95 by @eubinecto in #96
[#97] refactoring preprocess - now it supports multiple sentences by @eubinecto in #98
Issue 90 by @eubinecto in #99

Full Changelog: v2.6.2...v3.1.0

Contributors

eubinecto
