-
Notifications
You must be signed in to change notification settings - Fork 7
/
NER eksperimenti 2013.txt
102 lines (87 loc) · 4.15 KB
/
NER eksperimenti 2013.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
1. Baseline - 2012. gada LNB tageris uz LETA kategorijām ar vecajiem gazetieriem un settingiem. (piem, sloppy gazette)
Entity P R F1 TP FP FN
location 0.2667 0.8571 0.4068 12 33 2
organization 0.1579 0.0714 0.0984 3 16 39
person 0.9565 0.5641 0.7097 22 1 17
profession 0.5000 0.0256 0.0488 1 1 38
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.5859 0.3623 0.4478 75 53 132
2. Bez gazetiera
Entity P R F1 TP FP FN
location 0.2400 0.8571 0.3750 12 38 2
organization 0.1500 0.0714 0.0968 3 17 39
person 0.9500 0.4872 0.6441 19 1 20
profession 0.5000 0.0256 0.0488 1 1 38
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.5496 0.3478 0.4260 72 59 135
3. Clean gazetieris, vārdi/uzvārdi
Entity P R F1 TP FP FN
location 0.2609 0.8571 0.4000 12 34 2
organization 0.1500 0.0714 0.0968 3 17 39
person 0.9583 0.5897 0.7302 23 1 16
profession 0.5000 0.0256 0.0488 1 1 38
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5000 0.6667 16 0 16
Totals 0.5769 0.3623 0.4451 75 55 132
4. Sloppy gazetieris, vārdi/uzvārdi (ņems vērā locījumus...)
Entity P R F1 TP FP FN
location 0.2500 0.8571 0.3871 12 36 2
organization 0.1500 0.0714 0.0968 3 17 39
person 0.9545 0.5385 0.6885 21 1 18
profession 0.5000 0.0256 0.0488 1 1 38
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5000 0.6667 16 0 16
Totals 0.5615 0.3527 0.4332 73 57 134
saks! nav gaidītais. Varbūt onomastikas uzvārdi ir ļaunuma sakne?
5. sloppy gazetieris, lemmatizēti manuāli pielikti testakopas un treniņkopas organizāciju vārdi (PP_orgnames_LETA.txt) - simulējam ideālu gazetiera saturu
Entity P R F1 TP FP FN
location 0.3000 0.8571 0.4444 12 28 2
organization 0.4865 0.4286 0.4557 18 19 24
person 0.9500 0.4872 0.6441 19 1 20
profession 0.6667 0.0513 0.0952 2 1 37
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.6331 0.4251 0.5087 88 51 119
Labāk bet ne tuvu ne labi !
6. clean gazetieris ar to pašu
Entity P R F1 TP FP FN
location 0.2400 0.8571 0.3750 12 38 2
organization 0.1579 0.0714 0.0984 3 16 39
person 0.9500 0.4872 0.6441 19 1 20
profession 0.5000 0.0256 0.0488 1 1 38
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.5538 0.3478 0.4273 72 58 135
useless dēļ lemmatizācijas
7. clean gazetieris ar tiem pašiem datiem, bet kodā pielikta lemmatizācija (pašam tekstam. gazetierī rēķinās ka būs ok..)
Entity P R F1 TP FP FN
location 0.2553 0.8571 0.3934 12 35 2
organization 0.3600 0.2143 0.2687 9 16 33
person 0.9500 0.4872 0.6441 19 1 20
profession 0.6667 0.0513 0.0952 2 1 37
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.5896 0.3816 0.4633 79 55 128
Kautkas nepatīk, par mazu svars tam gazetierim sanāk?? un naher sloppy strādā labāk par šito??
8. gan clean, gan sloppy gazette
Entity P R F1 TP FP FN
location 0.3077 0.8571 0.4528 12 27 2
organization 0.4865 0.4286 0.4557 18 19 24
person 0.9500 0.4872 0.6441 19 1 20
profession 0.6667 0.0513 0.0952 2 1 37
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.6377 0.4251 0.5101 88 50 119
Ņihera nesaprotu - idenisks sloppy gazetei..
9. gan clean, gan sloppy gazete, nodalot tās fīčas vienu no otras
Entity P R F1 TP FP FN
location 0.3000 0.8571 0.4444 12 28 2
organization 0.5676 0.5000 0.5316 21 16 21
person 0.9500 0.4872 0.6441 19 1 20
profession 0.6667 0.0513 0.0952 2 1 37
sum 0.9091 0.8333 0.8696 20 2 4
time 1.0000 0.5313 0.6939 17 0 15
Totals 0.6547 0.4396 0.5260 91 48 116
Ir labāk, bet vienalga organizācijām prasītos būtiski labāk ja ir tas ideālais gazetieris !!