-
Notifications
You must be signed in to change notification settings - Fork 1
/
COPY-OF_hyph-bg.tex
7786 lines (7781 loc) · 104 KB
/
COPY-OF_hyph-bg.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
% Copy of `http://mirror.ctan.org/language/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-bg.tex`
% `https://t.co/m4rPDuWQVD?amp=1`
% https://www.tug.org/tex-hyphen/
% https://github.com/bramstein/typeset/ (The Knuth hyphenation algorithm in JavaScript)
% NOTE: The original file does not contain these first 8 lines.
% copyright: Copyright (C) 2000, 2004, 2017 Anton Zinoviev
% title: Hyphenation patterns for Bulgarian
% version: 21 October 2017
% language:
% name: Bulgarian
% tag: bg
% notice: >
% This file is part of the hyph-utf8 package.
% See http://www.hyphenation.org/tex for more information.
% authors:
% -
% name: Anton Zinoviev
% contact: anton:lml.bas.bg
% licence:
% text: >
% This software may be used, modified, copied, distributed, and sold,
% both in source and binary form provided that the above copyright
% notice and these terms are retained. The name of the author may not
% be used to endorse or promote products derived from this software
% without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND
% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT
% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
% OF THE USE OF THIS SOFTWARE.
% hyphenmins:
% typesetting:
% left: 2
% right: 2
% changes: See below
% ==========================================
% Copyright (C) 2000,2004,2017 by Anton Zinoviev <[email protected]>
%
% This software may be used, modified, copied, distributed, and sold,
% both in source and binary form provided that the above copyright
% notice and these terms are retained. The name of the author may not
% be used to endorse or promote products derived from this software
% without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND
% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT
% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
% OF THE USE OF THIS SOFTWARE.
%
% Bulgarian hyphenation patterns
%
% Generated by ./hyph-bg.sh --safe-morphology --standalone-tex
%
% Both left and right hyphenmins should be set to 2.
%
% % Automated Bulgarian Hyphenation
% % Anton Zinoviev
% % 21 October 2017
%
% Principles of the Bulgarian hyphenation
% =======================================
%
% One specificity of the Bulgarian language is that the average length
% of the words is greater than in English. When typesetting a Bulgarian
% text, hyphenation is more important than when typesetting an English
% text. Knuth's algorithm for line-breaking is such that in most
% English paragraphs no hyphenation will be used. With a Bulgarian
% text, however, even the Knuth's algorithm will use hyphenation in most
% paragraphs. Hyphenation becomes an absolute necessity if we want to
% obtain nice, justified paragraphs when using a software with dumb
% line-breaking algorithm, such as LibreOffice.
%
% According to Decree 936 of the Council of Ministers promulgated on 27
% November 1950, the Institute for Bulgarian Language at the Bulgarian
% Academy of Sciences is authorised to publish the rules of the
% orthography of the Bulgarian language (within certain limits).
%
% Hyphenation rules between 1945 and 1983
% ---------------------------------------
%
% Between 1945 and 1983 Bulgarian used syllable hyphenation with two
% morphological exceptions: hyphenation is preferred between a prefix
% and a stem and at the boundary of compound words. The following were
% the rules governing the hyphenation:
%
% 1. One letter does not stay alone. Words of one syllable can not be
% hyphenated.
% 2. No hyphenation before or after ь.
% 3. In a sequence of vowels at least one vowel stays before the
% hyphen.
% 4. A single consonant between two vowels links with the second vowel.
% For example по-ле /po-le/, ра-бо-та /ra-bo-ta/.
% 5. In a sequence of consonants between two vowels, at least one
% consonant stays with the second vowel. For example те-сто /te-sto/
% or тес-то /tes-to/.[^b]
% 6. In a sequence of consonants between two vowels, if the first
% consonant is sonorant (й /y/, л /l/, м /m/, н /n/, р /r/), then it
% stays with the first vowel. For example гер-дан /ger-dan/, сен-ки
% /sen-ki/.
% 7. The hyphenation separates two successive equal consonants. For
% example времен-но /vremen-no/, пролет-та /prolet-ta/.
% 8. When the letters дж /dzh/ and дз /dz/ denote a single consonant,
% then they are not separated. For example боя-джия /boya-dzhiya/
% but not бояд-жия /boyad-zhiya/. When these letters denote two
% consonants, then the normal rules apply: над-живявам
% /nad-zhivyavam/.
% 9. Word prefixes may not be broken. Compound words are hyphenated
% either at the boundary of the components or the hyphenation rules
% are applied to each of the components separately. For example:
% пред-упреждавам /pred-uprezhdavam/ (not пре-дупреждавам
% /pre-duprezhdavam/), пред-известие /pred-izvestie/ (not
% пре-дизвестие /pre-dizvestie/), за-движвам /za-dvizhvam/ (not
% зад-вижвам /zad-vizhvam/), авто-клуб /avto-klub/ (not авток-луб
% /avtok-lub/), вакуум-апарат /vakuum-aparat/ (not вакуу-мапарат
% /vakuu-maparat/).
%
% In some rare cases the proper application of rule 9 depends on the
% semantics of the word. For example пре-дреша /pre-dresha/ 'change
% clothes' but пред-реша /pred-resha/ 'predetermine' or прес-пите
% /pres-pite/ 'the snow-drifts' but пре-спите /pre-spite/ 'sleep for a
% while/overnight'.
%
% [^b]: In several publications this rule is formulated with the
% additional restriction that the sequence of consonants begins with
% an obstruent. I believe this restriction is unintentional. It
% makes no sense to forbid a hyphenation of the form AB-A but to
% permit ABB-A (A denotes a vowel and B – a consonant).
%
% Hyphenation rules between 1983 and 2012
% ---------------------------------------
%
% The Orthographic dictionary published by the Institute for Bulgarian
% language in 1983 introduced new hyphenation rules. The complexity of
% the previous rules was the main reason for the change. The new rules
% aimed at two objectives: simplicity and unambiguity.
%
% The new rules are:
%
% 1. A consonant between two vowels links with the second vowel. For
% example ви-со-чи-на /vi-so-chi-na/.
% 2. In a sequence of two or more consonants between two vowels, at
% least one consonant stays with first vowel and at least one with
% the second vowel. For example сес-тра /ses-tra/ and сест-ра
% /sest-ra/.
% 3. Two equal consonants are separated. For example плен-ник
% /plen-nik/.
% 4. In a sequence of two or more vowels, the first vowel stays before
% the hyphen. For example пре-одолея /pre-odoleya/ and прео-долея
% /preo-doleya/.
% 5. In a sequence of three or more vowels, the last vowel stays after
% the hyphen. For example мао-изъм /mao-izam/ but not маои-зъм
% /maoi-zam/.
% 6. The letter й /y/ between a vowel and a consonant stays with the
% vowel. For example май-ка /may-ka/.
% 7. When a sequence of two or more consonants follows й /y/ then at
% least one consonant links with й /y/. For example айс-берг
% /ays-berg/ (not ай-сберг /ay-sberg/).
% 8. The letter й /y/ between two vowels links with the second vowel.
% For example ма-йор /ma-yor/.
% 9. No hyphenation before or after ь.
% 10. When the letters дж /dzh/ denote a single consonant, then they are
% not separated. For example су-джук /su-dzhuk/ (not суд-жук
% /sud-zhuk/) but над-живея /nad-zhiveya/.
% 11. There must be at least one vowel before and after the hyphen.
% 12. One letter does not stay alone.
%
% The total disregard of the morphology by these rules leads to some
% strange results. For example пре-дизвестие /pre-dizvestie/ is
% permitted and пред-известие /pred-izvestie/ is forbidden, зад-вижвам
% /zad-vizhvam/ is permitted and за-движвам /za-dvizhvam/ is forbidden,
% авток-луб /avtok-lub/ is permitted and авто-клуб /avto-klub/ is
% forbidden, вакуу-мапарат /vakuu-maparat/ is permitted and
% вакуум-апарат /vakuum-aparat/ is forbidden. Because of this, the new
% rules were not universally accepted. The old rules are still
% mentioned in various places in Internet, they are included even in
% some grammar books published by the publishing houses of the Ministry
% of Education and of Sofia University. The software developers,
% however, soon came into love with the new hyphenation rules.
%
% Hyphenation rules after 2012
% ----------------------------
%
% In 2012 new rules came into force. There are two differences with
% respect to the previous rules:
%
% 1. Rule 5 of the previous rules is revoked. For example маои-зъм
% /maoi-zam/ becomes a valid hyphenation.
% 2. The new rules permit morphologically based hyphenation (however it
% is not obligatory). For example пред-известие /pred-izvestie/,
% за-движвам /za-dvizhvam/, авто-клуб /avto-klub/, вакуум-апарат
% /vakuum-aparat/ are valid hyphenations.
%
% Good hyphenation is a complex matter and it seems the linguists at the
% Institute for Bulgarian Language have recognised this. They no longer
% attempt to provide universal rules about everything. Instead, they
% provide some very permissible rules while the good application of
% these rules is leaved to the discretion and the experience of the
% printers and the developers of hyphenation software.
%
% It makes sense to use at least two different sets of hyphenation rules
% for Bulgarian. In most cases a more restrictive version should be
% used, one which attempts to eliminate the controversial cases of
% hyphenation. When typesetting a Bulgarian text in a narrow newspaper
% column, however, it will be appropriate to use more liberal
% hyphenation rules. It should be noted that one of the reasons for the
% hyphenation reform in 1983 was the desire to fix the chaotic
% hyphenation in the Bulgarian newspapers at that time.
%
% Computer implementations
% ========================
%
% Mathematical analysis of the Bulgarian hyphenation
% --------------------------------------------------
%
% The earliest mathematical analysis of the Bulgarian hyphenation rules
% belongs to Veska Noncheva.[^1] In 1988 she proposed a mathematical
% formalisation of the hyphenation rules in a table with 22 rows.[^2]
%
% [^1]: <http://www.researchgate.net/profile/Veska_Noncheva>
%
% [^2]: Нончева В. Алгоритъм за автоматично пренасяне на думи в
% българския език. Математика и математическо
% образование. Сб. доклади на 17. ПК на СМБ. С., БАН, 1988, 479-482.
%
% In the same year Eugene Belogay[^3] proposed an alternative
% formalisation with only 9 rules.[^4] Belogay proved that his rules are
% consistent and that they form a minimal set. The rules of Belogay
% have negative character – every hyphenation which is not forbidden by
% a rule is possible hyphenation.
%
% [^3]: <http://www.linkedin.com/in/belogay>
%
% [^4]: Белогай Е. Алгоритъм за автоматично пренасяне на думи. Компютър
% за вас (1988) 3, 12-14.
%
% The following are the first 7 rules, as formulated by Belogay:
%
% 1. Б-А
% 2. А-ББ
% 3. Б-ТТ, ТТ-Б
% 4. ААА-Б
% 5. й-ББ
% 6. Б-ь
% 7. д-ж
%
% Here А denotes an arbitrary vowel letter, Б denotes an arbitrary
% consonant letter (including ь and й), ТТ denotes a sequence of two
% equal consonant letters and the letters й, ь, д and ж denote
% themselves. For example the rule "Б-А" says that we are not permitted
% to separate a consonant letter from immediately following vowel
% letter.
%
% The eighth rule of Belogay says that hyphenation is forbidden before
% the first and after the last vowel letter. The ninth rule of Belogay
% says that hyphenation is forbidden immediately after the first or
% immediately before the last letter of the word.
%
% Notice that is is very easy to translate the rules of Belogay in the
% form, required for the hyphenation algorithm of Knuth and Liang used
% in TeX.[^a] Let us remind that this algorithm matches the word with a
% set of string patterns in which the odd numbers say hyphenation is
% permitted in this position and even numbers say the hyphenation is
% forbidden. When two patterns give conflicting numbers for the same
% position, then the greater number wins.
%
% First, since the rules of Belogay are negative (they say where
% hyphenation is forbidden, not where it is permitted), we have to
% permit the hyphenation everywhere:
%
% 1. А1
% 2. Б1
%
% Then, the first seven rules of Belogay obtain the form:
%
% 1. Б2А
% 2. А2ББ
% 3. Б2ТТ ТТ2Б
% 4. ААА2Б
% 5. й2ББ
% 6. Б2ь
% 7. д2ж
%
% Since no Bulgarian word starts with more that four consonants and no
% Bulgarian word ends with more than three consonants, the eighth rule
% of Belogay can be translated in the following way:
%
% 1. .Б2
% 2. .ББ2
% 3. .БББ2
% 4. 2Б.
% 5. 2ББ.
%
% The ninth rule of Belogay means that left and right hyphen mins should
% be set to 2.
%
% The work of Eugene Belogay was not limited to merely a mathematical
% analysis of the Bulgarian hyphenation rules. In his paper he
% published a short algorithm in Pascal which implements these rules.
% It didn't take long for this algorithm to be used in various text
% processing software. The algorithm of Belogay was famous for many
% years. Even as late as 1997 in one book about TeX, the author didn't
% care to give any explanations but simply wrote about "the algorithm of
% Belogay" as something well known to the reader.[^5]
%
% [^a]: Liang, Franklin Mark. Word Hy-phen-a-tion by
% Com-put-er (Doctoral Dissertation). Stanford University, 1983
%
% [^5]: Василев В. Ултимативният ТеХ. Удоволствието да правим
% предпечатна подготовка сами. София, Интела, 1997, 36
%
% Bulgarian hyphenation in TeX
% ----------------------------
%
% One unfortunate design decision of Knuth was that the hyphenation
% algorithm of TeX applied the hyphenation patterns not to the input
% character codes but to the internal codes of the glyphs in the font.
% This created a problem for the Cyrillic languages because in TeX the
% Cyrillic fonts did not have standardised encoding. Perhaps this is
% one of the reasons why the earliest implementations of the Bulgarian
% hyphenation in TeX did not rely on the internal hyphenation algorithm
% of TeX. Instead, external tools were used to insert soft hyphens in
% all Bulgarian words. For example such a tool would replace the word
% сричкопренасяне /srichkoprenasyane/ with
% срич\\-коп\\-ре\\-на\\-ся\\-не /srich\\-kop\\-re\\-na\\-sya\\-ne/.
% The saying "To every disadvantage there is a corresponding advantage"
% is true – since Cyrillic and Latin letters use different character
% codes, an external tool could easily insert soft hyphens in all
% Bulgarian words while leaving the TeX commands intact.
%
% The earliest known attempt to use the hyphenation algorithm of TeX for
% Bulgarian was made by Ognyan Tonev in 1990.[^6] He described his work
% as "a not very good translation of the rules. I work in this
% direction. But I don't have a 100% working complect of patterns. So,
% the copy I send to you[^7] is only a beta-version." The hyphenation
% patterns of Tonev don't work correctly and it seems he never completed
% his work.
%
% [^6]: The author of this text was unable to find current information
% about Ognyan Tonev in Internet. Apparently in 1990 he worked in
% the Center of Informatics and Computer Technology of the Bulgarian
% Academy of Sciences.
%
% [^7]: To Yannis Haralambous,
% <http://perso.telecom-bretagne.eu/yannisharalambous>
%
% The first usable Bulgarian hyphenation patterns for TeX were developed
% by Georgi Boshnakov[^8] in 1994. In order to solve the encoding
% problem, Boshnakov had developed TeX fonts supporting the MIK encoding
% (the prevalent encoding at that time in Bulgaria). This allowed him
% to introduce a fully working implementation only a few months after
% LaTeX2e became the official LaTeX version. Later Boshnakov modified
% his work with the Babel system. The hyphenation patterns of Boshnakov
% did their job well enough, so that for almost quarter a century after
% their initial creation, they remained the only Bulgarian hyphenation
% patterns in the standard distributions of TeX and CTAN.
%
% [^8]: <http://www.maths.manchester.ac.uk/~gb/>
%
% There are some similarities between the patterns of Boshnakov and the
% patterns of Belogay. The following are the main differences.
%
% First, Boshnakov used an ingenious and more compact implementation of
% the second and the third rule. Instead of {А2ББ, Б2ТТ, ТТ2Б}, or
% 8×22×22+22×22+22×22=4840 patterns in total, Boshnakov has patterns of
% the form 2Б3Б2 and 4Т3Т4, or only 22×22=484 in total, with the same
% effect.
%
% The second main difference between the patterns of Boshnakov and the
% patterns of Belogay concerns the letter combination дж /dzh/. In
% Bulgarian this letter combination can denote either a single
% consonant, or a sequence of two consonants and the hyphenation rules
% change respectively. Unfortunately, it is impossible to know the
% meaning of дж /dzh/ without a vocabulary. The solution of Belogay was
% a cautious one – his rules do the hyphenation in a way which will be
% correct regardless of whether дж /dzh/ is a single consonant or a
% sequence of two consonant. On the other hand, the approach of
% Boshnakov is a bold one – since дж /dzh/ is more often a single
% consonant, his rules assume that it is always a single consonant. The
% number of the cases when this decision leads to bad hyphenations is
% insignificant in comparison with the cases in which we obtain improved
% hyphenation.
%
% The third main difference between the patterns of Boshnakov and the
% patterns of Belogay concerns the eighth rule – its implementation in
% the rules of Boshnakov is rather limited which leads to wrong
% hyphenations like бри-дж /bri-dzh/. A full implementation of this
% rule would require 11660 patterns in total and this would be too much
% for the computers in 1994.
%
% Later developments
% ------------------
%
% In 1995 Atanas Topalov defended a Masters thesis in the Faculty of
% Mathematics and Informatics at Sofia University titled "Algorithms and
% software about text processing".[^9] One of the main topics in his
% thesis was the Bulgarian hyphenation. Topalov criticised vehemently
% the official hyphenation rules and their total disregard of the
% morphology. He wrote:
%
% > If we look at the history of the problems of the hyphenation, we
% > will discover something very strange. Instead of the expected
% > involvement with the depths and aspiration for more admissible and
% > satisfactory style, we can find a growing tendency for
% > simplification. One unpleasant discovery is that the development of
% > the hyphenation software stays firmly on the principle "let us do
% > the easiest thing". The earliest works which have been studied are
% > from 1978. It turned out that they present the best approach
% > concerning the automated hyphenation. The authors have chosen the
% > most difficult but the most correct (from literary point of view)
% > method for hyphenation, namely the morphological approach.
%
% Topalov proposed his own hyphenation algorithm. The hyphenation it
% generated was smooth and easy to read. One obvious defect of the
% algorithm of Topalov was that it contradicted the official hyphenation
% rules at that time. One can argue, however, that his algorithm is
% compatible with the current hyphenation rules.
%
% [^9]: The thesis of Atanas Topalov can be accessed at the author's
% website <http://www.mind-print.com>
%
% In 1999 Svetla Koeva[^10] wrote a paper about the automated Bulgarian
% hyphenation.[^11] At that time she was a junior member of the
% Department of Computational Linguistics at the Institute for Bulgarian
% Language but now she is a director of the whole institute. The paper
% of Koeva contains a list of hyphenation patterns which can be used as
% a basis of automated hyphenation. In 2004 with the help of Stoyan
% Mihov[^12] the rules of Koeva were formalised with regular relations
% and rewriting rules. They were implemented in a software product
% named ItaEst which provided Bulgarian hyphenation and grammar checking
% for various software products of Microsoft and Apple.
%
% [^10]: <http://dcl.bas.bg/svetla_koeva/>
%
% [^11]: Коева, Светла. Правила за пренасяне на части от думите на нов
% ред. Български език. 1999/2000, 1, 84-86
%
% [^12]: <http://lml.bas.bg/~stoyan/>
%
% The main differences between the hyphenation of Koeva and the official
% hyphenation rules effective after 2012 is that the separation of a
% long sequence of consonants between two vowels is done according to
% the rules valid before 1983. For example се-стра /se-stra/ and
% ай-сберг /ay-sberg/ are permitted. The main difference between the
% hyphenation of Koeva and the official hyphenation rules effective
% before 1983 is that the rules of Koeva disregard the morphology of the
% words. The following rule of Koeva is specific: in a sequence of two
% sonorant consonants between two vowels, we are permitted to separate
% the first vowel from the first consonant, for example материа-лна
% /materia-lna/.
%
% In 2000 Anton Zinoviev[^13] created new hyphenation patterns for TeX.
% He didn't know about the previous work of Boshnakov and he didn't
% bother to make his work available in the various TeX distributions and
% CTAN. His work was used mostly by the local Linux enthusiasts and the
% colleagues of Zinoviev. In 2001 Radostin Radnev[^14] created a free
% grammar dictionary of Bulgarian[^15] where he used the hyphenation
% patterns of Zinoviev. From there the work of Zinoviev propagated to
% OpenOffice, LibreOffice and various online dictionaries, including
% <http://bg.wiktionary.org> and <http://rechnik.chitanka.info>.
%
% [^13]: The author of this text.
%
% [^14]: <http://bg.linkedin.com/in/radostinradnev>
%
% [^15]: <http://bgoffice.sourceforge.net/>
%
% The following are the main differences between the hyphenation of
% Zinoviev and the hyphenation of Boshnakov.
%
% First, the eighth rule of Belogay is fully implemented.
%
% Second, the rules of Zinoviev try to detect when the letters дж /dzh/
% (and дз /dz/) denote a single consonant and when they denote a
% sequence of two consonants. By default, however, Zinoviev (like
% Boshnakov) assumes that дж /dzh/ is a single consonant and hyphenates
% accordingly.
%
% Third, the rules of Zinoviev disable some cases of unpleasant
% hyphenations:
%
% 1. In a consonant sequence like тст /tst/, the two equal consonants т
% /t/ are separated. For example братст-во /bratst-vo/ is forbidden
% while братс-тво /brats-tvo/ and брат-ство /brat-stvo/ are
% permitted.
% 2. The hyphenation is forbidden after a sonorant consonant following
% an obstruent consonant. For example отм-ра /otm-ra/ is forbidden
% and от-мра /ot-mra/ is permitted.
% 3. The hyphenation separates two consecutive kindred voiced/voiceless
% consonants. For example субп-родукт /subp-roduct/ is forbidden and
% суб-продукт /sub-product/ is permitted.
%
% At the start of his work on the Bulgarian hyphenation, Zinoviev had
% the opportunity to discuss the hyphenation with Svetla Koeva. He
% remembers that some cases of unpleasant hyphenation were suggested to
% him by Koeva. Unfortunately, he hasn't taken notes so now he doesn't
% know which cases of unpleasant hyphenation have been suggested to him
% by Koeva and which are his own findings.
%
% The present work
% ================
%
% Motivation
% ----------
%
% The present work was carried out on the initiative of the leader of
% the Bulgarian localisation team of Mozilla, who contacted Zinoviev,
% Boshnakov and the maintainers of the TeX hyphenation patterns.[^17]
% This work pursues the following main objectives:
%
% 1. to update the hyphenation patterns in accordance with the current
% hyphenation rules;
% 2. to generate the hyphenation patterns by a publicly available
% script;
% 3. to make the hyphenation patterns customisable;
% 4. to provide documentation for the future developers.
%
% [^16]: <http://mozillians.org/en-US/u/stoyan/>
%
% [^17]: <http://hyphenation.org>
%
% The current official hyphenating rules for Bulgarian are rather
% liberal. Very often, in a long sequence of consonants we are
% permitted to split the word at any position, for example аген-т-с-т-во
% /agen-t-s-t-vo/. This is prone to many unusual and unexpected results
% that interrupt the attention of the reader or deceive his expectations
% during the movement of his eyes to the next line. On the other hand,
% in order to produce nice justified paragraphs there is no need for so
% many hyphenation possibilities. It would be sufficient even if only
% one possible separation between any two syllables was permitted.
%
% Therefore, it makes sense to use a more restrictive version of the
% Bulgarian hyphenation, one which eliminates the controversial cases of
% hyphenation. Only when typesetting a Bulgarian text in a very narrow
% newspaper column it will be appropriate to use a more liberal version.
% It should be noted that some specialised English dictionaries also
% separate the word-division positions into two categories – preferred
% positions and less recommended positions.
%
% There are two methods to determine the optimal division within a
% sequence of consonants between two vowels:
%
% * we can hyphenate according to the syllables in the word or
% * we can hyphenate morphologically.
%
% Hyphenation according to the syllables in the word
% --------------------------------------------------
%
% Let us look at the properties of the Bulgarian syllables. All
% syllables have the following structure:
%
% > onset - nucleus - code
%
% The nucleus in Bulgarian is always a vowel. Both the onset and the
% code are (possibly empty) sequences of consonants.
%
% The Bulgarian syllables adhere to the Sonority Sequencing Principle.
% According to this principle, the consonants within the onset have
% raising sonority and the consonants within the code have decreasing
% sonority.
%
% Several grammar books agree that the following sonority scale is valid
% for Bulgarian:
%
% > voiceless obtrusive < voiced obtrusive < sonorant consonant < vowel
%
% According to the investigations of the author, the only exception to
% this law is due to the letter в /v/ which is a voiced obtrusive but it
% can be used also as a voiceless obtrusive. This exception is due to a
% spelling particularity of the Bulgarian language. Whenever the letter
% в /v/ seemingly violates the Sonority Sequencing Principle, in the
% spoken language this letter is read as ф /f/, that is as a voiceless
% obtrusive (for example the word отвсякъде /otvsyakade/ is read as
% отфсякъде /otfsyakade/).[^18]
%
% [^18]: No Primitive Slavonic word contains the phoneme ф /f/.
% Therefore, we can safely assume that in the Primitive Slavonic
% language the consonant ф /f/ was a positional variant of the consonant
% в /v/.
%
% The author has found that the sonorant consonants in Bulgarian have
% their own sonority scale:
%
% > м /m/ < н /n/ < л /l/ < р /r/ < й /y/
%
% Only a few words such as жанр /zhanr/ and химн /himn/ violate this
% scale. Such words are always loan-words and their pronunciation is
% somewhat problematic for the native Bulgarian speakers.
%
% In addition to the Sonority Sequencing Principle, the consonant
% clusters within the Bulgarian syllable adhere to the following
% additional principles:
%
% 1. Both in the onset and in the code, the labial and dorsal plosives
% precede the coronal plosives and affricates.
% 2. If the onset or the code contains two plosives or affricates, then
% there are no fricatives between them. Few words with the Latin
% root 'text' are exceptions: контекст /kontekst/.
% 3. If the onset or the code contains two fricatives other than в /v/,
% then there are no plosives or affricates between them.
% 4. If the onset or the code contains two plosives or affricates, then
% they both have equal sonority (both are voiced, or both are
% voiceless).
% 5. If the onset or the code contains two fricatives other than в /v/,
% then they both have equal sonority (both are voiced, or both are
% voiceless).
% 6. Neither the onset, nor the code may contain two labial plosives, or
% two coronal plosives or affricates or two dorsal plosives.
% 7. Neither the onset, nor the code may contain two equal consonants
% with the exception of в /v/ (for example втвърди /vtvardi/).[^19]
%
% [^19]: Actually, the letter в /v/ is not a real exception because in
% all such cases this letter denotes two different consonants – в /v/
% and ф /f/. Only in the Russian loan-word взвод /vzvod/ the two
% letters в /v/ denote a repeating consonant в /v/.
%
% From all these properties of the Bulgarian syllable we can deduce the
% following hyphenation rules:
%
% 1. In a sequence МК where М is a consonant with higher sonority than
% K, we are not permitted to hyphenate before М. Exception: when М
% is в /v/ and К is a voiceless consonant.
% 2. In a sequence КМ where М is a consonant with higher sonority than
% K, we are not permitted to hyphenate after М.
% 3. In a sequence KBT where K and T are plosives or affricates and B is
% fricative, we separate K from T.
% 4. In a sequence CKB where K is a plosive or affricate and C and B are
% fricatives other than в /v/, we separate C from B.
% 5. If in a consonant sequence a coronal plosive or affricate Т is
% followed by a labial or dorsal plosive К, then we separate Т from К.
% 6. If a consonant sequence contains two plosives or affricates, one
% voiced and one voiceless, then we separate them.
% 7. If a consonant sequence contains two fricatives other than в /v/,
% one voiced and one voiceless, then we separate them.
% 8. If a consonant sequence contains two labial plosives or two coronal
% plosives or affricates or two dorsal plosives then they are
% separated.
% 9. If a consonant sequence contains two equal consonants (not
% necessarily consecutive), then they are separated.
%
% With so many prohibitive rules, a question arises: if we apply all
% these rules, aren't we going to eliminate too many hyphenation
% possibilities? The answer is no. It can be demonstrated that between
% any two consecutive syllables at least one separation point will be
% permitted.
%
%
% Hyphenation according to the morphology
% ---------------------------------------
%
% Between 1983 and 2012 the official orthographic rules of the
% Bulgarian language forbade morphologically based hyphenation. After
% 2012 such hyphenation is permitted (but not obligatory).
%
% The most important case when it is very desirable to use
% morphologically based hyphenation is the case of the compound words.
% Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат
% /vakuu-maparat/ are extremely irritating even if they are formally
% correct. Unfortunately, we do not have a vocabulary of the compound
% Bulgarian words that would permit us to produce rules for automated
% hyphenation. Therefore, the current Bulgarian hyphenation patterns do
% not attempt to apply morphological hyphenation to such words.
%
% Second in importance (but far more significant in terms of numbers) is
% the case with the word prefixes. While the eyes of the reader still
% look at the start of the word, the word is still unknown to him. At
% this point, it is very important not to deceive his expectations. For
% example, when the reader sees над- /nad-/ at the end of the line, he
% will expect that this is the prefix над- /nad-/ with semantics 'attain
% more than'. This expectation will be fooled if this wasn't really a
% prefix, but a deceiving (while formally correct) hyphenation of the
% word надремя /nadremya/ 'have dozed enough' where the real prefix is
% not над- /nad-/ but на- /na-/ with semantics 'achieve a state after
% accumulation'. Such hyphenation distracts the reader and makes the
% reading more difficult.
%
% Third in importance is the case with the word suffixes. With respect
% to the hyphenation rules we can divide the suffixes into three
% categories:
%
% 1. Suffixes starting with a vowel, for example -ар /-ar/. It is not
% appropriate to follow the morphology with such suffixes because
% this will contradict the whole hyphenation tradition of the
% Bulgarian language. For example крав-ар /krav-ar/ is unwarranted.
% 2. Suffixes starting with one consonant, for example -ка /-ka/.
% Usually with such suffixes the syllable boundary in the word
% coincides with morpheme boundary so no specific cares are
% necessary, for example кравар-ка /kravar-ka/. The exceptions are
% rare, for example: обек-тната /obek-tnata/ instead of обект-ната
% /obekt-nata/.
% 3. Suffixes starting with more than one consonant (-ски /-ski/, -ство
% /-stvo/). It is possible to use morphological hyphenation rules
% with such suffixes.
%
% Even if it is possible to use morphological hyphenation with the
% suffixes of the third category, it turns out, this is not as useful as
% it is with the case of the prefixes. When the eyes of the reader have
% reached this part of the word, the word is already more or less known
% to the reader. Therefore, at this point the morphological hyphenation
% does not provide any significant advantages in comparison to the
% simpler hyphenation based only on the syllables in the word. Consider
% for example the word геройс-тво /geroys-tvo/ with suffix -ство
% /-stvo/. When the reader sees геройс- /geroys-/ at the end of the
% line this will give him an early clue that the suffix of the word is
% -ство /-stvo/. Such non-morphological hyphenation does not deceive
% the expectations of the reader. On the contrary, it makes the reading
% easier because it gives clues to the reader about what follows on the
% next line.
%
% Because of these considerations, the current Bulgarian hyphenation
% patterns do not attempt to use morphological hyphenation with respect
% to the suffixes of the words. Though it would be useful to implement
% rules about the suffixes of the second cateogory. Hopefully, some
% future version will have such rules.
%
% Occasionally,[^20] a fourth morphological requirement is stated: that
% hyphenation should conform with the boundary between the word and the
% definitive articles -та /-ta/ and -те /-te/ (postfixed in Bulgarian).
% There is no need to pay attention to this rule because it seems to be
% satisfied by its own nature. The author has searched in a dictionary
% with over 860000 Bulgarian words for cases when the hyphenation rules
% would hyphenate badly with respect to the definitive article. He was
% unable to find even one such case with the hyphenation rules valid
% after 1983 and only about 10 cases with the rules valid before 1983
% (one of them is живопи-ста /zhivopi-sta/ instead of живопис-та
% /zhivopis-ta/).
%
% One unavoidable characteristic of any morphologically based automated
% hyphenation is that it can create wrong hyphenations. Because of
% this, one useful option is to use the morphology in a safe way – to
% use it in order to forbid bad hyphenations but to create no new
% hyphenation possibilities solely on the basis of the morphology.
%
% Take for example the word дозрея /dozreya/ 'ripen fully'. According
% to the phonological rules, we should hyphenate it as доз-рея
% /doz-reya/. According to the morphology, however, we should hyphenate
% as до-зрея /do-zreyq/ because this word is formed with the prefix до-
% /do-/ with semantics 'complete or supplement' and this semantics would
% be lost if the reader sees доз- /doz-/ at the end of the line.
% Therefore, there are three methods to hyphenate this word:
%
% 1. доз-рея /doz-reya/ when morphology is not used;
% 2. до-зрея /do-zreya/ when morphology is fully used;
% 3. дозрея /dozreya/ (no hyphenation) when morphology is used in a safe
% way.
%
% The option to use the morphology in a safe way is very attractive when
% the software uses a smart line-breaking algorithm which can produce
% good results even with less hyphenation possibilities. TeX is one
% such software. It should be noted that this option does not eliminate
% too many hyphenation possibilities because the morpheme boundaries
% most of the time are also syllable boundaries.
%
% [^20]: Правописен и правоговорен наръчник. Състав. Иван Хаджов,
% Цв. Минков; Ред. Ив. Хаджов и др. София, Бълг. кн., 1945
%
% The following are results of a statistics about the quality of the
% morphological rules (the number after the sign ± is the expected
% standard deviation of our estimations):
%
% With the option `--morphology`:
%
% * in 0.1% ±0.3% of the dictionary words the morphological patterns
% create very wrong hyphenation;
% * in 89.8% ±0.1% of the dictionary words the morphological patterns
% hyphenate identically with the case when no morphology patterns are
% used;
% * in 0.3% ±0.2% of the dictionary words the morphological patterns
% hyphenate differently in comparison to the case when no morphology
% patterns are used and the word is hyphenated in a way which
% contradicts the morphology;
% * in 0.6% ±0.1% of the dictionary words the morphological patterns
% hyphenate differently in comparison to the case when no morphology
% patterns are used and there is a possible hyphenation which is
% compatible with the word morphology but which is nevertheless
% forbidden by the morphology patterns.
%
% With the option `--safe-morphology`:
%
% * in 0% of the dictionary words the morphological patterns create very
% wrong hyphenation;
% * in 90.0% ±0.1% of the dictionary words the morphological patterns
% hyphenate identically with the case when no morphology patterns are
% used;
% * in 0.3% ±0.2% of the dictionary words the morphological patterns
% hyphenate differently in comparison to the case when no morphology
% patterns are used and the word is hyphenated in a way which
% contradicts the morphology;
% * in 0.6% ±0.1% of the dictionary words the morphological patterns
% hyphenate differently in comparison to the case when no morphology
% patterns are used and there is a possible hyphenation which is
% compatible both with the word morphology and with the syllable
% boundaries but which is nevertheless forbidden by the morphology
% patterns.
%
% Notice that the morphological patterns create a different hyphenation
% only in about 10% of the words. The following explanation can be
% given for this surprising fact. First, the natural evolution of the
% human languages tends to simplify the complex sequences of consonants.
% Therefore, no morpheme contains a complex sequence of consonants. And
% second, the Bulgarian orthography is morphological. This means that
% the morphemes are written according to their actual pronunciation,
% however the simplifications in the spoken languages which take place
% at the morpheme boundaries are not taken into account in the
% orthography. The independent operation of these two factors leads to
% the result that most of the time the morpheme boundaries coincide with
% the conventional syllable boundaries. The main exception to this is
% when a morpheme starts with a vowel, in this case its syllable will
% include one or more consonants of the preceeding morpheme. The second
% exception is when a morpheme ends with a vowel and the next morpheme
% starts with a sequence of two or more consonants.
%
% Usage of the script `hyph-bg.sh`
% --------------------------------
%
% The `hyph-bg.sh` is all-in-one script which can generate both
% documentation (this text) and Bulgarian hyphenation patterns. When
% given the option `--help` the script gives short usage instructions:
%
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% hyph-bg.sh --help
% Show this info
% hyph-bg.sh [--doc-html | --doc-latex | --doc-txt]
% Print documentation in various formats
% hyph-bg.sh [other options]
% Generate Bulgarian hyphenation patterns
%
% Options when generating hyphenation patterns:
%
% --standalone-tex
% Produce hyphenation patterns for TeX with \patterns{ ... }.
%
% --no-hyphen-mins
% Hyphenation patterns which do not require hyphen mins.
% Otherwise: both left and right hyphen mins should be set to 2.
%
% --safe-dz
% Do not try to guess whether DZ is a single consonant or not.
% Only use hyphenation which will be correct in both cases.
%
% --permissible
% Permit any formally correct hyphenation, including unnatural
% divisions, such as studen-tstvo. Useful for educational tools
% or when typesetting Bulgarian text in a very short column.
%
% --morphology
% Apply morphology when hyphenating, for example: za-dvizhvam.
% May hyphenate incorrectly in some cases.
%
% --safe-morphology
% Apply morphology when hyphenating. Never hyphenates incorrectly
% but may prohibit some correct hyphenations.
%
% --no-morphology
% Disregard the morphology. Default.
%
% --1945
% Hyphenate according to the rules effective between 1945 and 1982
%
% --1983
% Hyphenate according to the rules effective between 1983 and 2011
%
% --2012
% Hyphenate according to the rules effective after 2012. Default.
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%
% The following are the recommended ways to generate hyphenation
% patterns by this script:
%
% `hyph-bg.sh --standalone-tex --safe-morphology`
% : For TeX. Apply the morphology in a safe way when the software
% uses a smart line-breaking algorithm.
%
% `hyph-bg.sh`
% : For most other software.
%
% `hyph-bg.sh --no-hyphen-mins`
% : The current versions of Mozilla (as of 2017) seem to ignore the
% hyphen mins in words that contain a dash.
%
% `hyph-bg.sh --morphology`
% : For professional typography with human proof-reader.
%
% `hyph-bg.sh --permissible`
% : For educational tools and online dictionaries which can show only one
% kind of hyphenation.
%
% Notice that some specialised English dictionaries separate the
% word-division positions into two categories – preferred positions and
% less recommended positions. It would be best if the Bulgarian online
% dictionaries could do the same. For example hyphen "-" can be used to
% display the preferred positions and dot "." – the less recommended
% positions. If a word-division position is permitted only by the
% patterns of `hyph-bg.sh --permissible`, then this position is less
% recommended.
%
\message{Bulgarian hyphenation patterns (options: --safe-morphology --standalone-tex, version 21 October 2017)}
\patterns{%
.антиа4
.антиб4
.антив4
.антиг4
.антид4
.антие4
.антиж4
.антиз4
.антии4
.антий4
.антик4
.антил4
.антим4
.антин4
.антио4
.антип4
.антир4
.антис4
.антит4
.антиу4
.антиф4
.антих4
.антиц4
.антиш4
.антищ4
.антиъ4
.антию4
.антия4
.бб8
.бв8
.бг8
.бд8
.бж8
.бз8
.бк8
.бл8
.бм8
.бн8
.бп8
.бр8
.бс8
.бт8
.бф8
.бх8
.бц8
.бч8
.бш8
.бщ8
.вб8
.вбб8
.вбв8
.вбг8
.вбд8
.вбж8
.вбз8
.вбк8
.вбл8
.вбм8
.вбн8
.вбп8
.вбр8
.вбс8
.вбт8
.вбф8
.вбх8
.вбц8
.вбч8
.вбш8
.вбщ8
.вв8
.ввб8
.ввв8
.ввг8
.ввд8
.ввж8
.ввз8
.ввк8
.ввл8
.ввм8
.ввн8
.ввп8
.ввр8
.ввс8
.ввт8
.ввф8
.ввх8
.ввц8
.ввч8
.ввш8
.ввщ8
.вг8
.вгб8
.вгв8
.вгг8
.вгд8
.вгж8
.вгз8
.вгк8
.вгл8
.вгм8
.вгн8
.вгп8
.вгр8