.\" Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line Unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ]
string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line Unicode database query tool.
.SH OPTIONS
.TP
.B \-h
.B \-\-help
Show help and exit.
.TP
.B \-x
.B \-\-hexadecimal
Assume
.I string
to be a hexadecimal number.
.TP
.B \-d
.B \-\-decimal
Assume
.I string
to be a decimal number.
.TP
.B \-o
.B \-\-octal
Assume
.I string
to be an octal number.
.TP
.B \-b
.B \-\-binary
Assume
.I string
to be a binary number.
.TP
.B \-r
.B \-\-regexp
Assume
.I string
to be a Python regular expression.
.TP
.B \-s
.B \-\-string
Assume
.I string
to be a sequence of characters.
.TP
.B \-a
.B \-\-auto
Try to guess the type of
.I string
from the types listed above (default).
.TP
.BI \-f FILE
.BI \-\-input_file= FILE
Read characters from FILE and display information about each of them.
Use \- to read from standard input.
.TP
.BI \-m MAXCOUNT
.BI \-\-max= MAXCOUNT
Maximum number of codepoints to display (default: 20); use 0 for unlimited.
.TP
.BI \-i IOCHARSET
.BI \-\-io= IOCHARSET
I/O character set. For best results, run \fBunicode\fP on a UTF-8
capable terminal and set IOCHARSET to UTF-8. \fBunicode\fP
tries to guess this value from your locale, so with a properly configured
locale you should not need to specify it.
.TP
.BI \-\-fcp= CHARSET
.BI \-\-fromcp= CHARSET
Convert numerical arguments from this encoding, default: no conversion.
Multibyte encodings are supported. This is ignored for non-numerical
arguments.
.TP
.BI \-c ADDCHARSET
.BI \-\-charset\-add= ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
.TP
.BI \-C USE_COLOUR
.BI \-\-colour= USE_COLOUR
USE_COLOUR is one of
.BR on ,
.B off
or
.BR auto .
.B \-\-colour=on
uses ANSI colour codes to colourise the output;
.B \-\-colour=off
disables colours;
.B \-\-colour=auto
tests whether standard output is a tty and uses colours only when it is.
.B \-\-color
is a synonym for
.BR \-\-colour .
.TP
.B \-v
.B \-\-verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
.TP
.B \-w
.B \-\-wikipedia
Spawn a browser pointing to the English Wikipedia entry about the character.
.TP
.B \-\-wt
.B \-\-wiktionary
Spawn a browser pointing to the English Wiktionary entry about the character.
.TP
.B \-\-brief
Display character information in brief format.
.TP
.BI \-\-format= fmt
Use your own format for character information display. See the README for details.
.TP
.B \-\-list
List (approximately) all known encodings.
.TP
.B \-\-download
Try to download UnicodeData.txt into ~/.unicode/.
.TP
.B \-\-ascii
Display the ASCII table.
.TP
.B \-\-brexit\-ascii
.B \-\-brexit
Display ASCII table (EU–UK Trade and Cooperation Agreement 2020 version)
.SH USAGE
\fBunicode\fP tries to guess the type of an argument. In particular,
if the argument looks like a valid hexadecimal representation of a
Unicode codepoint, it is treated as one. Using
.IP
\fBunicode\fP face
.PP
will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE,
and it will not search for 'face' in character descriptions. To search
the descriptions instead, use:
.IP
\fBunicode\fP \-r face
.PP
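.PP
The guessing rule described above can be illustrated with a minimal Python
sketch. It is not the code used by \fBunicode\fP; the names HEX_RE and
guess_codepoint are hypothetical, and it assumes that an argument counts as
a codepoint when it consists of an optional U+ prefix followed by one to six
hexadecimal digits.
.PP
.nf
```python
import re
import unicodedata

# Hypothetical pattern: optional "U+" prefix, then 1 to 6 hexadecimal digits.
HEX_RE = re.compile(r"(?:U[+])?[0-9A-F]{1,6}", re.IGNORECASE)

def guess_codepoint(arg):
    """Return the codepoint if arg looks like hexadecimal, else None."""
    if HEX_RE.fullmatch(arg) is None:
        return None
    digits = arg[2:] if arg[:2].upper() == "U+" else arg
    value = int(digits, 16)
    # Unicode codepoints end at U+10FFFF.
    return value if value <= 0x10FFFF else None

for arg in ("face", "U+00E1", "latin"):
    cp = guess_codepoint(arg)
    if cp is None:
        print(arg, "-> not a codepoint, fall back to searching descriptions")
    else:
        print(arg, "->", "U+%04X" % cp, unicodedata.name(chr(cp), "<unnamed>"))
```
.fi
.PP
Under this assumption "face" is read as U+FACE, while "latin" is not a
hexadecimal number and would be left to a description search.
.PP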
.PP
For example, you can use any of the following to display information
about U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):
.IP
.nf
\fBunicode\fP 00E1
\fBunicode\fP U+00E1
\fBunicode\fP \('a
\fBunicode\fP 'latin small letter a with acute'
.fi
.PP
.PP
You can specify a range of characters as an argument; \fBunicode\fP will
show these characters in a tabular format, aligned to 256-codepoint boundaries.
Use two dots ".." to indicate the range, e.g.
.IP
\fBunicode\fP 0450..0520
.PP
will display the whole Cyrillic, Armenian and Hebrew blocks (characters from U+0400 to U+05FF), while
.IP
\fBunicode\fP 0400..
.PP
will display just the characters from U+0400 up to U+04FF.
.PP
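.PP
The alignment of ranges can be illustrated with a short Python sketch. It is
an assumption about the arithmetic, not the code used by \fBunicode\fP, and
the name parse_range is hypothetical: the start is rounded down and the end
is rounded up to a multiple of 256, and an open range covers one such block.
.PP
.nf
```python
def parse_range(arg):
    """Parse 'LOW..HIGH' or 'LOW..' (hexadecimal) into an aligned range."""
    low, _, high = arg.partition("..")
    first = int(low, 16)
    # Assumption: an open-ended range covers only the block holding LOW.
    last = int(high, 16) if high else first
    start = first & ~0xFF   # round down to a multiple of 256
    end = last | 0xFF       # round up to the last codepoint of the block
    return start, end

for arg in ("0450..0520", "0400.."):
    start, end = parse_range(arg)
    print(arg, "->", "U+%04X..U+%04X" % (start, end))
```
.fi
.PP
With these assumptions the two ranges come out as U+0400..U+05FF and
U+0400..U+04FF, matching the description above.
.PP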
.PP
Use \-\-fromcp to query codepoints from other encodings:
.IP
\fBunicode\fP \-\-fromcp cp1250 \-d 200
.PP
Multibyte encodings are supported:
.IP
\fBunicode\fP \-\-fromcp big5 \-x aff3
.PP
and multi-character strings are supported, too:
.IP
\fBunicode\fP \-\-fromcp utf\-8 \-x c599c3adc5a5
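.PP
The idea behind \-\-fromcp can be illustrated with a minimal Python sketch.
It is not the code used by \fBunicode\fP, and the name decode_numeric is
hypothetical: it assumes the numerical argument is read as a big-endian
sequence of bytes, which is then decoded with the given charset.
.PP
.nf
```python
def decode_numeric(value, charset):
    """Decode the bytes of an integer (big-endian) using charset."""
    nbytes = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(nbytes, "big").decode(charset)

# Decimal 200 is byte 0xC8, which cp1250 maps to U+010C.
single = decode_numeric(200, "cp1250")
print(["U+%04X" % ord(c) for c in single])

# Six UTF-8 bytes that decode to three characters.
multi = decode_numeric(0xC599C3ADC5A5, "utf-8")
print(["U+%04X" % ord(c) for c in multi])
```
.fi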
.SH BUGS
Tabular format does not deal well with full-width, combining, control
and RTL characters.
.SH SEE ALSO
.BR ascii (1)
.SH AUTHOR
Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>