<h2><font color="darkgreen">DiffPDF</font></h2>
<ul>
<li><a href="#bas">Basic Usage</a></li>
<li><a href="#cmp">The Compare Button</a></li>
<li><a href="#txt">Words Comparison Mode</a></li>
<li><a href="#chr">Characters Comparison Mode</a></li>
<li><a href="#vis">Appearance Comparison Mode</a></li>
<li><a href="#zon">Zoning</a></li>
<li><a href="#ran">Page Ranges</a></li>
<li><a href="#mar">Margins</a></li>
<li><a href="#sav">Saving</a></li>
<li><a href="#opt">The Options Dialog</a></li>
<li><a href="#dock">Dock Windows</a></li>
<li><a href="#cli">Command Line Usage</a></li>
</ul>
<h3 id="bas">Basic Usage</h3>
<p>Click the <b>File #1</b> button to choose one PDF file, then
the <b>File #2</b> button to choose another (ideally very similar) PDF
file, and then click the <b>Compare</b> button to perform the comparison.
When the comparison has finished, navigate through the pairs of differing pages
using the <b>View</b> combobox or the <b>Previous</b> and
<b>Next</b> buttons. Alternatively, drag two files (either
separately or together) and drop them onto <font
color="darkgreen">DiffPDF</font>'s view panels, then click the
<b>Compare</b> button.
<h3 id="cmp">The Compare Button</h3>
<p>When the <b>Compare</b> button is pressed, <font
color="darkgreen">DiffPDF</font> does a high-speed scan of every pair of
pages (~100 pairs of pages per second on the author's machine). To make
the scan as fast as possible <font color="darkgreen">DiffPDF</font> does
a very rough check of each pair of pages, so it may occasionally
report false positives (i.e., page pairs flagged as different that are really the
same). False positives are quite rare, and there are no false
negatives: differences are never missed.
<h3 id="txt">Words Comparison Mode</h3>
<p>The default comparison mode is Words, which does a smart text
comparison word by word for each pair of pages. This mode is
fairly liberal regarding
whitespace and tries to ignore layout changes (within a page) as far
as possible. It also treats all hyphen-like characters (soft hyphen, minus sign, etc.)
the same, that is, as a plain hyphen.
This mode is best for alphabetic languages like English.
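<p>The idea behind this mode can be sketched in Python. This is an
illustration only, not <font color="darkgreen">DiffPDF</font>'s own code:
the set of hyphen-like characters and the function names
<tt>normalize_words</tt> and <tt>differing_words</tt> are assumptions, and
Python's standard <tt>difflib.SequenceMatcher</tt> stands in for the
program's sequence matcher.

```python
import difflib
import re

# Assumption: these are the hyphen-like characters folded to "-".
# The exact set DiffPDF uses is not listed in this help text.
HYPHENS = "\u00ad\u2010\u2011\u2012\u2013\u2212"


def normalize_words(text):
    """Fold hyphen-like characters to '-' and split on any whitespace."""
    folded = re.sub("[" + HYPHENS + "]", "-", text)
    return folded.split()


def differing_words(left, right):
    """Return (left_words, right_words) for each run that is not equal."""
    a, b = normalize_words(left), normalize_words(right)
    matcher = difflib.SequenceMatcher(None, a, b)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

<p>Because the text is split on any run of whitespace, two pages that
differ only in line breaks or spacing produce no differences in this sketch.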
<h3 id="chr">Characters Comparison Mode</h3>
<p>The Characters comparison mode does a smart text
comparison character by character for each pair of pages. This mode is
liberal regarding whitespace at the ends of lines and tries to
ignore layout changes (within a page) insofar as possible.
It also treats all hyphen-like characters (soft hyphen, minus sign, etc.)
the same, that is, as a plain hyphen.
This mode is best for logographic languages like Chinese and Japanese.
<h3 id="vis">Appearance Comparison Mode</h3>
<p>The Appearance comparison mode
can be used to detect changes in fonts, diagrams, or any other visual
aspects. This mode is absolutely strict and compares each pair of
pages pixel
for pixel. By default this mode shows differences using highlighting
just like the Words and Characters modes do. However, it is also
possible to compare using
composition modes, which can help to detect very small and subtle
differences that aren't immediately apparent.
<h3 id="zon">Zoning</h3>
<p>Zoning is an experimental feature designed to produce more accurate
results (i.e., fewer false positives). Its main use is for pages that
have tables or that mix alphabetic and logographic text, since these can
cause the underlying Poppler PDF library to provide the page's words
mixed up. <font color="red">Warning:</font> using zoning for large
complex pages (bigger than A4, multiple columns, tables) in Characters
mode can be very slow. (The current focus for the zoning code is
functionality not efficiency.) Furthermore, in some cases zoning can
cause an <i>increase</i> in false positives—this can occur because
the zoning code reorders the text that is fed to the sequence matcher
and sometimes the reordering is wrong. Getting this right is
non-trivial; changing the tolerances may help.
<p>The Tolerance/R value is the maximum distance between text (i.e., word)
rectangles for the rectangles to be placed in the same zone. Lower
values create more zones; higher values create fewer zones. More
zones are expensive to compute but can produce more accurate
results; fewer zones may reduce false positives. The Tolerance/Y value
is used for rounding <i>y</i> coordinates to a multiple of
this value. For example, if Tolerance/Y is 5
and a word at position (452,137) is followed by a superscript at
(468,140), both will be treated as having a <i>y</i> coordinate of 140.
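<p>The Tolerance/Y rounding in the example above can be sketched in Python.
This is an illustration, not <font color="darkgreen">DiffPDF</font>'s code:
it assumes that coordinates are rounded up to the next multiple (which is
what the 137 to 140 example implies), and the names <tt>normalized_y</tt>
and <tt>reading_order</tt> are hypothetical.

```python
def normalized_y(y, tolerance_y):
    """Round y up to a multiple of tolerance_y (0 disables rounding)."""
    if tolerance_y == 0:
        return y
    remainder = y % tolerance_y
    return y if remainder == 0 else y + tolerance_y - remainder


def reading_order(words, tolerance_y):
    """Sort (text, x, y) tuples by rounded y, then by x."""
    return sorted(words,
                  key=lambda w: (normalized_y(w[2], tolerance_y), w[1]))
```

<p>With a tolerance of 5, the word at <i>y</i> = 137 and the superscript at
<i>y</i> = 140 land on the same line, so they are ordered by their <i>x</i>
coordinates alone.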
<h3 id="ran">Page Ranges</h3>
<p>By default <font color="darkgreen">DiffPDF</font> compares every pair
of pages in the two PDFs (or as many pairs of pages as the number of
pages in the shorter PDF). It is also possible to compare
particular pages or page ranges. For example, if there are two versions
of a PDF file, one with pages 1-12 and the other with pages 1-13 because
of an extra page having been added as page 4, they can be compared by
specifying two page ranges, 1-12 for the first and 1-3, 5-13 for the
second. This will make <font color="darkgreen">DiffPDF</font> compare pages in the pairs (1, 1), (2,
2), (3, 3), (4, 5), (5, 6), and so on, to (12, 13).
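<p>The pairing in this example can be reproduced with a short Python
sketch. The function names <tt>parse_page_ranges</tt> and
<tt>page_pairs</tt> are hypothetical, and the parsing rules (comma-separated
items, each a single page or a <tt>first-last</tt> range) are an assumption
based on the example rather than a description of the program's parser.

```python
def parse_page_ranges(spec):
    """Expand a spec such as '1-3, 5-13' into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages


def page_pairs(spec1, spec2):
    """Pair the pages of the two ranges in order, stopping at the shorter."""
    return list(zip(parse_page_ranges(spec1), parse_page_ranges(spec2)))
```

<p>Calling <tt>page_pairs("1-12", "1-3, 5-13")</tt> yields twelve pairs,
starting with (1, 1) and ending with (12, 13).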
<h3 id="mar">Margins</h3>
<p>It is possible to make <font color="darkgreen">DiffPDF</font> ignore
any text that is above a specified top margin, below a specified bottom
margin, left of a specified left margin, and right of a specified right
margin. To specify one or more of these margins, first
check the <b>Exclude Margins</b> checkbox and then set
any of the margins. Margins can be set by clicking on a page view or by
using the margin spinboxes.
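<p>The effect of excluding margins can be sketched in Python. This is a
simplified illustration, not the program's code: it assumes each word has
an <tt>(x, y, width, height)</tt> rectangle with <i>y</i> growing downward,
that margins are measured in from the page edges, and that a word is ignored
unless its whole rectangle lies inside every margin. The function names are
hypothetical.

```python
def inside_margins(rect, page_size, top=0, bottom=0, left=0, right=0):
    """True if the (x, y, width, height) rect lies within all margins."""
    x, y, width, height = rect
    page_width, page_height = page_size
    return (x >= left and x + width <= page_width - right and
            y >= top and y + height <= page_height - bottom)


def words_to_compare(words, page_size, **margins):
    """Keep the text of the (text, rect) pairs that are inside the margins."""
    return [text for text, rect in words
            if inside_margins(rect, page_size, **margins)]
```

<p>Setting only a top and a bottom margin is a simple way to skip running
headers and page numbers that differ between two versions of a document.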
<h3 id="sav">Saving</h3>
<p>Use the <b>Save As</b> button to pop up a Save dialog. This dialog
lets you save a <tt>.pdf</tt> file with the highlighted changes, or
individual image files (e.g., in <tt>.png</tt> or various other common
image formats). The dialog supports saving the current or all left
pages, right pages, or both pages.
<h3 id="opt">The Options Dialog</h3>
<p>This dialog is invoked by clicking the <b>Options</b> button.
The dialog supports changing the highlighting color, whether to use
a pen or fill or both, and the fill's opacity. The Square Size is used
when doing Appearance mode comparisons: the smaller the size the more
fine-grained the highlighting is—and the slower to compute.
The Rule width determines the thickness of the margin rules which are
used to indicate the vertical position of differences; the rules can
be switched off using a Rule width of 0.
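<p>The trade-off controlled by the Square Size can be sketched in Python.
This is an illustration under stated assumptions, not the program's
implementation: both pages are taken to be already rendered to equally sized
grids of pixel values, and <tt>differing_squares</tt> is a hypothetical name.

```python
def differing_squares(left, right, square_size):
    """Return (x, y) origins of the squares containing a differing pixel.

    left and right are equally sized lists of pixel rows.
    """
    height, width = len(left), len(left[0])
    squares = []
    for y0 in range(0, height, square_size):
        for x0 in range(0, width, square_size):
            rows = range(y0, min(y0 + square_size, height))
            cols = range(x0, min(x0 + square_size, width))
            if any(left[y][x] != right[y][x] for y in rows for x in cols):
                squares.append((x0, y0))
    return squares
```

<p>A smaller square size pins a difference down to a smaller highlighted
area, at the cost of testing more squares per page.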
<h3 id="dock">Dock Windows</h3>
<p>The Controls, Actions, Margins, Zoning, and Log views are in dock
widgets—these can be dragged into other dock areas (in which case
they will reshape themselves as necessary), or dragged to float free.
The Margins, Zoning, and Log views can also be closed; right click a
dock area splitter and check their checkbox to open them again. These
views may be shown tabbed: if there is enough space they can be dragged
out of their tabs and all shown in full.
<h3 id="cli">Command Line Usage</h3>
<p>Although <font color="darkgreen">DiffPDF</font> is a GUI program, if it is run from a console with two PDF
files listed on the command line,
<font color="darkgreen">DiffPDF</font> will start up and
immediately compare them. The comparison uses Words mode by default, Appearance mode
if the file names are preceded by <tt>-a</tt> or
<tt>--appearance</tt>,
or Characters mode if they are preceded by <tt>-c</tt> or
<tt>--characters</tt>. Run
<font color="darkgreen">DiffPDF</font> with <tt>--help</tt> to see all
the command line options. (The <tt>--help</tt> option won't work on Windows, although the
other command line options will.) Here is the <tt>--help</tt>
output:
<pre>
usage: diffpdf [options] [file1.pdf [file2.pdf]]
A GUI program that compares two PDF files and shows
their differences.
The files are optional and are normally set through
the user interface.
options:
--help           show this usage text and terminate (run the
                 program without this option and press F1 for
                 online help)
--appearance -a  set the initial comparison mode to Appearance
--characters -c  set the initial comparison mode to Characters
--words      -w  set the initial comparison mode to Words
--language=xx    set the program to use the given translation
                 language, e.g., en for English, cz for Czech;
                 English will be used if there is no translation
                 available
--debug=2        write the text fed to the sequence matcher into
                 temporary files (e.g., /tmp/page1.txt etc.)
--debug=3        as --debug=2 but also includes coordinates in
                 y, x order
</pre>
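<p>For launching a comparison from a script, the options above can be
assembled as in the following Python sketch. It assumes the executable is
called <tt>diffpdf</tt> (as in the usage line) and is on the search path;
the helper names <tt>diffpdf_command</tt> and <tt>launch_diffpdf</tt> are
hypothetical. Only options listed in the <tt>--help</tt> output are used.

```python
import subprocess

# Long option names taken from the --help output above.
MODE_FLAGS = {
    "appearance": "--appearance",
    "characters": "--characters",
    "words": "--words",
}


def diffpdf_command(file1, file2, mode="words", language=None):
    """Build the argument list; options go before the file names."""
    command = ["diffpdf", MODE_FLAGS[mode]]
    if language is not None:
        command.append("--language=" + language)
    return command + [file1, file2]


def launch_diffpdf(file1, file2, **options):
    """Start the GUI without waiting for it to exit."""
    return subprocess.Popen(diffpdf_command(file1, file2, **options))
```

<p>For example, <tt>launch_diffpdf("old.pdf", "new.pdf",
mode="characters")</tt> would open the program and start a Characters mode
comparison of the two files.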
<p>
The text reordering is done by the
<tt>TextItems::columnZoneYxOrder()</tt> method in the
<tt>textitem.cpp</tt> file: suggestions for improvement are welcome!
(Note that when using <tt>--debug=3</tt> coordinates are output in
<i>y</i>, <i>x</i> order.)
<p>If you're specifically looking for a command line PDF comparison
tool, e.g., for automated testing, try
<a href="http://www.qtrac.eu/comparepdf.html">comparepdf</a>.