Python utility to manipulate PDF page labels.
A useful but rarely used feature of PDFs is the ability to use custom naming schemes for pages. This makes it possible to start a PDF at any page number instead of 1, to restart page numbering for each section of a long PDF, or to give a specific name to a given page.
PDF files can contain one or more page numbering schemes. Each scheme has a start page, which specifies the page where it takes effect. All subsequent pages are affected by the scheme until another page numbering scheme is encountered. This utility allows adding, removing, and updating page numbering schemes in a PDF file.
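The rule that a scheme applies from its start page until the next scheme begins can be modelled in a few lines of Python. The sketch below is only an illustration: the `Scheme` class and the `label_for` function are hypothetical names that are not part of pagelabels-py, only arabic numerals are handled, and pages located before the first scheme are assumed to fall back to their physical index.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Scheme:
    """Hypothetical stand-in for one page numbering scheme."""
    start_page: int        # 1-based index of the first page the scheme covers
    prefix: str = ""       # text placed before the number
    first_number: int = 1  # number given to the scheme's first page


def label_for(page: int, schemes: Sequence[Scheme]) -> str:
    """Return the label of a 1-based page index (arabic numerals only)."""
    ordered = sorted(schemes, key=lambda s: s.start_page)
    starts = [s.start_page for s in ordered]
    # The governing scheme is the last one whose start page is <= page.
    position = bisect_right(starts, page) - 1
    if position < 0:
        # Assumption: with no governing scheme, use the physical page index.
        return str(page)
    scheme = ordered[position]
    return f"{scheme.prefix}{scheme.first_number + page - scheme.start_page}"


schemes = [Scheme(start_page=1, prefix="Intro "), Scheme(start_page=5, first_number=2)]
labels = [label_for(page, schemes) for page in range(1, 8)]
print(labels)
```

In this model the second scheme takes over at page 5, so the "Intro " prefix stops applying from that page onwards.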
On Linux (Debian or Ubuntu), install pip if you don't have it already:
sudo apt install python3-pip
On macOS, install Homebrew, and then install Python:
brew install python
On Windows, install WSL, and then follow the Linux instructions.
Install pagelabels-py:
python3 -m pip install --user --upgrade pagelabels
The command below reads the file /tmp/test.pdf and creates a copy of it, /tmp/new.pdf, with new page labels, without deleting the ones that may already exist.
The new index takes effect from the 1st page of the PDF, is composed of uppercase roman numerals preceded by the string "Intro ", and starts from "V".
Page numbers will be: "Intro V", "Intro VI", "Intro VII", ...
python3 -m pagelabels --startpage 1 --type "roman uppercase" --prefix "Intro " --firstpagenum 5 --outfile /tmp/new.pdf /tmp/test.pdf
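To preview the labels such a command should produce before touching a PDF, the numbering can be reproduced in plain Python. This is a minimal sketch and does not use pagelabels-py at all: `to_roman` and `preview_labels` are hypothetical helpers written for this illustration.

```python
def to_roman(number: int) -> str:
    """Convert a positive integer to an uppercase roman numeral."""
    if number < 1:
        raise ValueError("roman numerals start at 1")
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in pairs:
        # Take as many copies of this symbol as fit, then continue with the rest.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def preview_labels(prefix: str, first_number: int, count: int) -> list:
    """List the labels of `count` consecutive pages, starting at `first_number`."""
    return [f"{prefix}{to_roman(first_number + offset)}" for offset in range(count)]


print(preview_labels("Intro ", 5, 3))  # ['Intro V', 'Intro VI', 'Intro VII']
```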
python3 -m pagelabels -h
This should print:
usage: pagelabels [-h] [--outfile out.pdf] [--delete | --update]
[--startpage STARTPAGE]
[--type {arabic,roman lowercase,roman uppercase,letters lowercase,letters uppercase,none}]
[--prefix PREFIX] [--firstpagenum FIRSTPAGENUM]
[--load other.pdf]
file.pdf
Add page labels to a PDF file
positional arguments:
file.pdf the PDF file to edit
optional arguments:
-h, --help show this help message and exit
--outfile out.pdf, -o out.pdf
Where to write the output file
--delete delete the existing page labels
--update change all the existing page numbering schemes instead
of adding a new one
--startpage STARTPAGE, -s STARTPAGE
the index (starting from 1) of the page of the PDF
where the labels will start
--type {arabic,roman lowercase,roman uppercase,letters lowercase,letters uppercase,none}, -t {arabic,roman lowercase,roman uppercase,letters lowercase,letters uppercase,none}
type of numbers: arabic = 1, 2, 3, roman = i, ii, iii,
iv, letters = a, b, c, none = (number is empty)
--prefix PREFIX, -p PREFIX
prefix to the page labels
--firstpagenum FIRSTPAGENUM, -f FIRSTPAGENUM
number to attribute to the first page of this index
--load other.pdf copy page number information from the given PDF file
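For readers who want to script around this interface, the help text above can be mirrored with `argparse`. The sketch below is a reconstruction from the help output only, not the project's actual source. The default values are assumptions inferred from the examples in this document, and `build_parser` is a hypothetical name.

```python
import argparse

STYLES = [
    "arabic", "roman lowercase", "roman uppercase",
    "letters lowercase", "letters uppercase", "none",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="pagelabels", description="Add page labels to a PDF file")
    parser.add_argument("file", metavar="file.pdf", help="the PDF file to edit")
    parser.add_argument("--outfile", "-o", metavar="out.pdf",
                        help="Where to write the output file")
    # --delete and --update cannot be combined, as "[--delete | --update]" shows.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--delete", action="store_true",
                      help="delete the existing page labels")
    mode.add_argument("--update", action="store_true",
                      help="change all the existing page numbering schemes")
    # The defaults below are assumptions, not taken from the help text.
    parser.add_argument("--startpage", "-s", type=int, default=1)
    parser.add_argument("--type", "-t", choices=STYLES, default="arabic")
    parser.add_argument("--prefix", "-p", default="")
    parser.add_argument("--firstpagenum", "-f", type=int, default=1)
    parser.add_argument("--load", metavar="other.pdf")
    return parser


args = build_parser().parse_args(
    ["--startpage", "1", "--type", "roman uppercase", "--prefix", "Intro ",
     "--firstpagenum", "5", "--outfile", "/tmp/new.pdf", "/tmp/test.pdf"])
print(args.type, args.firstpagenum, args.file)
```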
python3 -m pagelabels --delete file.pdf
The following will take the page labelling scheme from source.pdf and apply it to target.pdf:
python3 -m pagelabels --load source.pdf target.pdf
Let's say we have a PDF named my_document.pdf that has 12 pages.
- Pages 1 to 4 should be labelled Intro I to Intro IV.
- Pages 5 to 9 should be labelled 2 to 6.
- Pages 10 to 12 should be labelled Appendix A to Appendix C.
We can issue the following list of commands:
python3 -m pagelabels --delete "my_document.pdf"
python3 -m pagelabels --startpage 1 --prefix "Intro " --type "roman uppercase" "my_document.pdf"
python3 -m pagelabels --startpage 5 --firstpagenum 2 "my_document.pdf"
python3 -m pagelabels --startpage 10 --prefix "Appendix " --type "letters uppercase" "my_document.pdf"
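When the same sequence has to be applied to several files, the four commands can be driven from a small Python script. This is a sketch under stated assumptions: `build_commands` and `relabel` are hypothetical helper names, `sys.executable` is used in place of `python3`, and the default dry-run mode only prints the commands, so nothing is modified unless `dry_run=False` is passed.

```python
import shlex
import subprocess
import sys
from typing import List


def build_commands(pdf: str) -> List[List[str]]:
    """Return the four pagelabels invocations shown above, as argument lists."""
    base = [sys.executable, "-m", "pagelabels"]
    return [
        base + ["--delete", pdf],
        base + ["--startpage", "1", "--prefix", "Intro ",
                "--type", "roman uppercase", pdf],
        base + ["--startpage", "5", "--firstpagenum", "2", pdf],
        base + ["--startpage", "10", "--prefix", "Appendix ",
                "--type", "letters uppercase", pdf],
    ]


def relabel(pdf: str, dry_run: bool = True) -> None:
    """Print the commands, or run them in order and stop at the first failure."""
    for command in build_commands(pdf):
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)


relabel("my_document.pdf")  # dry run: prints the commands without executing them
```

Passing argument lists rather than a single shell string keeps values that contain spaces, such as "Intro " and "roman uppercase", intact without extra quoting.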
Let's say we have a PDF with pages named 10, 11, 12, A, B, C
and we want to add a prefix to the labels, while keeping the existing custom
page offset and styles. We can do that using the --update
option of pagelabels:
python3 -m pagelabels --update --prefix "EX-" my_document.pdf
This will update the existing labels to EX-10, EX-11, EX-12, EX-A, EX-B, EX-C.
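Conceptually, an update of this kind changes one field on every existing scheme and leaves the others alone. The following sketch models that idea with plain dataclasses. It is not the tool's internal code: `Scheme` and `update_all` are hypothetical names, and the start pages chosen for the two schemes are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Scheme:
    """Hypothetical stand-in for one page numbering scheme."""
    start_page: int
    style: str = "arabic"
    prefix: str = ""
    first_number: int = 1


def update_all(schemes: List[Scheme], **changes) -> List[Scheme]:
    """Return copies of every scheme with only the given fields changed."""
    return [replace(scheme, **changes) for scheme in schemes]


# Pages labelled 10, 11, 12, A, B, C: two schemes with different styles.
before = [
    Scheme(start_page=1, style="arabic", first_number=10),
    Scheme(start_page=4, style="letters uppercase"),
]
after = update_all(before, prefix="EX-")
for scheme in after:
    print(scheme)
```

Only the prefix differs between `before` and `after`; the start pages, styles, and first numbers are carried over unchanged.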
pagelabels-py internally uses pdfrw, which can only write PDF version 1.3. If your PDF uses features that are not compatible with PDF 1.3, it may not render correctly after being processed by pagelabels-py.
This project can also be used as a Python library. See pagelabels on the Python Package Index.