Add support for rebuilding the xref table for damaged PDF files #45
Comments
So for this file the "startxref" value is wrong, as are all of the xref table offsets. More than likely the original file was edited on Windows with a plain text editor (Notepad or similar), which changed the line endings from LF to CR LF. Some PDF viewers will attempt to rebuild the xref table for files like this, but I have not done so for PDFio because of the chance of errors and the likelihood that the same corruption has also damaged the binary streams in the file, which would leave it unreadable anyway. I will keep this issue open for now, but it will not be "fixed" any time soon...
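To illustrate the comment above, here is a minimal Python sketch of how a viewer might recover object offsets by scanning for "N G obj" markers, and how an LF to CR LF conversion shifts those offsets. This is not PDFio code (PDFio is a C library and, per the comment, does not do this); the function name `scan_object_offsets` and the sample bytes are assumptions made up for the example. A naive scan like this can also match bytes inside binary streams, which is one of the error sources mentioned above.

```python
import re

# Matches "<number> <generation> obj" at the start of the data or of a line.
# Hypothetical helper for illustration only; a real scanner must also skip
# stream contents, which may contain bytes that look like object headers.
OBJ_RE = re.compile(rb"(?:^|[\r\n])(\d+)[ \t]+(\d+)[ \t]+obj\b")


def scan_object_offsets(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of its header."""
    offsets = {}
    for match in OBJ_RE.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition overwrites an earlier one with the same key.
        offsets[key] = match.start(1)
    return offsets


# A tiny made-up file body with LF line endings.
lf_data = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
)

# The same bytes after a text editor rewrites LF as CR LF.
crlf_data = lf_data.replace(b"\n", b"\r\n")

original = scan_object_offsets(lf_data)
rebuilt = scan_object_offsets(crlf_data)

# An offset recorded for the LF file no longer points at the object header
# once the line endings have changed.
stale = original[(2, 0)]
still_valid = crlf_data[stale:stale + 7] == b"2 0 obj"
```

Each inserted CR pushes every later object one byte further into the file, so the offsets stored in the xref table and the "startxref" value all become stale at once.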
Here's another PDF, newly generated, so it's unlikely to be damaged.
Bad xref table header 'xref '.
That file isn't damaged in the same way. The issue is that there is trailing whitespace after the "xref" keyword, which the current parser rejects because the PDF specs all say the xref table starts with a line consisting of a single "xref" keyword and say nothing about extra whitespace. I will update the xref loading code to allow for this, but that won't fix the problem with the first file you linked to...
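As a rough illustration of the relaxed check described above, the Python sketch below accepts an "xref" line that carries trailing whitespace. It is not the actual PDFio change (PDFio is written in C); `is_xref_header` is a hypothetical name, and the whitespace set is the six PDF whitespace characters (NUL, tab, LF, FF, CR, space).

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def is_xref_header(line: bytes) -> bool:
    """Return True if the line is the "xref" keyword, ignoring trailing whitespace."""
    # Only trailing whitespace is stripped; anything before or fused to the
    # keyword (for example "startxref") still fails the comparison.
    return line.rstrip(PDF_WHITESPACE) == b"xref"
```

With this check, the header from the error message above, `'xref '`, is accepted, while `startxref` and other keywords are still rejected.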
If you find other files with issues, please report them as separate issues; otherwise it is harder for me to track when a problem is actually fixed... Thanks!
Will do, thanks a lot, Michael. The fix doesn't seem to work, though 😢
The pdfiototext tool fails to parse the file:
https://www.dropbox.com/scl/fi/1nhivpa3sbjejza8l53rz/NTFS.pdf?rlkey=zvphkczuy71b0vil8zvmrz95v&dl=0
System Information: