Data not being processed correctly #5
Comments
If you find more broken images, please reopen the issue. |
Thank you. I found more files. If I can help with anything else, please tell me. |
I fixed it mostly. There is still something funny with the height of the images. |
Is there something I can do to help? |
The bug is a missing vertical line at the end. Nothing terrible. |
I have a new batch. Thanks. |
Most are fixed, but a few still appear broken. I need to convert them to PBM to check them. |
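For anyone who wants to do the same PBM conversion by hand, here is a minimal sketch in Go using only the standard library. It is not the code used in this thread. The function name `EncodePBM` is hypothetical, and it assumes the decoded data is already packed one bit per pixel, MSB first, with every row padded to a whole byte and a set bit meaning black, which is the layout the binary PBM (P4) format expects.

```go
package main

import "fmt"

// EncodePBM (hypothetical helper) wraps packed 1-bit rows in a binary PBM (P4)
// container: an ASCII header "P4\n<width> <height>\n" followed by the raw rows.
// Assumption: bits holds at least height rows of (width+7)/8 bytes each,
// MSB first, with 1 meaning black.
func EncodePBM(bits []byte, width, height int) ([]byte, error) {
	if width <= 0 || height <= 0 {
		return nil, fmt.Errorf("invalid size %dx%d", width, height)
	}
	stride := (width + 7) / 8 // bytes per row, rows are byte-aligned
	need := stride * height
	if len(bits) < need {
		return nil, fmt.Errorf("need %d bytes, got %d", need, len(bits))
	}
	out := []byte(fmt.Sprintf("P4\n%d %d\n", width, height))
	// Copy exactly height rows; any extra trailing bytes are dropped.
	return append(out, bits[:need]...), nil
}
```

The returned bytes can be written to a `.pbm` file and opened in an image viewer, or compared byte for byte with the output of another decoder.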
@maxpowel can you check the files 1106, 888, 786 and 746? |
Thank you @s3bk! These images are correct. They are just strange lines. These images come from a PDF, exactly from this page: As you can see, it is a table, and these strange lines correspond to the table borders and inner lines. The scanner somehow separated the table lines from the text. These images are the text: The missing parts of the left column (some kind of IDs) are also separated into other images. It looks like Frankenstein's monster. Now all files are being processed properly (it is a big scanned PDF file with hundreds of images). Next I will test with a few thousand PDFs I have. I will let you know the results. Again, thanks for your effort. |
Hello, I have some new files, but now the issue is with decode_g3. My testing code is this:
It is very similar to the g4 one. Here are some samples. Thank you. |
Hi, I found another image that is not being processed correctly.
This attachment contains the original data, the PBM created by the fax crate, and a TIFF to preview it:
stream_6.zip
It looks similar to #2 because the image is partially processed and then, at some point, boom!
Using this Go snippet you can get this output (it is similar to the one in the other ticket, but with newlines calculated instead of hardcoded). The library source code is at https://github.com/golang/image/blob/master/ccitt/reader.go
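The original snippet is not reproduced here. As an illustration of what "newlines calculated" can mean, the following is a minimal sketch in standard-library Go that derives each row boundary from the image width. The name `RenderRows` is hypothetical, and the sketch assumes the decoder output is packed one bit per pixel, MSB first, with every row starting on a byte boundary. Whether a set bit means black or white depends on the decoder options, so treat the `#` and `.` characters as arbitrary.

```go
package main

import "strings"

// RenderRows (hypothetical helper) turns packed 1-bit image data into text,
// one text line per image row. The line breaks are computed from the width
// rather than hardcoded.
// Assumptions: bits are MSB first and each row is padded to a whole byte.
// A trailing partial row is ignored.
func RenderRows(data []byte, width int) string {
	if width <= 0 {
		return ""
	}
	stride := (width + 7) / 8 // bytes per row, derived from the width
	var sb strings.Builder
	for off := 0; off+stride <= len(data); off += stride {
		row := data[off : off+stride]
		for x := 0; x < width; x++ {
			// Test bit x of the row, counting from the most significant bit.
			if row[x/8]&(byte(0x80)>>uint(x%8)) != 0 {
				sb.WriteByte('#')
			} else {
				sb.WriteByte('.')
			}
		}
		sb.WriteByte('\n')
	}
	return sb.String()
}
```

Feeding it the bytes read from a decoder and the known image width gives a text dump where a row that ends early or shifts sideways is easy to spot.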
Thank you so much!