Feature request: Dewarp before content selection #85

Mister-Teatime · 2019-11-24T01:59:19Z

I'm digitizing books that I'd rather not press flat against a scanner. It is much quicker to set up decent light, lay them open on the floor and point a good camera straight down at them.

This causes quite a bit of warping (particularly with softcovers or thicker books), which the dewarping algorithm deals with reasonably well.

However, I have to select the pages, their content and in some cases pictures beforehand, and Scantailor assumes that most of these things are rectangular (which they only become after dewarping ...). For image maps, I can use polygons instead, and for content selection I can adjust the content boxes accordingly, but these workarounds take significantly more time than if the contents were nicely aligned, and sometimes the box around the content will have corners off-page, despite ample page margins. Significantly warped content also seems to throw the automatic content selection off-balance, which means even more manual work.

This often makes me wish I could reverse the order of those steps, and do the dewarping first.
I'm not familiar with the code architecture (yet...?) but I would hope that it must be possible to change the order of processing steps to put dewarping before the content selection stage.

Proposed implementation (I'm not familiar with Scantailor's code structure, so I can only comment on what would make sense from a user's perspective -- so I accept that some of the following may not be practically realistic, or that there are smarter/faster ways to get a working implementation):

My favourite implementation would be to bundle the dewarping with the deskewing step. It seems to me that the dewarp tool would be able to actually replace the "deskew" (i.e. rotation) function, as it can represent rotation, too. This might be overkill for users who really just need to rotate the contents a little, but since dewarping is already optional, it could stay optional, either following the rotation step, or optionally replacing it. ("deskew method: nothing | rotate | dewarp")

Thinking even one more step forward, since the dewarp tool can also represent pure rotation and perspective correction without warping, those could become different options for the same tool, rather than separate tools. "Rotation" would look similar to right now (sets one angle), adding the "Perspective" option would add a rectangular frame with all four corners movable (sets two additional angles for perspective), and activating the "Dewarp" option then allows users to deform the grid.
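
The idea that one tool could cover both rotation and perspective can be illustrated with a small sketch: both can be written as a 3x3 projective matrix (a homography), and a pure rotation is simply the special case whose bottom row is (0, 0, 1). This is only an illustration in Python, not Scantailor code. The function names and the two-parameter perspective form (`px`, `py`) are assumptions made up for this example, and the grid-based dewarp itself is left out because it cannot be expressed as a single matrix.

```python
import math


def rotation(theta_deg):
    """3x3 homography for a pure rotation about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def perspective(px, py):
    """Identity plus two projective terms in the bottom row.

    Hypothetical parameterisation chosen for this sketch; a general
    homography has more degrees of freedom than these two numbers.
    """
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [px, py, 1.0]]


def compose(a, b):
    """Matrix product a * b: apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(h, point):
    """Map a 2D point through homography h, including the projective divide."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)
```

With `px = py = 0` the perspective matrix is the identity, so `compose(perspective(0, 0), rotation(angle))` behaves exactly like the plain rotation. That is the sense in which "rotate" could be one setting of a more general tool.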

zvezdochiot · 2019-11-24T04:44:35Z

@Mister-Teatime said:

those could become different options for the same tool, rather than separate tools.

Or they can be separate steps of the complex "Transformation", if it is easier to implement them this way:

transform=[deskew[+deperspective[+dewarp]]]
-------------------------------------------
  scale     1:1     area:area      free
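
One way to read the bracket notation above is that each stage is optional but only allowed once the previous one is enabled, so the valid settings are the prefixes of deskew, deperspective, dewarp. The sketch below checks exactly that. It is a hypothetical Python illustration, not code from any Scan Tailor fork: `parse_transform`, the `+`-separated string format and the `SCALE` table (my reading of the "scale" row) are assumptions.

```python
# Stage names taken from the notation above; the order matters.
STAGES = ("deskew", "deperspective", "dewarp")

# Scale behaviour per stage, as read from the "scale" row (an interpretation).
SCALE = {"deskew": "1:1", "deperspective": "area:area", "dewarp": "free"}


def parse_transform(spec):
    """Parse e.g. 'deskew+deperspective' into a list of stage names.

    Raises ValueError unless the stages form a prefix of STAGES, which is
    what the nested brackets in the notation imply.
    """
    stages = spec.split("+") if spec else []
    if tuple(stages) != STAGES[:len(stages)]:
        raise ValueError(
            f"stages must be a prefix of {'+'.join(STAGES)}: {spec!r}"
        )
    return stages


if __name__ == "__main__":
    for name in parse_transform("deskew+deperspective+dewarp"):
        print(name, "scale:", SCALE[name])
```

Under this reading, `parse_transform("deskew")` is accepted while `parse_transform("dewarp")` is rejected, because dewarp without the two earlier stages is not a prefix of the chain.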

See also https://github.com/tibob/yasw , https://github.com/ImageProcessing-ElectronicPublications/yasw , trufanov-nok/scantailor-universal#67 , https://github.com/scantailor/scantailor/issues/317 .

konos93 · 2019-11-24T14:27:07Z

Pressing the pages against a flat sheet of glass gives better results. It's something you can't avoid if you read in a language other than English (English OCR gives ridiculously good results); working from black-and-white photos is the best way. See https://www.youtube.com/watch?v=mR2TQOHEDYc (turn on the subtitles).

zvezdochiot · 2019-11-24T14:38:24Z

@konos93 said:

on a flat surface of glass

The scanning technique is demonstrated very well, but you don't always (in fact, only rarely) get to work with a paper edition; more often you end up with someone else's scans.

konos93 · 2019-11-24T16:23:55Z

What about two sheets of glass? https://www.youtube.com/watch?v=n1ZKAbBjeJ0 (check the first 3 minutes)

konos93 · 2019-11-24T16:28:35Z

@zvezdochiot said:

The scanning technique is demonstrated very well, but you don't always (in fact, only rarely) get to work with a paper edition; more often you end up with someone else's scans.

Sorry, English is not my native language; I only understood what you said after I commented. Yes, it's easier to scan your own books, or ones you find, than to fix someone else's scans, especially if they were not digitized well. Scanning is a difficult hobby: not very easy, not very hard, but it has its own difficulties.

zvezdochiot · 2019-11-24T16:34:06Z

@konos93 said:

English is not my native language

Nor mine. ( https://translate.google.ru )

Mister-Teatime · 2019-11-26T01:20:54Z

Sure, if you can get nice flat scans, that makes the processing easier -- but it takes time to set up, particularly the fancy 2-camera setup, and then it still takes longer to take the pictures themselves. I can set up a tripod, adjust settings on the camera and start shooting in less than 5 minutes, and then it's about 2 seconds per double-page, which is probably twice as fast as the method with a single glass pane, and a little faster than the 2-camera setup -- although I certainly admire the setup. I've seen professional book scanners of this type, and they're crazy expensive.

However, I'm not doing this professionally, and highest quality is not my aim, either. I'm just trying to reduce the weight of my bookshelf by digitizing some books that I'll never read but might need sometimes to look something up. So the main aim is to have books readable, OCR to work well, and get small files.

But all this still misses the point: If a feature to de-warp pages exists, then I think it should be applied at the point in the process where it is most useful. And I think that is before content selection.

Mister-Teatime · 2019-12-01T10:46:30Z

@konos93 said:

Yes, it's easier to scan your own books, or ones you find, than to fix someone else's scans, especially if they were not digitized well. Scanning is a difficult hobby: not very easy, not very hard, but it has its own difficulties.

Agreed. I think there's a range of ambitions, abilities and use cases, and usually I would be the sort of perfectionist who spends a month building a setup, then one minute per page to scan them, and another minute to post-process them ... and I've already done scans in this way, and they look really good.
Right now, I have about a dozen books, half of them already photographed and given away, and I cannot spend a year processing the pictures, but I want to pack them into small PDFs and OCR them. But the nice thing about Scantailor is that it allows you to get the most out of all scans, whether they're professional scans or just phone camera pictures, and to make really small files, fast, with comparatively good quality.

The content selection, page layout and picture selection all assume that the page and pictures are rectangular, so their automatic modes will often fail if that's not the case -- so dewarping first could really speed up the whole process. I've actually done a test where I simply selected the whole input image as content and only did the dewarping, then used the output from that in another Scantailor project, and got results faster because all the automatic settings work so much better then -- but it seems weird to me that you should need to go around twice in order to get the best use out of the dewarping tool.

zbgns · 2019-12-04T15:07:11Z

Maybe you should give Scan Tailor Experimental a try. The dewarp function was moved to the 3rd stage there, so apparently it has the functionality you want. It has not been under development for a long time, but it is still very useful.
It may be useful to read this thread, because its workflow differs from that of other Scan Tailor forks (in particular, it does not use DPI).

konos93 · 2019-12-05T18:43:00Z

Two cameras: 1500 pages per hour, https://www.youtube.com/watch?v=n1ZKAbBjeJ0 ; for 600 pages I need 2 hours until I can start reading on a Koreader e-book reader. One smartphone camera: https://www.youtube.com/watch?v=mR2TQOHEDYc , 1100 pages per hour. Check the first 3 minutes of both videos.

Mister-Teatime · 2019-12-05T21:24:58Z

@zbgns said:

Maybe you should give Scan Tailor Experimental a try. The dewarp function was moved to the 3rd stage there, so apparently it has the functionality you want. It has not been under development for a long time, but it is still very useful.
It may be useful to read this thread, because its workflow differs from that of other Scan Tailor forks (in particular, it does not use DPI).

Thanks for the hint! That looks very useful. It's just a shame that with so many forks, I have to decide between multiprocessing and fancier settings (advanced), better user interface (universal), and "correct" workflow order. And Tulon states in that thread that he changed the architecture significantly, so I would assume that his changes would not be trivial to translate into ST advanced.

I'll definitely give the experimental fork a try.

gorgobacka · 2020-03-06T23:08:21Z

This seems like a duplicate of #4.

zvezdochiot mentioned this issue Nov 24, 2019

Feature request: Dewarp before selecting content trufanov-nok/scantailor-universal#67

Open

Mister-Teatime mentioned this issue Dec 1, 2019

Content anchoring #79

Open

4lex4 added the duplicate label Mar 7, 2020

4lex4 closed this as completed Mar 7, 2020
