Content anchoring #79

Open
beefeater7 opened this issue Sep 27, 2019 · 3 comments


@beefeater7

It occurred to me that we have little in the way of content alignment as it is. The contents of scanned and processed books shift around between page turns. Probably not hard to fix in Photoshop, but an automated solution is always nice.

One useful property of the printed page is the static element: be it the book title, chapter name, or, most commonly, the page number. These landmarks can serve as fixed coordinates for absolute content positioning on each page. If we are aiming for consistency, this is a step in the right direction.

Another way of anchoring the content may relate to the dewarping functionality. In documents containing justified blocks of text, the margins provide a clue to positioning as well as to proportions throughout. We can assume these paragraphs share the same width from page to page.

So, to recap, these are potential points to "nail down", as constant coordinates for interpage processing:

  • Left and right text margins (justified text)
  • Title, chapter
  • Page number (applied every other page)

The more anchors, the more precise the transformation. I think a simple height-width resize would suffice, given a preceding dewarp.
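To make the "height-width resize" idea concrete, here is a minimal sketch, assuming anchors have already been located on a page and on a reference page. It fits an independent scale and offset per axis by least squares, so extra anchors average out detection noise. The function names (`fit_axis`, `fit_resize`, `apply_resize`) are hypothetical and not part of ScanTailor.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_axis(src: List[float], dst: List[float]) -> Tuple[float, float]:
    """Least-squares fit of dst ~= scale * src + offset along one axis."""
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("need matching, non-empty anchor lists")
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    if var == 0:
        # Anchors have no extent along this axis: fall back to translation only.
        return 1.0, mean_d - mean_s
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var
    return scale, mean_d - scale * mean_s


def fit_resize(page: List[Point], ref: List[Point]) -> Tuple[float, float, float, float]:
    """Return (sx, sy, tx, ty) mapping page anchors onto reference anchors."""
    sx, tx = fit_axis([p[0] for p in page], [p[0] for p in ref])
    sy, ty = fit_axis([p[1] for p in page], [p[1] for p in ref])
    return sx, sy, tx, ty


def apply_resize(point: Point, params: Tuple[float, float, float, float]) -> Point:
    """Apply the fitted per-axis scale and offset to one point."""
    sx, sy, tx, ty = params
    return sx * point[0] + tx, sy * point[1] + ty


# Hypothetical anchors: left margin, right margin, page number.
page_anchors = [(100.0, 50.0), (900.0, 50.0), (500.0, 1400.0)]
ref_anchors = [(120.0, 40.0), (1000.0, 40.0), (560.0, 1255.0)]
params = fit_resize(page_anchors, ref_anchors)
```

With only one anchor the fit degrades to a pure shift, which matches the page-number-only case discussed in this thread.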

[Image: Anchoring]

Just shooting this out there.

@Piolie

Piolie commented Sep 30, 2019

I agree with you that ST is lacking an automatic way of aligning the content in the final page. For my use case, the best approach would be to fix the page number position using the output page.

At the moment it is possible to semiautomatically align the pages by creating guides and pressing Ctrl+Shift+double LMB around the page number. However, the algorithm relies on the page as it was before processing, which can contain artifacts that cause inconsistencies and require manual adjustment.

Since all page numbers end up in the upper (lower) left (right) corner of each page, I would really appreciate a way of aligning the content in that area between all pages. Don't know how hard it would be to implement.

@Mister-Teatime

Oh yes, this would be nice. And I very much agree with your statement that dewarping would need to happen first (see issue #85). Once that is done, there could be a relatively quick way to find the left and right edges of the content and align to them (either left or right, depending on odd/even page numbers, or both), scaling the content/resolution accordingly.
Up/down might be trickier with some layouts, as there are books where the first page of a chapter has a different layout (the heading might not be at the top border), and the last page of a chapter may not go all the way to the bottom, but those could be handled by the user.
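As a rough illustration of finding the left and right content edges, the sketch below counts dark pixels per column of a binarized page and then derives a horizontal scale and offset that map those edges onto target margins. It assumes the page is a list of rows of 0/1 values and is already dewarped. `content_columns`, `horizontal_fit` and the `min_ink` threshold are hypothetical names for illustration only.

```python
from typing import List, Optional, Tuple


def content_columns(page: List[List[int]], min_ink: int = 1) -> Optional[Tuple[int, int]]:
    """Return (left, right) indexes of the outermost columns holding ink.

    A column counts as inked when it has at least min_ink dark pixels;
    raising min_ink makes the result less sensitive to speckle noise.
    Returns None for an empty or blank page.
    """
    if not page or not page[0]:
        return None
    counts = [0] * len(page[0])
    for row in page:
        for x, value in enumerate(row):
            if value:
                counts[x] += 1
    inked = [x for x, c in enumerate(counts) if c >= min_ink]
    if not inked:
        return None
    return inked[0], inked[-1]


def horizontal_fit(edges: Tuple[int, int], target_left: float,
                   target_right: float) -> Tuple[float, float]:
    """Return (scale, offset) so that scale * x + offset maps edges to targets."""
    left, right = edges
    if right == left:
        # Degenerate content width: translate only.
        return 1.0, target_left - left
    scale = (target_right - target_left) / (right - left)
    return scale, target_left - scale * left


# Tiny synthetic page: 4 rows, 10 columns, ink in columns 2..7.
page = [[1 if 2 <= x <= 7 else 0 for x in range(10)] for _ in range(4)]
edges = content_columns(page)
scale, offset = horizontal_fit(edges, target_left=1, target_right=11)
```

For left-only or right-only alignment on odd/even pages, one would keep the scale at 1 and use just one of the two edges to compute the offset.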

One alternative implementation would be to assume that the dewarp box spans the whole page and marks its corners, which provides one rectangle per page. The location of content within that rectangle could then be kept as is, and the sizes of the various rectangles adjusted to be equal in the final output. This would have the advantage that no additional work is necessary after dewarping, but the disadvantage that dewarping (or at least marking/finding the page corners) would be required.
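The rectangle idea could be sketched roughly as follows, assuming each dewarped page yields an axis-aligned rectangle `(x0, y0, x1, y1)`. The median width and height are used as the common output size (an arbitrary choice made for this example), and content keeps its relative position inside the rectangle. `common_size` and `map_to_common` are hypothetical helper names.

```python
from statistics import median
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def common_size(rects: List[Rect]) -> Tuple[float, float]:
    """Pick one output size for all pages: median width and median height."""
    widths = [x1 - x0 for x0, _, x1, _ in rects]
    heights = [y1 - y0 for _, y0, _, y1 in rects]
    return median(widths), median(heights)


def map_to_common(point: Tuple[float, float], rect: Rect,
                  size: Tuple[float, float]) -> Tuple[float, float]:
    """Map a point from its page rectangle into the common output rectangle.

    The point keeps its relative position inside the rectangle.
    """
    x0, y0, x1, y1 = rect
    if x1 == x0 or y1 == y0:
        raise ValueError("page rectangle has zero width or height")
    u = (point[0] - x0) / (x1 - x0)
    v = (point[1] - y0) / (y1 - y0)
    return u * size[0], v * size[1]


# Hypothetical page rectangles from three dewarped pages.
rects = [(0, 0, 100, 200), (10, 20, 130, 220), (5, 5, 95, 215)]
size = common_size(rects)
mapped = map_to_common((70, 120), rects[1], size)
```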

Fancy way to unify this: dewarping could leave "markers" on the page corners, and so could content selection etc. The user could then select whether they want to use the existing markers (and which ones) to align pages, or to create new ones manually (e.g. on page numbers). This would be similar to the concept used by Hugin to align photos in a panorama, except with fewer markers, and with the markers being "cheap" to produce.

@zvezdochiot

zvezdochiot commented Dec 1, 2019

@beefeater7 said:

One useful property of the printed page is the static element: be it the book title, chapter name, or the most common: page number

An interesting thought: use the position of the page number to calculate the page margins. That is, fix the position of the page number on all pages and calculate the margins from this.
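One way to read this suggestion, as a sketch only: shift each page so its detected page number lands on a fixed target position, then read the margins off the shifted content box. The sketch assumes the page number position and content box are already known; `align_by_page_number` is a hypothetical name, and odd and even pages would likely need separate target positions.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def align_by_page_number(content_box: Box,
                         number_pos: Tuple[float, float],
                         target_pos: Tuple[float, float],
                         page_size: Tuple[float, float]) -> Tuple[Box, Dict[str, float]]:
    """Shift content so the page number sits at target_pos; return box and margins."""
    dx = target_pos[0] - number_pos[0]
    dy = target_pos[1] - number_pos[1]
    x0, y0, x1, y1 = content_box
    shifted = (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
    width, height = page_size
    margins = {
        "left": shifted[0],
        "top": shifted[1],
        "right": width - shifted[2],
        "bottom": height - shifted[3],
    }
    return shifted, margins


# Hypothetical values for one page, in pixels.
shifted_box, margins = align_by_page_number(
    content_box=(110, 95, 910, 1405),
    number_pos=(905, 1390),
    target_pos=(900, 1400),
    page_size=(1000, 1500),
)
```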
