Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#153: Faster PDF preview generation #271

Closed
wants to merge 8 commits into from

Conversation

kirillt
Copy link
Member

@kirillt kirillt commented Jul 10, 2022

#153
Adopting new PDF functionality from arklib
See ARK-Builders/arklib#11 and ARK-Builders/arklib-android#7 for details

@kirillt kirillt temporarily deployed to Development July 10, 2022 09:28 Inactive
@kirillt kirillt temporarily deployed to Development July 10, 2022 20:10 Inactive
@kirillt
Copy link
Member Author

kirillt commented Jul 10, 2022

Awesome that we touched base with and wired some advanced stuff from Rust side to Kotlin.
But here is performance degradation, at the moment.

Manual testing on a folder with many big PDF files produced the following results:

  • this branch: 115 seconds
  • main branch: 40 seconds

The folder contains 469 documents, size of each is from 41KB to 21MB.

@kirillt
Copy link
Member Author

kirillt commented Jul 10, 2022

PDF archive for testing (487MB):
https://www.taran.space/dev/pdf-test.zip

@j4w3ny
Copy link
Contributor

j4w3ny commented Jul 11, 2022

Awesome that we touched base with and wired some advanced stuff from Rust side to Kotlin. But here is performance degradation, at the moment.

Manual testing on a folder with many big PDF files produced the following results:

* **this branch: 115 seconds**

* **`main` branch: 40 seconds**

The folder contains 469 documents, size of each is from 41KB to 21MB.

We could switch to Medium Quality for rendering preview. I have tested it and it indicated that it will significantly cut down the time cost while providing previews with similar quality.

@kirillt kirillt force-pushed the 153-faster-pdf-preview-generation branch from c603b87 to 241fd7d Compare January 13, 2023 18:17
@kirillt kirillt temporarily deployed to Development January 13, 2023 19:03 — with GitHub Actions Inactive
@kirillt kirillt temporarily deployed to Development January 14, 2023 12:58 — with GitHub Actions Inactive
@kirillt kirillt temporarily deployed to Development January 14, 2023 16:00 — with GitHub Actions Inactive
@kirillt kirillt temporarily deployed to Development January 14, 2023 16:37 — with GitHub Actions Inactive
@kirillt kirillt force-pushed the 153-faster-pdf-preview-generation branch from 0ed0262 to ac7f49b Compare January 14, 2023 16:59
@kirillt kirillt temporarily deployed to Development January 14, 2023 16:59 — with GitHub Actions Inactive
@kirillt
Copy link
Member Author

kirillt commented Jan 14, 2023

State of the things: after recent changes in arklib Navigator from this PR hangs up during rendering previews for PDFs. Sometimes, after 1 document, sometimes after 10.

If we look closer at this thread: ajrcarey/pdfium-render#59, we'll notice that changes in arklib are almost identical to the code sample kindly provided by @ajrcarey :) And he put it clearly that this snippet isn't tested and that Pdfium itself is not thread-safe :) Although it could work.

I suggest to do the simplest thing for this moment and ensure that all PDF previews are generated sequentially. We could batch PDF resources into separate set and process it from a dedicated thread/coroutine. @mdrlzy probably you could implement it?

@kirillt
Copy link
Member Author

kirillt commented Jan 14, 2023

Also this feature could help us to debug this ARK-Builders/ark-android#61

@mdrlzy mdrlzy temporarily deployed to Development January 15, 2023 23:02 — with GitHub Actions Inactive
@kirillt
Copy link
Member Author

kirillt commented Jan 16, 2023

Sequential previews generation benchmark:

OS: Android 11
Device: Samsung Galaxy Note 10+
Benchmark data: 477 PDF files, 857 MB total

Results (total indexing time, previews/thumbnails generation is the bottleneck):
main: 1 min 5 sec
this PR (high): 9 min 43 sec
this PR (medium): 1 min 4 sec

@kirillt kirillt temporarily deployed to Development January 16, 2023 21:54 — with GitHub Actions Inactive
@kirillt
Copy link
Member Author

kirillt commented Jan 16, 2023

Data size of outputs in sequential tests:
main: 84.7 MB
this PR (medium): 87.4 MB

Data sizes of individual previews slightly differ, some are heavier in this PR and some are lighter comparing to main.

Dimensions of the previews are the same in both versions.
I don't see significant difference in quality of previews.

Samples of the previews (click to see in the original size):

(main)

(this PR)

Previews of landscape-oriented documents are generated by this PR version in portrait layout, they are just turned clockwise. Not sure if it is a problem, anyway should be possible to fix easily:

(main)

(this PR)

@kirillt
Copy link
Member Author

kirillt commented Jan 16, 2023

I think we should merge this PR, but also implement:

  • Sequential rendering of previews for PDF resources
  • Thumbnails generation should be separated from previews, so it could remain parallel for all kinds of resources

We need to take this task in-scope:

  • Enable parallel redering of previews for PDF resources

@mdrlzy
Copy link
Member

mdrlzy commented Jan 18, 2023

Previews of landscape-oriented documents are generated by this PR version in portrait layout, they are just turned clockwise. Not sure if it is a problem, anyway should be possible to fix easily:

Maybe wrong orientation is because of this?
.rotate_if_landscape(PdfBitmapRotation::Degrees90, true);
https://github.com/ARK-Builders/arklib/blob/868f4586ddc62c8f438a2cd8ddd0d527f3af1452/src/pdf.rs#L45

@kirillt
Copy link
Member Author

kirillt commented Jan 20, 2023

@mdrlzy I just realized that we can implement sequential rendering of PDF previews simply using a shared atomic variable PDF_JOBS. Each PDF coroutine checks PDF_JOBS and if it is equal than pre-defined limit (1 or 2) then the coroutine sleeps. Should be easier than doing some thread-pool magic. But the variable must be atomic (i.e. support atomic increment/decrement/equality-check).

@kirillt
Copy link
Member Author

kirillt commented Jan 22, 2023

Superseded #329

@kirillt kirillt closed this Jan 22, 2023
@kirillt kirillt deleted the 153-faster-pdf-preview-generation branch February 12, 2023 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants