Memory heap size shoots up beyond 1.5 GB for 50MB pdf file #324

Open
ladvishal1985 opened this issue Aug 31, 2023 · 10 comments

Comments

@ladvishal1985

Thanks for such a great library. We are able to reliably write a watermark on the PDF, but we are having trouble with memory consumption. It prevents us from using the library on our Node servers, because the memory growth gets our pods terminated.
For example:

// We download the file via a signed URL and pass the response in as fileBuffer:
const recipe = new Recipe(fileBuffer);
const used = process.memoryUsage().heapUsed / 1024 / 1024;
this.logger.log(`After reading the recipe, the script uses approximately ${Math.round(used * 100) / 100} MB`);
// Creating the new Recipe pushes the heap size above 1.5 GB.
// We then create the stream:
const readerStream = new Muhammara.PDFRStreamForBuffer(fileBuffer);
// After this, the memory size grows to almost 3 GB.
// Then we create a reader, which we need to get the total page count:
const reader = Muhammara.createReader(readerStream);
const pageCount = reader.getPagesCount();

Is there any solution to this problem?
Currently we are targeting file sizes of up to 50 MB, and may go up to 100 MB.
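
The snippet above logs only heapUsed, which covers V8's JavaScript heap. Node.js accounts for Buffer contents separately, under arrayBuffers (which is also included in external), so a fuller snapshot may help show where the growth actually happens. A minimal sketch; memorySnapshot and toMb are hypothetical helper names and not part of muhammara:

```typescript
// Snapshot of the process.memoryUsage() fields that matter when large Buffers are involved.
export interface MemorySnapshot {
  rssMb: number;          // total resident memory of the process
  heapUsedMb: number;     // V8 JavaScript heap in use
  externalMb: number;     // memory of C++ objects bound to JS objects (includes arrayBuffers)
  arrayBuffersMb: number; // ArrayBuffers and Node.js Buffers
}

// Convert bytes to megabytes, rounded to two decimals.
export const toMb = (bytes: number): number =>
  Math.round((bytes / 1024 / 1024) * 100) / 100;

export function memorySnapshot(): MemorySnapshot {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return {
    rssMb: toMb(rss),
    heapUsedMb: toMb(heapUsed),
    externalMb: toMb(external),
    arrayBuffersMb: toMb(arrayBuffers),
  };
}
```

Logging memorySnapshot() before and after each step (new Recipe, PDFRStreamForBuffer, createReader) would show whether the increase lands in the JavaScript heap or in Buffer memory.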

@julianhille
Owner

julianhille commented Aug 31, 2023

Some information is missing:

  • Where does fileBuffer come from, and what is it?
    • Please add a sample showing how it is initialised. Is the file uploaded to the server? I'm not sure what the "signed file url" part means.
  • Why do you initialise recipe and then not use it? Is that just for demonstration?
  • What is your goal: watermarking, or getting the page count?

About the 3 GB: that is what I would expect if (!!) recipe really shoots up to 1.5 GB. Recipe just uses muhammara under the hood, and both recipe and readerStream create their own objects from the buffer. Node cannot free any memory between recipe = ... and readerStream = ..., because recipe is still in use and has not been dereferenced. So there is that. :>

Are you able to provide a sample file?

@ladvishal1985
Author

Please check the snippet below:

async downloadAndAddwatermark(signedUrl: string, waterMark: string) {
    try {
      const file$ = this.downloadFileUsingSignedUrl(signedUrl);
      const fileBuffer = await firstValueFrom(file$.pipe(take(1))); //<-- Download the file from here as array buffer
      const modifiedBuffer = await this.addWatermark(fileBuffer, waterMark);
      return modifiedBuffer;
    } catch (error) {
      // catch error here
    }
  }

  private addWatermark(fileBuffer, waterMark: string) {
    try {
      const recipe = new Recipe(fileBuffer); // <-- Memory consumption increases after this.
      const readerStream = new Muhammara.PDFRStreamForBuffer(fileBuffer);
      const reader = Muhammara.createReader(readerStream);
      const pageCount = reader.getPagesCount();

      const modifiedRecipe = this.addWatermarkPage(recipe, {
        currentPage: 1,
        watermark: waterMark,
        pageCount
      });

      // endPDF passes the output buffer to its callback, so wrap the call in a
      // Promise to hand the buffer back to the awaiting caller.
      return new Promise((resolve) => modifiedRecipe.endPDF((outputBuffer) => resolve(outputBuffer)));
    } catch (error) {
      // catch error here
    }
  }
  private addWatermarkPage(recipe: Recipe, { currentPage, watermark, pageCount }) {
    if (currentPage > pageCount) {
      return recipe;
    }
    const pgWidth = recipe.pageInfo(currentPage).width;
    const pgHeight = recipe.pageInfo(currentPage).height;
    const initialConfig: FileBufferEditConfig = {
      size: 20,
      text: watermark,
      width: pgWidth,
      x: 0
    };
    const textDetails = this.getTextDetails(initialConfig); // Gets the initial config object for the text
    const newRecipe = recipe
      .editPage(currentPage)
      .text(watermark, textDetails.x, pgHeight - 30, textDetails.textOptions)
      .text(watermark, textDetails.x, 30, textDetails.textOptions)
      .endPage();
    
    return this.addWatermarkPage(newRecipe, {
      currentPage: currentPage + 1,
      watermark: watermark,
      pageCount
    });
  }
  private getTextDetails(options: FileBufferEditConfig) {
    const writer = Muhammara.createWriter(new Muhammara.PDFWStreamForBuffer());
    const fontFile = path.join(this.fontPath, 'Helvetica.ttf');
    const fontObject = writer.getFontForFile(fontFile);
    let textWidth = fontObject.calculateTextDimensions(options.text, options.size).width;
    while (textWidth >= options.width - 20) {
      options.size = options.size - 1;
      textWidth = fontObject.calculateTextDimensions(options.text, options.size).width;
    }
    options.x = options.width / 2 - textWidth / 2;
    const textOptions = {
      font: 'Helvetica',
      size: options.size,
      colorspace: "rgb",
      color: '#F21A1A',
      opacity: 0.5,
    };
    return {
      textOptions: textOptions,
      x: options.x
    };
  }
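
The font-fitting loop in getTextDetails above depends only on the text, the page width, and a way to measure text width, so it can be pulled out and checked on its own. A minimal sketch, assuming a measure callback that stands in for fontObject.calculateTextDimensions(text, size).width; fitText, MeasureWidth, minSize, and margin are hypothetical names introduced for illustration:

```typescript
// Returns the rendered width of `text` at font size `size`.
type MeasureWidth = (text: string, size: number) => number;

interface FittedText {
  size: number; // font size that was settled on
  x: number;    // x offset that centres the text on the page
}

// Shrink the font one step at a time until the text fits inside the page
// width minus a margin, but never go below minSize, so the size cannot drop
// to zero or a negative value when the text is too long to fit.
export function fitText(
  text: string,
  pageWidth: number,
  measure: MeasureWidth,
  startSize = 20,
  minSize = 4,
  margin = 20,
): FittedText {
  let size = startSize;
  let textWidth = measure(text, size);
  while (textWidth >= pageWidth - margin && size > minSize) {
    size -= 1;
    textWidth = measure(text, size);
  }
  return { size, x: pageWidth / 2 - textWidth / 2 };
}
```

Because the result is the same for every page of the same width, it could be computed once per distinct page width instead of creating a new writer and font object for each page.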

@ladvishal1985
Author

@julianhille: Provided the sample here.

@julianhille
Owner

When files are that large, they are usually stored on disk at some point, even if only temporarily.
Please check whether you can use new muhammara.PDFRStreamForFile('./huge.pdf'); this could reduce memory usage considerably.
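
One way to get the file onto disk without first holding it in memory is to stream the download straight into a temporary file. A minimal sketch using only Node.js built-ins, assuming the download is available as a Readable stream (how downloadFileUsingSignedUrl exposes its data is not shown in this thread); spoolToTempFile is a hypothetical helper name:

```typescript
import { createWriteStream } from "node:fs";
import { mkdtemp } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Stream `source` into a fresh temporary directory and return the file path.
// pipeline() handles backpressure, so only small chunks are in memory at once.
export async function spoolToTempFile(
  source: Readable,
  fileName = "input.pdf",
): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "pdf-"));
  const target = join(dir, fileName);
  await pipeline(source, createWriteStream(target));
  return target;
}

// The returned path could then be handed to the file-based reader mentioned
// above, e.g. new muhammara.PDFRStreamForFile(path).
```

The caller is responsible for deleting the temporary directory once processing is finished.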

@julianhille
Owner

You may also want to have a look at CopyingContext, which might also help reduce memory usage.
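
As a rough illustration of the CopyingContext suggestion, the sketch below embeds each source page as a form XObject on a new page and draws the watermark on top, one page at a time. It is written against small structural interfaces so that it type-checks without the library installed. The method names are assumptions modelled on muhammara's HummusJS-style API (createPDFCopyingContext, createFormXObjectFromPDFPage, getSourceDocumentParser, and so on) and should be verified against the library; watermarkWithCopyingContext and the fixed x offset are hypothetical:

```typescript
// Structural types for only the calls used below. Assumption: they mirror
// muhammara's HummusJS-style API; they are not the library's own typings.
interface PageInput { getMediaBox(): number[] } // [left, bottom, right, top]
interface Parser { getPagesCount(): number; parsePage(index: number): PageInput }
interface CopyingContext {
  getSourceDocumentParser(): Parser;
  createFormXObjectFromPDFPage(pageIndex: number, pageBox: number): number;
}
interface Page {
  getResourcesDictionary(): { addFormXObjectMapping(formId: number): string };
}
interface Content {
  q(): Content;
  Q(): Content;
  doXObject(name: string): Content;
  writeText(text: string, x: number, y: number, options: object): Content;
}
interface Writer {
  createPDFCopyingContext(sourcePath: string): CopyingContext;
  createPage(left: number, bottom: number, right: number, top: number): Page;
  startPageContentContext(page: Page): Content;
  writePage(page: Page): unknown;
  end(): unknown;
}

export function watermarkWithCopyingContext(
  writer: Writer,
  sourcePath: string,
  mediaBoxEnum: number, // e.g. muhammara.ePDFPageBoxMediaBox
  watermark: string,
  textOptions: object,  // font, size, colour, as in the snippets in this thread
): number {
  const context = writer.createPDFCopyingContext(sourcePath);
  const parser = context.getSourceDocumentParser();
  const pageCount = parser.getPagesCount();
  const x = 20; // hypothetical fixed offset; centre the text as needed

  for (let i = 0; i < pageCount; i++) {
    const [left, bottom, right, top] = parser.parsePage(i).getMediaBox();
    const page = writer.createPage(left, bottom, right, top);
    // Embed source page i as a form and register it on the new page.
    const formId = context.createFormXObjectFromPDFPage(i, mediaBoxEnum);
    const formName = page.getResourcesDictionary().addFormXObjectMapping(formId);
    writer
      .startPageContentContext(page)
      .q().doXObject(formName).Q()
      .writeText(watermark, x, top - 30, textOptions);
    writer.writePage(page);
  }
  writer.end();
  return pageCount;
}
```

With the real library, the writer argument would presumably come from muhammara.createWriter with an output file path, possibly with a type cast. Whether this lowers peak memory for a given file would need to be measured.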

@julianhille
Owner

Did you solve it? Did you get a chance to look at the copying context?

@ladvishal1985
Author

No, we did not get a chance to use the copying context. An example would help us. For now we have solved the issue by writing the file to disk and modifying it there, which works reasonably well for us. This is how we do it:

  const pageCount = reader.getPagesCount();
  const fontObject = writer.getFontForFile(this.fontFile);
  const xobjectForm  = writer.createFormXObjectsFromPDF(source, Muhammara.ePDFPageBoxMediaBox);

.....

 pageContent
          .doXObject(page.getResourcesDictionary().addFormXObjectMapping(xobjectForm[i] as any))
          .writeText(watermark, config.x, yTop, textOptions)
          .writeText(watermark, config.x, yBottom, textOptions)
          .Q();
        writer.writePage(page);

@ladvishal1985
Author

You can close this issue.

@julianhille
Owner

I feel like it's the memory-streams module. I just saw that the whole file is copied and modified several times in memory through a fake in-memory stream.

@ladvishal1985
Author

@julianhille: We are now using Apache PDFBox to implement our use case. It is much faster and more memory-efficient than the Node.js solution.

2 participants