Add pp formulanet #14429

Merged: 5 commits, Dec 23, 2024

liuhongen1234567
Contributor

  1. Added the PP-FormulaNet formula recognition algorithm.
  2. Added KV cache acceleration for UniMERNet and PP-FormulaNet in static graph mode.
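
For readers unfamiliar with the term, the idea behind KV cache acceleration can be illustrated with a toy decoder. This is only a minimal sketch of the general technique, not the code in this PR: `project` is a hypothetical stand-in for the learned key/value projections, and the single-head attention uses plain Python lists.

```python
import math


def project(x, shift):
    """Hypothetical stand-in for a learned key or value projection."""
    return [xi * 0.5 + shift for xi in x]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, 0.1) for x in prefix]
        values = [project(x, -0.2) for x in prefix]
        outputs.append(attend(tokens[t], keys, values))
    return outputs


def decode_with_cache(tokens):
    """Project each token once and append the result to a growing cache."""
    keys, values, outputs = [], [], []
    for x in tokens:
        keys.append(project(x, 0.1))
        values.append(project(x, -0.2))
        outputs.append(attend(x, keys, values))
    return outputs


tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
cached = decode_with_cache(tokens)
uncached = decode_without_cache(tokens)
```

Both functions produce the same outputs; the cached version computes each key and value once instead of once per later decoding step, which is where the speedup comes from.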

paddle-bot bot commented Dec 20, 2024

Thanks for your contribution!

Collaborator

@GreatV GreatV left a comment

I suggest linking the documentation for the newly added model.

- LaTeX-OCR: algorithm/formula_recognition/algorithm_rec_latex_ocr.md

Review thread on tests/test.py (outdated, resolved)
@liuhongen1234567
Contributor Author

> I suggest linking the documentation for the newly added model.
>
> - LaTeX-OCR: algorithm/formula_recognition/algorithm_rec_latex_ocr.md

OK, added.

@liuhongen1234567
Contributor Author

@GreatV Could you please review this again?

Review thread on configs/rec/PP-FormuaNet/rec_pp_formulanet_l.yml (outdated, resolved)
Review thread on configs/rec/PP-FormuaNet/rec_pp_formulanet_s.yml (outdated, resolved)
Review thread on ppocr/losses/rec_ppformulanet_loss.py (outdated, resolved)
Collaborator

@GreatV GreatV left a comment

LGTM

@GreatV GreatV merged commit d523388 into PaddlePaddle:main Dec 23, 2024
3 checks passed
@Bestboy125

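How fast is inference for this formula recognition model? paddle_ocr_v4 can quickly detect and then recognize multiple formulas in a single image (does it do this by recognizing them in parallel?). Is there a way to integrate this model with a detection model and run recognition in parallel, to speed up inference on complex images?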

@liuhongen1234567
Contributor Author

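  1. On speed: PP-FormulaNet-S takes roughly 200 ms for single-image inference, and batching with batch=15 brings this down to about 30 ms. PP-FormulaNet-L takes roughly 2000 ms for single-image inference, and batching with batch=15 brings this down to about 300 ms.
  2. On paddle_ocr_v4 quickly detecting and then recognizing multiple formulas in one image: the formula recognition models (PP-FormulaNet-S, PP-FormulaNet-L, and UniMERNet) do support parallel recognition, but it is not clear whether ppocrv4 currently performs well enough at formula detection.
  3. On whether this model can be integrated with a detection model and then run recognition in parallel: yes, this is possible, but that approach is currently integrated in PaddleX; see https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta2/docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md. That pipeline was built a few months ago and only supports LaTeX-OCR (without parallel recognition). The latest formula recognition pipeline is expected to land in the PaddleX develop branch next week, which will add support for the newest models.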
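
The batched recognition mentioned above could be organized roughly as follows. This is a minimal sketch, not PaddleOCR or PaddleX code: `recognize_batch` is a hypothetical callable standing in for whatever batched inference call is actually used, and the crops are assumed to come from a separate formula detection step.

```python
def make_batches(items, batch_size=15):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def recognize_all(crops, recognize_batch, batch_size=15):
    """Run a batched recognizer over all crops, preserving input order.

    recognize_batch is a hypothetical placeholder: it takes a list of
    cropped formula images and returns one result per crop.
    """
    results = []
    for batch in make_batches(crops, batch_size):
        results.extend(recognize_batch(batch))
    return results


# Toy usage with integers in place of image crops and a fake recognizer.
fake_crops = list(range(32))
fake_results = recognize_all(fake_crops, lambda batch: [str(c) for c in batch])
```

Grouping the crops from one page into batches of 15 matches the batch size quoted in the timings above; the last batch is simply smaller when the number of crops is not a multiple of 15.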
