Releases: py-pdf/pypdf_table_extraction

v1.0.1

13 Nov 20:52

Changes

🪲 Fixes

📦 Dependencies

v1.0.0

10 Nov 10:26

Changes

  • Update version manually for bumpversion plugin to work (#267) @snanda85
  • Make plot dependencies optional (#275) @bosd
  • Handlers.py Fixup leftover typing (#272) @bosd
  • Various documentation fixes (#227) @bosd
  • Network parser Fix B903 (#217) @bosd
  • Fix test_image_warning test (#196) @bosd
  • [REF] flag fontsize (#193) @bosd
  • Fix S310 Audit url open for permitted schemes. (#194) @bosd
  • Rebrand Image based error message (#168) @bosd
  • Silence S311 Error (#177) @bosd
  • [REF] Image processing, Simplify, Cleanup, Optimize (#152) @bosd
  • Fix variable naming and qa checks (#164) @bosd

💥 Breaking Changes

  • 💣 Make Ghostscript an optional dependency ✨ (#258) @bosd
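Making a dependency optional usually means importing it only when the feature that needs it is used, and failing with an actionable message otherwise. The following is a minimal sketch of that general pattern, not the project's actual implementation; the helper name `require_optional` and the extra name passed to it are hypothetical.

```python
import importlib
import importlib.util


def require_optional(module_name: str, extra: str):
    """Import an optional top-level module, or raise ImportError with a hint.

    `module_name` and `extra` are illustrative; they are not taken from
    the project's packaging metadata.
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install the optional extra {extra!r} to enable it."
        )
    return importlib.import_module(module_name)


# A module that is present is returned as usual.
json_module = require_optional("json", "base")
```

Code that depended on the library being installed automatically has to install it explicitly after such a change, which is why this kind of change is listed as breaking.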

🚀 Features

  • 🧑‍🎓 add parser comparison notebook ✨ (#263) @bosd
  • 🧑‍🎓 [ADD] Hybrid Parser notebook (#262) @bosd
  • 🧑‍🎓 Quickstart notebook (#231) @bosd
  • add pdfium backend and set it as default image conversion backend ✨ (#230) @bosd
  • [Fix] Image conversion backend Fallback, More verbose backend exception feedback (#183) @bosd
  • Add typing for camelot/backends (#12) @foarsitter
  • Release the new Network and Hybrid parser ❇️ 🚀 (#163) @bosd

🔥 Removals and Deprecations

  • [REM]: History file from previous repo/package (#274) @bosd

🪲 Fixes

  • [FIX] hybrid Keyerror (#251) @bosd
  • [FIX] compute_parse_error, Index out of range (#249) @bosd
  • [REF]: core set_border: Improve performance, Fix index out of Range (#247) @bosd
  • [FIX] Network parser running infinitely (#246) @bosd
  • [REF] netw gen bbox (#244) @bosd
  • boundaries to split lines Fix index out of range (#233) @bosd
  • [REF]: Core Table set edges to reduce complexity (#223) @bosd
  • Fix custom backend functionality (#225) @snanda85
  • [Fix] Image conversion backend Fallback, More verbose backend exception feedback (#183) @bosd
  • Various fixes (#154) @bosd

🐎 Performance

  • [IMP] reduce pdf object loop (#253) @bosd
  • Eliminated duplicate processes. (#255) @bosd
  • [REF] Compute_plausible_gaps, Efficiency, Stability (#243) @bosd
  • [REF] remove_unconnected_edges (#242) @bosd
  • [REF]: core set_border: Improve performance, Fix index out of Range (#247) @bosd
  • [REF] netw gen bbox (#244) @bosd
  • [REF] slow np.isclose to math.isclose (#166) @bosd
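One caveat when replacing `np.isclose` with `math.isclose` for scalar comparisons: the two use different formulas and different default tolerances, so a drop-in swap can change results. The sketch below illustrates the difference using only the standard library; `isclose_numpy_style` is a hypothetical helper that re-implements the formula documented for `numpy.isclose`, and nothing here claims to show which tolerances the project chose.

```python
import math


def isclose_numpy_style(a: float, b: float,
                        rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """Scalar version of the documented numpy.isclose test (hypothetical helper).

    numpy.isclose checks |a - b| <= atol + rtol * |b|.
    """
    return abs(a - b) <= atol + rtol * abs(b)


a, b = 1.0, 1.000001  # differ by about 1e-06

# math.isclose defaults: rel_tol=1e-09, abs_tol=0.0 (much stricter).
strict = math.isclose(a, b)

# numpy-style defaults: rtol=1e-05, atol=1e-08 (more lenient).
lenient = isclose_numpy_style(a, b)

# Passing explicit tolerances brings math.isclose in line for this pair.
matched = math.isclose(a, b, rel_tol=1e-05, abs_tol=1e-08)

print(strict, lenient, matched)
```

For single floats, `math.isclose` avoids NumPy's array machinery, which is the usual motivation for the swap.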

🚨 Testing

  • Activate pre-commit on gh actions (#228) @bosd
  • [FIX] Warning on test cli quiet (#189) @bosd
  • [REF] silence download_url S310,add typing (#187) @bosd
  • New test for matplotlib importerror (#167) @bosd

👷 Continuous Integration

  • Bump precommit config to python 3.8 plus (#257) @bosd
  • Activate mypy type checking (#190) @bosd

📚 Documentation

  • Fixup leftover renaming in documentation (#273) @bosd
  • Update Documentation Reflect new namespace (#271) @bosd
  • 🧑‍🎓 add parser comparison notebook ✨ (#263) @bosd
  • 🧑‍🎓 [ADD] Hybrid Parser notebook (#262) @bosd
  • 🧑‍🎓 Quickstart notebook (#231) @bosd
  • Update docstrings, add backends and fallback (#229) @bosd
  • [IMP] CLI Documentation (#182) @bosd
  • Release the new Network and Hybrid parser ❇️ 🚀 (#163) @bosd

🔨 Refactoring

  • Clean code (#261) @bosd
  • [REF] Compute_plausible_gaps, Efficiency, Stability (#243) @bosd
  • [REF] remove_unconnected_edges (#242) @bosd
  • [REF] netw gen bbox (#244) @bosd
  • Various Flake8 fixes (#224) @bosd
  • [REF] Simplify and fix Table.set_span (#226) @bosd
  • [REF]: Core Table set edges to reduce complexity (#223) @bosd
  • Fix custom backend functionality (#225) @snanda85
  • [REF]: network parser generate_table_bbox -> split into mark_processe… (#211) @bosd
  • [REF] Network parser search header split into sub methods to reduce complexity (#216) @bosd
  • [REF]: find lines (#204) @bosd
  • Flake8 fixes base parser (#205) @bosd
  • [REF] search_table_body (#203) @bosd
  • [REF]: get_table_areas (#199) @bosd
  • [REF]: Find_closest_tls (#202) @bosd
  • [REF]: get_table_index (#200) @bosd
  • [REF] copy spanning text (#198) @bosd
  • [REF] lattice -reduce_index (#197) @bosd
  • [REF] Fix B028 (#201) @bosd
  • [REF] Scale Image (#195) @bosd
  • [REF] get_index_closest_point (#175) @bosd
  • [REF] split_textline (#178) @bosd
  • Activate mypy type checking (#190) @bosd
  • [REF] silence download_url S310,add typing (#187) @bosd
  • [REF] Compute accuracy, score_val to lowercase (#176) @bosd
  • [REF] slow np.isclose to math.isclose (#166) @bosd
  • Fix Flake8 warning on test_invalid_url (#184) @bosd
  • Release the new Network and Hybrid parser ❇️ 🚀 (#163) @bosd

💄 Style

  • [IMP] add typing to handlers, update docstrings and pdfminer url (#254) @bosd
  • Pre-commit fixes (#185) @bosd
  • Various fixes (#154) @bosd

📦 Dependencies

v0.0.2

06 Oct 14:56

Changes

  • Version bump to 0.0.2 release (#159) @bosd
  • Remove Ghostscript deprecation warning (#155) @bosd
  • [MRG] added test to validate when plot_type is None (#41) @bosd
  • [MRG] IMP Coverage: test for invalid url (#40) @bosd
  • Added support for CLI margins option. (#111) @Niremizov
  • Bump sphinx-prompt from 1.8.0 to 1.9.0 in /docs (#95) @bosd
  • Update .Gitignore (#92) @bosd
  • [DEV] Run test workflow against main (#50) @MasterOdin
  • Run actions only against pull requests. (#26) @foarsitter

🚀 Features

  • Rebrand CLI to pypdf_table_extraction (#158) @bosd
  • Reflect camelot in pypdf_table_extraction namespace (#11) @foarsitter
  • [MRG] Python 3.12 (#44) @bosd
  • [IMP] Update readthedocs url (#59) @bosd
  • [UPD] Update repo link in toml file (#64) @bosd
  • Rebrand: New Logo (#38) @bosd
  • add iter() for TableList to support enumerate() (#13) @stonyw
  • Add support for parsing PDF pages in parallel (multiprocessing) (#17) @phoewass
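As an illustration of the `iter()` entry above, a container that defines `__iter__` can be passed straight to `enumerate()`. This is a minimal sketch with a hypothetical stand-in class, `TableListSketch`, holding plain strings; it is not the library's `TableList` implementation.

```python
from typing import Iterator, List


class TableListSketch:
    """Hypothetical stand-in for a list-like container of tables."""

    def __init__(self, tables: List[str]) -> None:
        self._tables = tables

    def __len__(self) -> int:
        return len(self._tables)

    def __iter__(self) -> Iterator[str]:
        # Delegating to the underlying list makes the container iterable,
        # so enumerate(), for-loops, and list() all work on it.
        return iter(self._tables)


tables = TableListSketch(["table-0", "table-1"])
pairs = list(enumerate(tables))
for index, table in pairs:
    print(index, table)
```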

🪲 Fixes

  • Bugfix for Stream._group_rows (#19) @ollynowell
  • [IMP] Improve poppler subprocess security (#153) @bosd
  • [FIX]/[IMP] Windows Support (#149) @bosd
  • Fixes: IndexError while using split_text (#21) @snanda85
  • Poppler backend: search for pdftopng in current environment (#4) @orent

🐎 Performance

🚨 Testing

👷 Continuous Integration

📚 Documentation

  • Rebrand CLI to pypdf_table_extraction (#158) @bosd
  • [IMP] Update readthedocs url (#59) @bosd
  • [IMP][UPD] Documentation Big Bang (#88) @bosd
  • [UPD] Update version, name to new repo (#65) @bosd
  • [MRG] Delete FUNDING.yml (#34) @bosd
  • Update Documentation Syntax (#51) @bosd
  • [MRG] Update CODE_OF_CONDUCT.md: Repo link (#33) @bosd

🔨 Refactoring

💄 Style

📦 Dependencies