parsers: lines: support multiple occurrence of blocks to parse
The lines parser has so far looked for only one block, delimited by the "start" and "end" regexes. Some invoices spread a single set of lines across multiple blocks, separated by arbitrary content or by a page footer and header. To support such cases, use "start" and "end" to find as many blocks to parse as possible.

This is (hopefully) cleanly implemented by:
1. Renaming parse() to parse_block() and making it work with a single block (already extracted from the invoice content)
2. Making the new parse() find blocks one by one

This feature has been requested as a way of dealing with some multi-page invoices.

Signed-off-by: Rafał Miłecki <[email protected]>
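The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the signatures are simplified (plain strings instead of the project's template and settings objects), and find_blocks() is a hypothetical helper introduced only for this example.

```python
import re


def find_blocks(content, start, end):
    """Collect the text between every start/end pair, in document order.

    Hypothetical helper: the real parser may locate blocks differently.
    """
    start_re = re.compile(start)
    end_re = re.compile(end)
    blocks = []
    pos = 0
    while True:
        start_match = start_re.search(content, pos)
        if start_match is None:
            break
        end_match = end_re.search(content, start_match.end())
        if end_match is None:
            break  # unterminated block: stop rather than guess
        blocks.append(content[start_match.end():end_match.start()])
        pos = end_match.end()  # continue searching after this block
    return blocks


def parse_block(block, line):
    """Parse one already-extracted block into a list of dicts."""
    line_re = re.compile(line)
    rows = []
    for raw in block.splitlines():
        match = line_re.match(raw.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def parse(content, start, end, line):
    """Find blocks one by one and merge the rows parsed from each."""
    rows = []
    for block in find_blocks(content, start, end):
        rows.extend(parse_block(block, line))
    return rows


sample = """Lines start
1. Cat
2. Dog
Lines end
page footer
Lines start
3. Frog
Lines end
"""
result = parse(sample, r"Lines start", r"Lines end",
               r"^(?P<pos>\d+)\.\s+(?P<name>.+)$")
print(result)
```

Keeping the per-block logic in parse_block() means the block-finding loop does not need to know anything about line formats; it only decides where each block begins and ends.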
Showing 4 changed files with 124 additions and 15 deletions.
@@ -0,0 +1,17 @@
[
    {
        "issuer": "Lines Tests",
        "date": "2022-10-15",
        "invoice_number": "1234/10/2022",
        "amount": 99.99,
        "lines": [
            { "pos": 1, "name": "Cat" },
            { "pos": 2, "name": "Dog" },
            { "pos": 3, "name": "Frog" },
            { "pos": 4, "name": "Lizard" },
            { "pos": 5, "name": "Unicorn" }
        ],
        "currency": "EUR",
        "desc": "Invoice from Lines Tests"
    }
]
@@ -0,0 +1,39 @@
Issue date: 2022-10-15
Issuer: Lines Tests
Invoice number: 1234/10/2022
Total: 99.99 EUR

Lines in multiple blocks

Lines start
1. Cat
2. Dog
Lines end

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus quis metus sagittis, fermentum
risus et, vulputate orci. Curabitur id pellentesque mi, vel euismod nulla. Morbi tincidunt ipsum
eu volutpat dictum. Nam hendrerit varius mauris, a venenatis ligula lacinia et. Sed blandit
lobortis facilisis. Donec efficitur metus ac sapien luctus, eget facilisis dolor eleifend. In sapien
erat, vestibulum in sollicitudin a, euismod nec nunc.

Lines start
3. Frog
Lines end

Nulla elit dui, dictum in augue ac, rutrum mollis risus. In hac habitasse platea dictumst. Phasellus
quis eros ac elit iaculis vehicula et vel nunc. Aenean consequat in velit vel luctus. Proin vel
sapien cursus, ultrices turpis vel, fringilla dolor. Vestibulum ex leo, ullamcorper a quam quis,
molestie convallis est. Nulla egestas posuere purus, eget viverra elit dapibus et. Pellentesque
habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Duis posuere eros
dui.

Lines start
4. Lizard
5. Unicorn
Lines end

In varius nulla arcu, ac interdum velit ornare vel. Mauris a placerat lacus. Nam porta metus eget
arcu mattis, non iaculis elit luctus. Etiam rutrum volutpat arcu, vitae semper turpis mollis id.
Fusce orci dui, pellentesque et ipsum eget, pellentesque luctus leo. Nullam non mollis mi. In
semper, ex sed mollis dapibus, lectus metus vestibulum turpis, vitae convallis mauris eros in orci.
Interdum et malesuada fames ac ante ipsum primis in faucibus.
@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: MIT
issuer: Lines Tests
keywords:
  - Lines Tests
  - Lines in multiple blocks
fields:
  date:
    parser: regex
    regex: Issue date:\s*(\d{4}-\d{2}-\d{2})
    type: date
  invoice_number:
    parser: regex
    regex: Invoice number:\s*([\d/]+)
  amount:
    parser: regex
    regex: Total:\s*(\d+\.\d\d)
    type: float
  lines:
    parser: lines
    start: Lines start
    end: Lines end
    line: ^(?P<pos>\d+)\.\s+(?P<name>.+)$
    types:
      pos: int
options:
  currency: EUR
  date_formats:
    - '%Y-%m-%d'
  decimal_separator: '.'
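To see what the template's regexes extract from the sample invoice, the snippet below applies them by hand with Python's re module. It is only an illustration under stated assumptions: the real template loader, type handling, and option processing are not reproduced, and convert() and TYPES are hypothetical names standing in for the "types" mapping.

```python
import re
from datetime import datetime

header = """Issue date: 2022-10-15
Issuer: Lines Tests
Invoice number: 1234/10/2022
Total: 99.99 EUR
"""

# Field regexes copied from the template; each captures one group.
date_text = re.search(r"Issue date:\s*(\d{4}-\d{2}-\d{2})", header).group(1)
date = datetime.strptime(date_text, "%Y-%m-%d")  # format from date_formats
invoice_number = re.search(r"Invoice number:\s*([\d/]+)", header).group(1)
amount = float(re.search(r"Total:\s*(\d+\.\d\d)", header).group(1))

# Line regex from the template, plus a stand-in for "types: pos: int".
LINE = re.compile(r"^(?P<pos>\d+)\.\s+(?P<name>.+)$")
TYPES = {"pos": int}  # hypothetical: fields not listed stay strings


def convert(row, types):
    """Apply the per-field type conversion to one parsed line."""
    return {key: types.get(key, str)(value) for key, value in row.items()}


raw_lines = ["4. Lizard", "5. Unicorn", "Lines end"]
lines = [convert(m.groupdict(), TYPES)
         for m in (LINE.match(s) for s in raw_lines) if m]

print(date.date(), invoice_number, amount, lines)
```

The named groups in the line regex become the keys of each entry in "lines", which is why the expected output lists "pos" as an integer and "name" as a string.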