-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add posibility to parse multiple line definitions #378
Conversation
bosd refactored code to allow multiple lines to be parsed. Restructuring was needed. As we want to output the lines in the order of the invoice file. Independent of the order of the template file. Example yaml input lines: - start: Barcode end: Netto totaal line: (?P<line_note>(FOOD)) - start: Barcode end: Netto totaal line: (?P<description>(.+))\s+(?P<price>\d+\d+) it is also possible to contentate multi line tags. alternative example sammymaystone. lines: - start: Item\s+Quantity\s+Rate\s+Amount end: Subtotal first_line: 'Service (?P<item>\w)\s+(?P<qty>\S+)\s+.(?P<unitprice>\d+.\d{2})\s+.?(?P<linetotal>\d+.\d{2})' line: '(?P<item>.*)' skip_line: ['Pino \w'] # 'Description:', Notes:, last_line: '(?P<desc>Parts:.*)' types: qty: int unitprice: float linetotal: float input: Service A 12 0.00 20.00 Description: Repair Notes: Replaced capacitor Parts: 1 x cap_a Tax: 0.2% output: {'item': 'A\nDescription: Repair\nNotes: Replaced capacitor', 'qty': 12, 'unitprice': 10.0, 'linetotal': 120.0, 'desc': 'Parts: 1 x cap_a'}
Thanks for the contribution! Will test this locally first. |
Test should pass. Linting in this case. |
Flake8 was not willing to install on my local system. So fixed the errors manually. Could'nt test, but flake8 should be happy now. |
@m3nu Should be fixed now, can you trigger the tests? |
I'm quite a bit confused by this pull request. I'm not sure what does it do and what are you trying to fix/improve.
Please start with explaining your case.
That example you provided ("Service A") could be parsed without this pull request so it doesn't really help to understand your changes. |
This contribution allow to parse all the invoice lines against different regexes. I'm sorry, I was unable to do it in a smaller commit. Invoice2data is used as a module for the importing of invoices in Odoo.
Let's call that for now a section. That section consists of multiple lines. With the alternative syntax, introduced by #308 (what you call new) it was not possible to create the desired output. In the "old" syntax all the line info was captured in the
it reads (note the line_items ):
The added flexibility may look as an improvement on first hand. Sorry, getting a bit late at night typing this. I don't fully recall how the response is with your "new" syntax and contentating.
To add we want the lines output to be in the same order as they where found in the original input file. Didn't have the time to do it. But I really want to revert back the "new" syntax. Hope this is more clear. It's Getting to late now, will continue/improve this post later on 😪 💤 |
With this PR it is also possible to use multiple line groups in the configuration. Quick example of correct syntax for multiple line configurations.
|
Check running. Sorry for the delay. |
NP! The CI tests are failling. Gonna need some help and pointers why that's the case. Local tests are failling for different reasons: |
One is a failed test comparison. Maybe a different extraction result. The other is a missing variable. Probably a typo.
https://github.com/invoice-x/invoice2data/runs/7024293832?check_suite_focus=true |
local tests are passing, with some code changes. So it's a bit of trail and error. Hope you don't mind triggering the CI a couple of times. In case the latest commit doesn't fix it. |
@bosd: I feel you really don't like the new syntax, I'll have to argue a bit here.
While I did my best not to beak backward compatibility, you claim I did. Please create a new issue providing details of YAML that stopped working for you after my changes. It's really important for me.
I can't treat it as a serious argument against new syntax. It's about some unpublihsed software I can't even test. I can't imagine why it's so hard to generate a different YAML. Maybe your software needs some refactoring.
Exactly, it's open source, please suggest a better soltuion for problems I attempted to resolve (resolved?) with my new syntax. Just reverting is not a solution.
Invoice fields are not universally named. Line fields are not universally named. Yet you blame me for dropping one It feels annoying to have you blaming one feature (syntax) and keeping that semi off-topic discussion here. If at least you could suggest another solution. I wasn't the only person what wanted reusable |
You spent quite some time criticizing the new syntax but not so much explaining your changes. Can we take a step back and discuss what do we need to support, please? You provided a following example of input invoice:
and expected parsing output:
I took a moment to create a fake invoice with following content:
and wrote a simple YAML with following
parsing results in:
which is exactly what you expected. So no changes to Can you provide an actual example of invoice section that:
|
Sorry, Should've been more clear. It is not broken. yet..
Yes, I agree the problems are bigger. But can't solve them all at once :) More PR's are coming.
I've looked in into those issues. Yet they did not resolve it for me.
Yes, Will do Ideally I would like to add more "advanced" examples.
It's coming, just need more dev time 😃 |
Good point! Given how little maintainance it takes to keep old syntax support I think we should keep it as long as it doesn't cause serious development / maintenance problems.
Perfect. I think we must have a real example to understand why this change is needed and review it properly. For me it's a really bad idea to merge code that we don't know what it's supposed to solve. |
Ok, going back to the original issue in #377 When passing multiple line arguments in the template I won't get the desired outcome. Template (old syntax):
While running it against the code before this pr: ''' ''' Ok, that template was using the 'old' syntax.
Result:
It parses 🎉 Now, lets use the "new syntax" template with the code in this pr:
🎉 No Difference! Ok, new lets use the "old" syntax template with the code from this PR:
✨🥇 Now we have a full json representation of the lines from the input file. |
I'll try to review this this Sunday (Cc @m3nu) |
This one depends upon: #394 |
fff0526
to
5048cdf
Compare
bosd refactored code to allow multiple lines to be parsed. Restructuring was needed. As we want to output the lines in the order of the invoice file. Independent of the order of the template file. Example yaml input lines: - start: Barcode end: Netto totaal line: (?P<line_note>(FOOD)) - start: Barcode end: Netto totaal line: (?P<description>(.+))\s+(?P<price>\d+\d+) it is also possible to contentate multi line tags. alternative example sammymaystone. lines: - start: Item\s+Quantity\s+Rate\s+Amount end: Subtotal first_line: 'Service (?P<item>\w)\s+(?P<qty>\S+)\s+.(?P<unitprice>\d+.\d{2})\s+.?(?P<linetotal>\d+.\d{2})' line: '(?P<item>.*)' skip_line: ['Pino \w'] # 'Description:', Notes:, last_line: '(?P<desc>Parts:.*)' types: qty: int unitprice: float linetotal: float input: Service A 12 0.00 20.00 Description: Repair Notes: Replaced capacitor Parts: 1 x cap_a Tax: 0.2% output: {'item': 'A\nDescription: Repair\nNotes: Replaced capacitor', 'qty': 12, 'unitprice': 10.0, 'linetotal': 120.0, 'desc': 'Parts: 1 x cap_a'}
Local tests are ok. Odd that there is a difference in tests. |
Another test solved! ✨ |
This pull request rewrites a lot of @bosd I know you spent a lot of time on this but could you please consider my possibly an alternative soluion: #407 , please? My changes should implement the same feature by just adding 15 new lines to the |
True, it's quite a rewrite. It could even be further improved by dumping all the lines in an array. Possibly that would elliminate some of the for loops. (for loops are known for slowing python down) This one is ready! (Apart from the messy commits, propably easier to make a new branch and pr)
Will have a look at those PR's 😄 |
I've clearly missed some requirements, it seems we have a bit different cases after all. I'll spend some time analyzing this today and tomorrow and come back to you. Sorry for this confusion. |
Thank you @bosd for this complete explanation of your case. I carefully read that and I believe I finally understand it fully. By looking at your desired output it seems that you need to:
That syntax you proposed:
is quite confusing in my opinion. It's because you have there multiple (three) & repeated It's really hard for me to guess what should happen if you use It turns out it's already possible to parse your
(the trick is to use That may not be perfect but gives output close to what you expected:
I admit that using I think we should just allow multiple entries in the
|
I think you understand it correctly. The usage of the | as suggested here is interesting. Ooh, yeah, I see the multiple start and end's can be confusing. Kinda hard to put it into full spec.
(Contentate is not used in the above example) I'll have a look at the mentioned pr |
Superseded by #417 |
Supersedes #225
Fixes: #155 No warning with lines - Solved in a different manner to show why something matches and on which setting.
Fixes: #224 Support for multiple lines using lines plugin - With addition of line contentating.
Fixes: #186 Support for multiple line groups in a template
Fixes: #238 Is there a support for multiple regex for lines plugin?
Closes: #377 Parse Multiple lines in the same field
Refactored code to allow multiple lines to be parsed.
Restructuring was needed.
As we want to output the lines in the order of the invoice file.
Independent of the order of the template file.
Example yaml input
it is also possible to contentate multi line tags.
alternative example sammymaystone.
input:
output:
{'item': 'A\nDescription: Repair\nNotes: Replaced capacitor', 'qty': 12, 'unitprice': 10.0, 'linetotal': 120.0, 'desc': 'Parts: 1 x cap_a'}