Table detection #27

Open
fl4p opened this issue Sep 27, 2024 · 2 comments

Comments

@fl4p
Owner

fl4p commented Sep 27, 2024

table2matrix

Datasheets contain merged cells when a unit or condition applies to multiple rows, and headers may be merged as well. Before iterating over the data row-wise, we need to resolve the merged cells and copy each value into every row (and column) within its span.

  +-----------+---+---+---+
  | A         | B | C | D |
  |           +---+---+---+
  |           | E         |
  |           +---+---+---+
  |           | E | C | C |
  +---+---+---+---+---+---+
  | E | C | C | C | C | C |
  +---+---+---+---+---+---+

              V

[['A', 'A', 'A', 'B', 'C', 'D'],
 ['A', 'A', 'A', 'E', 'E', 'E'],
 ['A', 'A', 'A', 'E', 'C', 'C'],
 ['E', 'C', 'C', 'C', 'C', 'C']]
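The expansion step can be sketched in Python, assuming the extractor reports each merged cell once, with its top-left grid position and its row/column span. The `Cell` class and `table2matrix` function are hypothetical names, not any particular library's output format; `example` encodes the table drawn above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    """A cell anchored at its top-left grid position (hypothetical structure)."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def table2matrix(cells: list[Cell]) -> list[list[str | None]]:
    """Expand merged cells so every grid position holds the spanning cell's text.

    Positions not covered by any cell stay None.
    """
    if not cells:
        return []
    n_rows = max(c.row + c.rowspan for c in cells)
    n_cols = max(c.col + c.colspan for c in cells)
    matrix: list[list[str | None]] = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        # Copy the value into every position covered by the span.
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                matrix[r][k] = c.text
    return matrix


# The table from the diagram above.
example = [
    Cell(0, 0, "A", rowspan=3, colspan=3),
    Cell(0, 3, "B"), Cell(0, 4, "C"), Cell(0, 5, "D"),
    Cell(1, 3, "E", colspan=3),
    Cell(2, 3, "E"), Cell(2, 4, "C"), Cell(2, 5, "C"),
    *[Cell(3, k, t) for k, t in enumerate("ECCCCC")],
]

if __name__ == "__main__":
    for line in table2matrix(example):
        print(line)
```

Filling the whole span (rather than only the first row) is what makes row-wise iteration safe: every row then carries its own copy of the unit or condition.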

Tabula

  • the web version has an auto-detect function, which performs much better than tabula-java (CLI)
  • it does not auto-detect nested tables
  • selecting the whole page leads to poor results
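Since whole-page selection gives poor results, one option is to pass a known table region to tabula explicitly. A minimal sketch, assuming the tabula-py wrapper (`tabula.read_pdf` with its `area`, `guess`, `stream`, `lattice` and `output_format` parameters) rather than the web UI; `to_tabula_area` and `extract_table` are hypothetical helpers, and the bounding box itself has to come from elsewhere (e.g. a manual selection), so no coordinates are given here.

```python
from __future__ import annotations

from typing import Any


def to_tabula_area(x0: float, top: float, x1: float, bottom: float) -> list[float]:
    """Reorder an (x0, top, x1, bottom) box into tabula's [top, left, bottom, right] order.

    Coordinates are PDF points measured from the top-left corner of the page.
    """
    if x1 <= x0 or bottom <= top:
        raise ValueError("expected x0 < x1 and top < bottom")
    return [top, x0, bottom, x1]


def extract_table(
    pdf_path: str,
    page: int,
    bbox: tuple[float, float, float, float],
    method: str = "stream",
) -> list[dict[str, Any]]:
    """Run tabula on a known table region and return its JSON-style output."""
    if method not in ("stream", "lattice"):
        raise ValueError("method must be 'stream' or 'lattice'")
    import tabula  # tabula-py (needs a Java runtime); imported lazily

    return tabula.read_pdf(
        pdf_path,
        pages=page,
        area=to_tabula_area(*bbox),
        guess=False,  # the area is given explicitly, so skip tabula's auto-detection
        stream=(method == "stream"),
        lattice=(method == "lattice"),
        output_format="json",
    )
```

Keeping the coordinate reordering in its own function makes the easy-to-miss axis order testable without a PDF or a Java runtime.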
EPC2306


Stream

(The FIELD* headers are actually empty; the CSV->MD converter inserted them.)

Table
FIELD1 FIELD2 Dynamic Characteristics# (TJ = 25°C unless otherwise stated) FIELD4 FIELD5 FIELD6 FIELD7
PARAMETER TEST CONDITIONS MIN TYP MAX UNIT
CISS Input Capacitance 1777 2369
CRSS Reverse Transfer Capacitance VDS = 50 V, VGS = 0 V 5.8
COSS Output Capacitance 616 803 pF
COSS(ER) Effective Output Capacitance, Energy Related (Note 1) 730
VDS = 0 to 50 V, VGS = 0 VCOSS(TR) Effective Output Capacitance, Time Related (Note 2) 882
RG Gate Resistance 0.4 Ω
QG Total Gate Charge VDS = 50 V, VGS = 5 V, ID = 25 A 12.3 16.2
QGS Gate to Source Charge 4.3
QGD Gate-to-Drain Charge VDS = 50 V, ID = 25 A 1.1
nC
QG(TH) Gate Charge at Threshold 3.1
QOSS Output Charge VDS = 50 V, VGS = 0 V 44 57
QRR Source-Drain Recovery Charge 0

Lattice

  • headers (min) are off
Table
Dynamic Characteristics# (TJ 25°C unless otherwise stated) FIELD3 FIELD4 FIELD5 FIELD6 FIELD7
PARAMETER TEST CONDITIONS MIN TYP MAX UNIT
CISS Input Capacitance VDS = 50 V, VGS = 0 V 1777 2369 pF
CRSS Reverse Transfer Capacitance 5.8
COSS Output Capacitance 616 803
COSS(ER) Effective Output Capacitance, Energy Related (Note 1) VDS = 0 to 50 V, VGS = 0 V 730
COSS(TR) Effective Output Capacitance, Time Related (Note 2) 882
RG Gate Resistance 0.4 Ω
QG Total Gate Charge VDS = 50 V, VGS = 5 V, ID = 25 A 12.3 16.2 nC
QGS Gate to Source Charge VDS = 50 V, ID = 25 A 4.3
QGD Gate-to-Drain Charge 1.1
QG(TH) Gate Charge at Threshold 3.1
QOSS Output Charge VDS = 50 V, VGS = 0 V 44 57
QRR Source-Drain Recovery Charge 0

Findings

  • the stream extraction method appears to be more usable here
  • it doesn't seem to provide an output format that represents merged cells
  • the JSON output contains raw cell coordinates, and cells are already grouped into rows. A cell with a rowspan (merged vertically) appears only in the first row of its span; all subsequent rows within the span (in the same column) have empty content, so we can easily fill them with the same value
  • auto-detect does not find nested tables, and output quality suffers when the whole page is selected
  • when the table bbox is known, the output can be good
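A minimal sketch of that fill step, assuming tabula-java's JSON layout (a list of tables, each with a `"data"` list of rows whose cells carry `"top"`, `"height"` and `"text"`) and assuming that a vertically merged cell's `height` covers the rows it spans. Blindly forward-filling every empty cell would also fill genuinely empty MIN/MAX fields, so this sketch only fills an empty cell when its row starts above the bottom edge of the last non-empty cell in that column. `fill_rowspans` and its `tol` parameter are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def fill_rowspans(table: dict[str, Any], tol: float = 1.0) -> list[list[str]]:
    """Return the table's text grid with vertically merged cells copied downwards.

    Assumes tabula-style JSON: table["data"] is a list of rows, each a list of
    cell dicts with "top", "height" and "text" (PDF points).
    """
    grid: list[list[str]] = []
    # Per column index: (text, bottom edge) of the last non-empty cell seen.
    open_spans: dict[int, tuple[str, float]] = {}
    for row in table["data"]:
        # Vertical position of the row, taken from cells that have real geometry.
        tops = [cell["top"] for cell in row if cell.get("height", 0) > 0]
        row_top = min(tops) if tops else None
        out_row: list[str] = []
        for col, cell in enumerate(row):
            text = (cell.get("text") or "").strip()
            if text:
                # Remember where this cell ends vertically.
                open_spans[col] = (text, cell.get("top", 0.0) + cell.get("height", 0.0))
            elif (
                col in open_spans
                and row_top is not None
                and row_top + tol < open_spans[col][1]
            ):
                # Empty cell inside the span of a taller cell above: copy its value.
                text = open_spans[col][0]
            out_row.append(text)
        grid.append(out_row)
    return grid
```

For a file written by tabula-java with `-f JSON`, this would be applied per table, e.g. `[fill_rowspans(t) for t in json.load(f)]`.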

BSB028N06NN3GXUMA2.pdf

tabula stream

Parameter Symbol Conditions Values Unit
min. typ. max.
Dynamic characteristics
Input capacitance C iss - 8800 12000 pF
V GS=0 V, V DS=30 V,
Output capacitance C oss - 2100 2800
f =1 MHz
Reverse transfer capacitance Crss - 64 -
Turn-on delay time t d(on) - 21 - ns
Rise time t r V DD=30 V, V GS=10 V, - 9 -
I =30 A, R
Turn-off delay time t d(off) D G,ext=1.6 W - 38 -
Fall time t f - 6 -
  • results are good; it even picks up the column headers (min/typ/max) from the "previous" table

pix2image

  • for the first 3 rows it detects a single rowspan=3 across all columns. This could be simplified; still, it is not a usable result.
@fl4p
Owner Author

fl4p commented Sep 27, 2024

pix2text

p2t predict -l en --resized-shape 2048 --file-type pdf -i datasheets/epc/EPC2306.pdf -o epc2306.md \
    --save-debug-res output-debug-p2t 

9-TABLE

  • it supports merged cells
  • it appears to mostly ignore the ruling lines and instead build a new table grid from text boundaries
MD table (no merged cells)
PARAMETER TEST CONDITIONS MIN TYP MAX UNIT
CIss Input Capacitance Vos=50V,Vcs=0V 1777 2369 pF
Cass Reverse Transfer Capacitance 5.8
Coss Output Capacitance 616 803
CoSSER Effective Output Capacitance, Energy Related (Note 1) Vos=0to 50V,VGs=0V 730
CossTR Effective Output Capacitance, Time Related (Note 2) 882
Re Gate Resistance 0.4 Q
QG Total Gate Charge Vps=50V,Vcs=5V,lb=25A 12.3 16.2 nC
QGs Gate to Source Charge Vps=50V,lp=25 A 4.3
QG Gate-to-Drain Charge 1.1
QGirn) Gate Charge at Threshold 3.1
Qoss Output Charge Vps=50V,Vcs=0V 44 57
QRR Source-Drain Recovery Charge 0

HTML

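from pix2text import Pix2Text

p2t = Pix2Text.from_config()
# Recognize the selected page; resized_shape=2048 matches the CLI call above
doc = p2t.recognize_pdf('../datasheets/EPC/EPC2306.pdf', page_numbers=[1], resized_shape=2048)
# element 9 is presumably the table reported as "9-TABLE" in the debug output
table = doc.pages[0].elements[9]
print(table.meta['html'][0])  # HTML version of the table, including merged cells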
PARAMETER TEST CONDITIONS MIN TYP MAX UNIT
CIss Input Capacitance Vos=50V,Vcs=0V 1777 2369 pF
Cass Reverse Transfer Capacitance 5.8
Coss Output Capacitance 616 803
CoSSER Effective Output Capacitance, Energy Related (Note 1) Vos=0to 50V,VGs=0V 730
CossTR Effective Output Capacitance, Time Related (Note 2) 882
Re Gate Resistance 0.4 Q
QG Total Gate Charge Vps=50V,Vcs=5V,lb=25A 12.3 16.2 nC
QGs Gate to Source Charge Vps=50V,lp=25 A 4.3
QG Gate-to-Drain Charge 1.1
QGirn) Gate Charge at Threshold 3.1
Qoss Output Charge Vps=50V,Vcs=0V 44 57
QRR Source-Drain Recovery Charge 0
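Because pix2text's HTML output carries the merged cells, it can feed the row-wise iteration described in the first comment directly. A minimal sketch using only the standard library's `html.parser`, assuming the merged cells are encoded as `rowspan`/`colspan` attributes on `<td>`/`<th>` (the rendered output above no longer shows the markup). `html_table_to_matrix` is a hypothetical helper, and nested tables are not handled.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect (text, rowspan, colspan) for each <td>/<th>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[tuple[str, int, int]]] = []
        self._cell: list[str] | None = None
        self._spans = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._spans = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(("".join(self._cell).strip(), *self._spans))
            self._cell = None


def html_table_to_matrix(html: str) -> list[list[str]]:
    """Expand rowspan/colspan so each grid position holds its spanning cell's text."""
    parser = _TableParser()
    parser.feed(html)
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip positions already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Applied to the snippet above, this would be `html_table_to_matrix(table.meta['html'][0])`.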

Another example

1-TABLE

HTML and MD tables
Parameter symbol Values Unte Note I Test Condition
Min. Typ. Max.
Drain-source breakdown voltage V(BR)DSS 100 - - V Ves=0V, Io=1 mA
Gate threshold voltage Vesth 2.2 3.0 3.8 V Vos=Ves, /D=72 uA
Zero gate voltage drain current ls : 0.1 10 1 100 uA Vos=100V, Ves=0 V, T=25°0 Vps=100 V, Ves=0 V, Tj=125°0
Gate-source leakage current less - 10 100 nA Ves=20 V,Vos=0\V
Drain-source on-state resistance Rosom : 4.3 5.3 5.0 7.1 m2 Ves=10 V,D=50A Ves=6V,D=25 A
Gate resistance) Re - 1.2 1.8 Q -
Transconductance Ofs 50 100 - S |Vos|>2|/p|Ros(on)max,|b=50 A
Parameter symbol Values Unte Note I Test Condition Min. Typ. Max.
Drain-source breakdown voltage V(BR)DSS 100 - - V Ves=0V, Io=1 mA
Gate threshold voltage Vesth 2.2 3.0 3.8 V Vos=Ves, /D=72 uA
Zero gate voltage drain current ls : 0.1 10 1 100 uA Vos=100V, Ves=0 V, T=25°0 Vps=100 V, Ves=0 V, Tj=125°0
Gate-source leakage current less - 10 100 nA Ves=20 V,Vos=0\V
Drain-source on-state resistance Rosom : 4.3 5.3 5.0 7.1 m2 Ves=10 V,D=50A Ves=6V,D=25 A
Gate resistance) Re - 1.2 1.8 Q -
Transconductance Ofs 50 100 - S Vos

@fl4p
Owner Author

fl4p commented Sep 27, 2024

img2table

  • looks promising
  • pure OCR solution
