Python: read a PDF in sections


I'm trying to read a PDF file in which each page is divided into a 3x3 grid of blocks of information, of the form

a | b | c
d | e | f
g | h | i

Each of the entries is broken across multiple lines. A simplified example of one entry is this card; there are similar cards in the other 8 slots. I'd like to be able to read a, then b, then c, and so on; however, I could survive if I read the first line of a, b, and c, then the second line of a, b, and c, etc. I've looked at pdfminer and pypdf, but haven't seen anything that fits what I'm looking for. The answer here works well, but the order of the columns routinely gets distorted.

In the second answer here, replace

self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2])) 

by

self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2], x[1])) 

Very important: see the last paragraph of that answer.
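To see why the extra key element matters, here is a minimal sketch. It assumes each row is a tuple whose first three fields are the page number, the left x coordinate, and a y coordinate of the text line, so that x[0], x[2], and x[1] in the lambdas mean page, y, and x respectively. That layout is an assumption inferred from the sort keys above, and the sample rows and helper names are made up for illustration.

```python
# Hypothetical rows in the ASSUMED layout: (page, x_left, y, text).
# They are deliberately listed out of column order, as an extractor might emit them.
rows = [
    (0, 400.0, 700.0, "c line 1"),
    (0, 50.0, 700.0, "a line 1"),
    (0, 225.0, 700.0, "b line 1"),
    (0, 225.0, 688.0, "b line 2"),
    (0, 50.0, 688.0, "a line 2"),
    (0, 400.0, 688.0, "c line 2"),
]


def sort_original(rows):
    # Page first, then top to bottom (PDF y grows upward, hence -y).
    # Rows with equal y keep whatever order they arrived in.
    return sorted(rows, key=lambda x: (x[0], -x[2]))


def sort_with_tiebreak(rows):
    # Same as above, but rows with equal y are ordered left to right by x.
    return sorted(rows, key=lambda x: (x[0], -x[2], x[1]))


original = [r[3] for r in sort_original(rows)]
fixed = [r[3] for r in sort_with_tiebreak(rows)]

print(original)  # columns stay scrambled within each line
print(fixed)     # a, b, c for the first line, then a, b, c for the second
```

With the two-element key, Python's stable sort leaves rows that share a y value in their arrival order, which is how the columns end up shuffled. The third element only helps when the y values are exactly equal; if lines at the same visual height differ by a fraction of a point, rounding y before sorting may be needed.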

