Python: read a PDF in sections
I'm trying to read a PDF file in which each page is divided into 3x3 blocks of information of the form
a | b | c
d | e | f
g | h | i
Each of the entries is broken across multiple lines. A simplified example of one entry is this card. There are similar cards in the other 8 slots. Ideally I'd like to be able to read a, b, c, ... in full, one after the other; however, I could survive if I read the first line of a, b, and c, then the second line of a, b, and c, etc. I've looked at pdfminer and pypdf, but haven't seen anything that fits what I'm looking for. The answer here works well, but the order of the columns routinely gets distorted.
In the second answer here, replace
self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2]))
by
self.rows = sorted(self.rows, key = lambda x: (x[0], -x[2], x[1]))
Very important: see the last paragraph of that answer.
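
To see what the extra sort key does, here is a minimal sketch with made-up data. I'm assuming each row is a tuple of the form (page, x0, y0, x1, y1, text), so that x[0] is the page number, x[2] is the bottom y coordinate (negated so that items higher on the page come first) and x[1] is the left x coordinate. Check this against the tuples your converter actually builds; the layout and the sample values below are assumptions for illustration only.

```python
# Hypothetical rows: (page, x0, y0, x1, y1, text). This tuple layout is an assumption.
rows = [
    (1, 400.0, 700.0, 550.0, 712.0, "c line 1"),
    (1, 50.0, 700.0, 200.0, 712.0, "a line 1"),
    (1, 225.0, 700.0, 375.0, 712.0, "b line 1"),
    (1, 225.0, 686.0, 375.0, 698.0, "b line 2"),
    (1, 50.0, 686.0, 200.0, 698.0, "a line 2"),
    (1, 400.0, 686.0, 550.0, 698.0, "c line 2"),
]

# Original key: page, then top to bottom. Items sharing the same y0 stay in
# whatever order they arrived in, because sorted() is stable.
by_row_only = sorted(rows, key=lambda x: (x[0], -x[2]))

# Modified key: x0 breaks ties, so items on the same line go left to right.
by_row_then_column = sorted(rows, key=lambda x: (x[0], -x[2], x[1]))

texts_old = [r[5] for r in by_row_only]
texts_new = [r[5] for r in by_row_then_column]

print(texts_old)
print(texts_new)
```

With the original key the columns come out in arrival order within each line; with the extra x[1] they come out as a, b, c on every line. Note that the tiebreaker only kicks in when the y values of two boxes are exactly equal, so if your PDF puts boxes on the same visual line at slightly different heights, you may need to round y before sorting.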