Java - Fastest/Easiest Way to Parse Text


I am trying to parse some text and am looking for the fastest/easiest solution to do so. I've tried using regex, but it is taking forever in Java...

Here's the structure of the text I am trying to parse:

*****************
id: 1234567  // 7 digit uuid
mistakes: there may be mistakes here, or there may not be mistakes  // optional
mistake type: mistake background // "yes" or "no"
report: <xml><item>blah, blah</item></xml>
*****************

In reality, the file might look like this:

*****************
id: 1234567
mistakes:
no: happened on playground
report: <xml><item>black eye when playing basketball</item><reason>elbow</reason></xml>
*****************

*****************
id: 1234568
mistakes: teacher not watching students @ time of incident
yes: teacher turned after seeing altercation
report: <xml><item>fight</item><reason>none</reason></xml>
*****************

*****************
id: 1234569
mistakes: no
report: <xml><item>child needed band-aid</item><reason>scrape</reason></xml>
*****************

*****************
id: 1234570
mistakes: no
report: <xml><item>child needed tissue</item><reason>runny nose</reason></xml>
*****************
...
...

I am trying to put the 'keys' (id, mistakes, mistake type, report) into a map for further aggregation and processing.

I've tried using regex and had minimal success, because the client keeps changing the report structure, which throws the entire pattern off. I am looking for something that might be a little easier to maintain. In the past, I've had an easy time using XSL transforms on data like this, but this isn't pure XML, and I don't know whether Java will throw an error or not given the current format. I've asked the client if they're willing to change the format, but they're not interested in doing that.

Does anyone have thoughts on how to make the parsing easier to maintain?

Thanks!


Edit:

I don't have the regex with me, but here's the gist of it:

id:\s*(\\d{7}).*mistakes:\s*(\\d*).*mistake type:\s*(\\d*).*report:\s*(.*)

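I cannot comment on posts yet, which is why I'm leaving this as an answer. If you have a fixed number of fields, read the file line by line and take the 6 lines after a line that starts with the asterisk delimiter, i.e. line.startsWith("*") (String.startsWith takes a literal prefix, not a regex, so a pattern such as "^\.*" will not match there). Then store them in a map using similar logic: if line.startsWith("mistakes:"), store whatever is left after stripping "mistakes:" as the value.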
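
Here is a minimal sketch of that line-by-line idea. It rests on assumptions the question does not confirm: each key starts its own line, a row of asterisks separates records, and no continuation line contains a colon. The names RecordParser and parse are made up for illustration and do not come from any library. Rather than counting a fixed number of lines, it splits each line at the first colon, so a renamed or missing field does not break the whole parse.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: class and method names are illustrative assumptions.
class RecordParser {

    /** Splits the text into records and maps each "key: value" line of a record. */
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        String lastKey = null;

        for (String raw : text.split("\\R")) {   // \R matches any line break (Java 8+)
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("*****")) {
                // Delimiter line: close the open record (if it has data) and start a new one.
                if (current != null && !current.isEmpty()) {
                    records.add(current);
                }
                current = new LinkedHashMap<>();
                lastKey = null;
                continue;
            }
            if (current == null) {
                continue;                        // ignore text before the first delimiter
            }
            int colon = line.indexOf(':');
            if (colon > 0 && !line.startsWith("<")) {
                // Assumption: everything before the first colon is the key.
                lastKey = line.substring(0, colon).trim().toLowerCase();
                current.put(lastKey, line.substring(colon + 1).trim());
            } else if (lastKey != null) {
                // No key on this line: treat it as a continuation of the previous value.
                current.merge(lastKey, line, (a, b) -> a.isEmpty() ? b : a + " " + b);
            }
        }
        if (current != null && !current.isEmpty()) {
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "*****************\n"
                + "id: 1234569\n"
                + "mistakes: no\n"
                + "report: <xml><item>child needed band-aid</item></xml>\n"
                + "*****************\n";
        for (Map<String, String> record : parse(sample)) {
            System.out.println(record);
        }
    }
}
```

With this shape, the "yes:"/"no:" lines from the real file simply end up as their own keys, and the report value can be handed to an XML parser separately, since only that part is XML.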

