html - Parsec: ignore everything except one fragment


I need to parse a single select tag in a poorly formed HTML document (so XML-based parsers don't work).

I think I know how to use Parsec to parse the select tag once I get there, but how do I skip all the stuff before and after that tag?

Example:

<html>    random content lots of tags...    <select id=something title="whatever"><option value=1 selected>1. first<option value=2>2. second</select>    more random content... </html> 

That's what the HTML looks like in the select tag. How would I do this with Parsec, or would you recommend I use a different library?

Here's how I'd do it:

-- try makes the first branch backtrack when the input only partly matches "<tag-name"
solution = try (do {
  ; string "<tag-name"
  ; x <- ⟦insertoptionsparserhere⟧
  ; char '>'
  ; return x
  }) <|> (anyChar >> solution)

This will recursively consume characters until it meets the opening tag, upon which it uses your parser, and it leaves the recursion once the final tag has been consumed.
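To make the idea concrete, here is a minimal sketch for the select tag from the question. The names `optionsParser`, `optionTag` and `selectTag` are hypothetical stand-ins for your own parser, and the sketch assumes unquoted `value` attributes like those in the example. Note the `try`: `string` consumes input even on a partial match, so without it the `<|>` branch would never run after a stray `<`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical options parser: reads the rest of the <select ...> start tag,
-- then every "<option value=V ...>text" pair that follows.
optionsParser :: Parser [(String, String)]
optionsParser = do
  _ <- manyTill anyChar (char '>')        -- remaining attributes of <select
  many (try optionTag)
  where
    optionTag = do
      _   <- string "<option value="
      v   <- many1 (noneOf " >")          -- assumes an unquoted value
      _   <- manyTill anyChar (char '>')  -- other attributes, e.g. "selected"
      txt <- many (noneOf "<")            -- option text up to the next tag
      return (v, txt)

-- The whole select element, from "<select" to "</select>".
selectTag :: Parser [(String, String)]
selectTag = string "<select" *> optionsParser <* string "</select>"

-- Skip one character at a time until selectTag succeeds.
solution :: Parser [(String, String)]
solution = try selectTag <|> (anyChar *> solution)
```

In GHCi, `parse solution "" input` gives `Right` with the list of (value, text) pairs, or `Left` with a parse error if the input contains no select tag.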

It is wise to note that there may be whitespace before and after the tag. To fix that, you could do this, provided your parser consumes the tags itself:

solution = try ⟦inserthtmlparserhere⟧ <|> (anyChar >> solution)

To be clear, I mean that ⟦inserthtmlparserhere⟧ would have this kind of structure:

⟦inserthtmlparserhere⟧ =
   string "<tag-name"
   ⋯
   char '>'

As a side note, if you want to capture every tag available, you can quite happily use many:

-- try lets the last attempt fail cleanly when only trailing content remains
everytag = many (try solution)
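As a rough illustration of the repeated version, the sketch below collects the `value` of every option start tag in a document. `optionValue` is a hypothetical tag parser, again assuming unquoted attribute values. The final attempt of `solution` runs off the end of the input after consuming characters, so it is wrapped in `try` here to let `many` stop cleanly instead of reporting an error.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical tag parser: the value attribute of one <option value=V ...> tag.
optionValue :: Parser String
optionValue =
  string "<option value=" *> many1 (noneOf " >") <* manyTill anyChar (char '>')

-- Skip one character at a time until optionValue succeeds.
solution :: Parser String
solution = try optionValue <|> (anyChar *> solution)

-- Collect every match; content after the last match is left unconsumed.
everytag :: Parser [String]
everytag = many (try solution)
```

If the document contains no matching tag at all, `everytag` succeeds with an empty list rather than failing.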