html - Parsec: ignore everything except one fragment
I need to parse a single select tag in a poorly formed HTML document (so XML-based parsers don't work).
I think I know how to use Parsec to parse the select tag once I'm there, but how do I skip all the stuff before and after that tag?
Example:
<html> random content lots of tags... <select id=something title="whatever"><option value=1 selected>1. first<option value=2>2. second</select> more random content... </html>
That's what the HTML looks like in the select tag. How do I do this with Parsec, or would you recommend I use a different library?
Here's how I'd do it (the try makes a partially matched tag name backtrack instead of failing the whole parse):
solution = try (do { string "<tag-name" ; x <- ⟦insertoptionsparserhere⟧ ; char '>' ; return x }) <|> (anyChar >> solution)
This will recursively consume characters until it meets the starting tag, upon which it uses your parser, and it leaves the recursion on consuming the final '>' of that tag.
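For a concrete picture of the skip-then-parse recursion described above, here is a minimal runnable sketch for the select tag from the question. The names `selectOpen` and `skipTo` are made up for illustration, and the attribute handling is an assumption: it simply grabs the raw text up to the next '>' rather than parsing individual attributes.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the tag parser: matches "<select" and returns
-- the raw attribute text up to (not including) the closing '>'.
selectOpen :: Parser String
selectOpen = do
  _ <- string "<select"
  attrs <- many (noneOf ">")
  _ <- char '>'
  return attrs

-- Try the given parser here; if it fails, drop one character and retry.
-- `try` rewinds whatever `string` consumed on a partial match such as "<sel>".
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

main :: IO ()
main = print (parse (skipTo selectOpen) ""
  "<html> junk <sel> <select id=something title=\"whatever\"><option value=1>")
```

Run against that input, the parser should skip the leading junk (including the near-miss `<sel>`) and return the attribute text of the select tag. If the tag never appears, `anyChar` eventually fails at end of input and the whole parse reports an error.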
It is wise to note that there may be trailing whitespace before and after the tag. To fix that, we can do this, providing your parser consumes the tags itself:
solution = try ⟦inserthtmlparserhere⟧ <|> (anyChar >> solution)
To be clear, I mean that ⟦inserthtmlparserhere⟧ should have this kind of structure:
⟦inserthtmlparserhere⟧ = string "<tag-name" ⋯ char '>'
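As a side note, if you want to capture every tag available, you can quite happily use many. Wrap solution in try so that the last attempt, which consumes the trailing content and then hits end of input, backtracks instead of making many fail:
everytag = many (try solution)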
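As a sketch of collecting every match, the following pulls the value of each option tag out of a fragment like the one in the question. `optionValue`, `skipTo`, and `allMatches` are hypothetical names, and the parser assumes unquoted numeric values as in the example HTML; real documents would need a more tolerant attribute parser.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical tag parser: matches "<option value=N ...>" and returns the
-- digits of N. Assumes unquoted numeric values, as in the question's example.
optionValue :: Parser String
optionValue = do
  _ <- string "<option value="
  v <- many1 digit
  _ <- many (noneOf ">")
  _ <- char '>'
  return v

-- Skip characters one at a time until the given parser succeeds.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar >> skipTo p)

-- The outer `try` matters: the final skipTo eats the trailing content and
-- then fails at end of input, and `many` only stops cleanly if that
-- failure has not consumed anything.
allMatches :: Parser a -> Parser [a]
allMatches p = many (try (skipTo p))

main :: IO ()
main = print (parse (allMatches optionValue) ""
  "<select id=something><option value=1 selected>1. first<option value=2>2. second</select> tail")
```

With the input above this should produce the two values "1" and "2". One consequence of the outer `try` is that a document with no matching tags yields an empty list rather than a parse error.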