Regex for parsing an HTML page fetched by libcurl in C
Can anyone give me a regular expression in C for parsing an HTML page and extracting the URL links?
The commenters have pointed to the obligatory link about parsing general HTML with regular expressions. The same page has a (not linked) answer which says that a well-known subset can be parsed with regular expressions. I'm going with that.
If you are looking for a quick and dirty way to get a list of the hyperlinks in a website, use
\<a [^>]*<href *= *"([^"]+)"
which should give you the link of <a href="..."> tags as the first grouped sub-expression of each match. But:
- Obviously, there is no context, so the regex will also match links that are commented out, or that are part of a JavaScript string or a JavaScript comment.
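- Regexes come in (too) many flavours. The regex above works only if \< means a literal left angle bracket and < means a boundary at the beginning of a word. In other flavours the roles are reversed: with glibc's regcomp(), for example, a bare < is the literal character and \< is the word-start anchor, so the two have to be swapped.
- The regex requires the href attribute to be in double quotes.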
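Since the question asks for C, here is a minimal sketch of how the idea could look with POSIX regcomp()/regexec(). It is not the only way to do it. The function name extract_links and its parameters are made up for illustration. The pattern is adapted to POSIX extended syntax, where a bare < is literal, and the word-boundary anchor is dropped because it is not portable, so this variant would also match attributes such as data-href. The html argument is assumed to be a NUL-terminated buffer, for example one filled by a libcurl write callback.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies up to max_links href values found in html
 * into out[]. Each entry is malloc'ed and must be freed by the caller.
 * Returns the number of links stored, or -1 if the pattern fails to compile. */
int extract_links(const char *html, char **out, int max_links)
{
    /* POSIX ERE version of the regex: '<' is literal here, and the
     * word-boundary anchor is omitted (assumption: portability matters
     * more than rejecting attributes like data-href). */
    static const char *pattern = "<a [^>]*href *= *\"([^\"]+)\"";
    regex_t re;
    regmatch_t m[2];          /* m[0]: whole match, m[1]: the URL group */
    const char *p = html;
    int count = 0;

    assert(html != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    while (count < max_links && regexec(&re, p, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        char *url = malloc(len + 1);
        if (url == NULL)
            break;
        memcpy(url, p + m[1].rm_so, len);
        url[len] = '\0';
        out[count++] = url;
        p += m[0].rm_eo;      /* continue searching after this match */
    }

    regfree(&re);
    return count;
}
```

Calling extract_links(buffer, links, 16) with a char *links[16] array would fill it with the href values in document order. All the caveats in the list above still apply: links inside comments or scripts are reported too, and single-quoted or unquoted href values are missed.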
Proceed carefully.