Regex for parsing an HTML page fetched by libcurl in C
Can someone give me a regular expression, usable from C, for parsing an HTML page and extracting the URL links?
The commenters have already pointed to the obligatory link about parsing general HTML with regular expressions. The same page also has a (not linked) answer saying that a well-known subset of HTML can be parsed with regular expressions. I'm relying on that here.
If you are looking for a quick and dirty way to get a list of the hyperlinks in a website, you can use

    \<a [^>]*<href *= *"([^"]+)"

which should give you the link of every <a href="..."> tag as the first grouped sub-expression of each match. But:
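- Obviously, there is no context, so the regex will also match links that are commented out, or that are part of a JavaScript string or a JavaScript comment.
- Regexes come in (too) many flavours. The regex above works if \< means a literal left angle bracket and < means a boundary at the beginning of a word. In other flavours, such as GNU regex, the roles are reversed: < is the literal character and \< is the word-start anchor.
- The regex requires the href attribute value to be in double quotes.
Proceed carefully.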
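As a rough illustration of the approach above, here is a minimal sketch using the POSIX <regex.h> functions regcomp and regexec. It assumes the page has already been downloaded into a NUL-terminated buffer (for example by libcurl). The names extract_links and LINK_MAX are made up for this example. Because POSIX extended regexes have no portable word-boundary anchor, the sketch simply drops it and treats < as a literal, so an attribute such as data-href would match as well.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define LINK_MAX 256  /* hypothetical per-link buffer size */

/* Hypothetical helper: copies up to max_links href values found in html
   into out. Returns the number of links stored, or -1 if the pattern
   fails to compile. Values longer than LINK_MAX - 1 are truncated. */
static int extract_links(const char *html, char out[][LINK_MAX], int max_links)
{
    /* POSIX ERE version of the regex: '<' is a literal character and the
       word-boundary anchor before "href" is omitted (assumption). */
    const char *pattern = "<a [^>]*href *= *\"([^\"]+)\"";
    regex_t re;
    regmatch_t m[2];  /* m[0] = whole match, m[1] = first group (the URL) */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    const char *p = html;
    while (count < max_links && regexec(&re, p, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= LINK_MAX)
            len = LINK_MAX - 1;
        memcpy(out[count], p + m[1].rm_so, len);
        out[count][len] = '\0';
        count++;
        p += m[0].rm_eo;  /* resume the search after the current match */
    }

    regfree(&re);
    return count;
}
```

To use it, declare something like char links[16][LINK_MAX], call extract_links(buffer, links, 16), and print the first n entries. All of the caveats listed above still apply: commented-out links are matched, and single-quoted or unquoted href values are missed.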