Regex for parsing an HTML page fetched by libcurl in C


Can anyone give me a regular expression in C for parsing an HTML page and extracting the URL links?

The commenters have already pointed to the obligatory link about parsing general HTML with regular expressions. The same page has a (not linked) answer that says a well-known subset can be parsed with regular expressions. I agree with that.

If you are looking for a quick and dirty way to get a list of the hyperlinks in a website, use

\<a [^>]*<href *= *"([^"]+)" 

which should give you the link of each <a href="..."> tag as the first grouped sub-expression of each match. But:

  • Obviously, there is no context, so the regex will also match links that are commented out, or that are part of a JavaScript string or a JavaScript comment.
  • Regexes come in (too) many flavours. The regex above works only if \< means a literal left angle bracket and < means the boundary at the beginning of a word.
  • The regex requires the href attribute value to be in double quotes.

Proceed carefully.
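For C specifically, here is a minimal sketch of the same idea using the POSIX <regex.h> functions. In POSIX extended syntax < is an ordinary character and there is no standard word-boundary operator, so the pattern is respelled without one. The helper name extract_links, the MAX_URL cap and the REG_ICASE flag are my own assumptions for illustration, not part of the question. The same caveats as above apply.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_URL 256  /* illustrative cap on the stored URL length */

/* Hypothetical helper: copies up to `max` href values found in `html`
 * into `out`. Returns the number of links stored, or -1 if the pattern
 * fails to compile. `html` must be a NUL-terminated string. */
static int extract_links(const char *html, char out[][MAX_URL], int max)
{
    /* POSIX ERE spelling: '<' is literal, no word-boundary operator. */
    static const char *pattern = "<a [^>]*href *= *\"([^\"]+)\"";
    regex_t re;
    regmatch_t m[2];  /* m[0] = whole match, m[1] = the URL group */
    int count = 0;

    assert(html != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    while (count < max && regexec(&re, html, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= MAX_URL)
            len = MAX_URL - 1;          /* truncate overlong URLs */
        memcpy(out[count], html + m[1].rm_so, len);
        out[count][len] = '\0';
        count++;
        html += m[0].rm_eo;             /* resume after this match */
    }

    regfree(&re);
    return count;
}
```

To use it, pass the page text (for example a NUL-terminated buffer that your libcurl write callback has filled) together with an array such as char links[16][MAX_URL], then read the first n entries, where n is the return value. Links in single quotes or without quotes are skipped, exactly as the last bullet warns.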

