xml - Assigning variable values programmatically during HTML parsing


I am expanding on a previous question about HTML parsing to include a question about blank values. Suppose I have empty values for some of the variables I am pulling from the HTML. Since multiple variables may be empty, I want a systematic approach to handling them (a loop or a function).

This is a question about assigning variables programmatically, and most of the information I have found suggests avoiding the use of eval(parse(text = ...)), but I'm not sure how to replace it in this case. I have the following HTML:

html <- '<!doctype html>
<html>
    <body>
        <div class="foo">
            <div class="fooname">name of 1st foo</div>
            <div class="abc">abc value present here</div>
            <span>1st span in 1st foo</span>
            <span>2nd span in 1st foo</span>
        </div>

        <div class="foo">
            <div class="fooname">name of 2nd foo</div>
            <span>only 1 span in 2nd foo</span>
        </div>
    </body>
</html>'

Here is the parsing:

library(XML)

html.parse <- htmlParse(html, asText = TRUE)

myfunc <- function(x){
    # Each query is relative to the current "foo" node
    fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- xpathSApply(x, "./span", fun = xmlValue)

    df <- data.frame(fooname, abc, span1 = span[1], span2 = span[2])
    return(df)
}

result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myfunc)

#  Error in data.frame(fooname, abc, span1 = span[1], span2 = span[2]) :
#   arguments imply differing number of rows: 1, 0

Here is my attempted fix.

myfunc <- function(x){
    fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- xpathSApply(x, "./span", fun = xmlValue)

    dfvars <- c("fooname", "abc", "span")

    # I think I would have the same issue assigning a variable in the
    # `apply` functions, right?
    for(var in dfvars) {
        if(length(eval(parse(text = var))) == 0) {
            cat("No ", var, " value found for this group.\n")

            # Note the "list" class:
            cat("Class of ", var, " is: ", class(eval(parse(text = var))), "\n")
            cat("Placing NA.\n")

            # This line gives the error:
            assign(eval(parse(text = var)), as.character(NA))

            cat("New value of ", var, " : ", eval(parse(text = var)), "\n")
            cat("New length of ", var, " : ", length(eval(parse(text = var))), "\n")
            cat("New class of ", var, " : ", class(eval(parse(text = var))), "\n")
        }
    }

    df <- data.frame(fooname, abc, span1 = span[1], span2 = span[2])
    return(df)
}

result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myfunc)

#  Error in assign(eval(parse(text = var)), as.character(NA)) :
#   invalid first argument

Note that while the for loop here (or an apply function, if I go that way) is in the second nesting layer, in my real project it's in the third; the outer layer opens each in a series of pages. I'd like to avoid going to a third level if possible, since I want to keep things straightforward.
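On the error itself: assign() expects the variable's name as a character string, whereas the attempted fix passes it the variable's evaluated value (an empty list), hence "invalid first argument". A minimal sketch of the same loop using get() and assign() with the name directly is shown below. The function fill_missing() and its hard-coded values are hypothetical stand-ins for the body of myfunc(), so that the example runs without the XML package.

```r
# Hypothetical stand-in for the body of myfunc(); the values are made up
fill_missing <- function() {
  fooname <- "name of 2nd foo"
  abc     <- list()                      # stand-in for a query with no matches
  span    <- "only 1 span in 2nd foo"

  for (nm in c("fooname", "abc", "span")) {
    # get() looks the variable up by name; assign() sets it by name.
    # Both act on the function's own environment by default.
    if (length(get(nm)) == 0) {
      assign(nm, NA_character_)
    }
  }

  list(fooname = fooname, abc = abc, span = span)
}

res <- fill_missing()
```

This removes the need for eval(parse(...)), although looking variables up by name is still somewhat fragile; keeping the values in a named list is usually easier to follow.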

You could define your own xpathSApply function that tests for an empty list():

myxpathSApply <- function(x, ...){
  y <- xpathSApply(x, ...)
  # Fall back to NA when the query matched nothing
  if(length(y) > 0){y}else{NA}
}

and use this function wherever you would use xpathSApply:

myfunc <- function(x){
    fooname <- myxpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- myxpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- myxpathSApply(x, "./span", fun = xmlValue)

    df <- data.frame(fooname, abc, span1 = span[1], span2 = span[2])
    return(df)
}
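If the number of fields grows, the same fallback can be applied to all of them at once by keeping the extracted values in a named list rather than in separate variables. The sketch below is one possible arrangement, not part of the original answer: na_if_empty() is a hypothetical helper, and the raw list holds hand-written stand-ins for the query results on the second "foo" block, so the example runs without the XML package.

```r
# Hypothetical helper: replace a zero-length result with a character NA
na_if_empty <- function(y) if (length(y) > 0) y else NA_character_

# Stand-ins for the values extracted from the second "foo" block
raw <- list(
  fooname = "name of 2nd foo",
  abc     = list(),                      # no match for this field
  span    = "only 1 span in 2nd foo"
)

# Apply the fallback to every field in one pass
vals <- lapply(raw, na_if_empty)

df <- data.frame(
  fooname = vals$fooname,
  abc     = vals$abc,
  span1   = vals$span[1],
  span2   = vals$span[2],                # indexing past the end gives NA
  stringsAsFactors = FALSE
)
```

Because the fields live in one list, no variable has to be assigned by name, and the check is written only once.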
