R - Extract alphanumeric strings from text


Background

Related question (not required reading).

Question

I have a string:

str_temp <- "{type: [{a: a1, timestamp: 1}, {a:a2, timestamp: 2}]}" 

From it I want to extract the 7 alphanumeric substrings: type, a, a1, timestamp, a, a2, timestamp. However, I can't get a regex to work.

I have tried both base R and library(stringr), using various combinations of [:word:], [:alnum:], [:alpha:], etc.

One example:

> pattern <- "[:word:]"
> str_locate_all(str_temp, pattern)
[[1]]
     start end
[1,]     6   6
[2,]    11  11
[3,]    26  26
[4,]    34  34
[5,]    48  48

But that gives me only the end points of the strings type, a, timestamp, a, timestamp, not their start points, and nothing at all for a1 or a2.
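A quick check in base R, offered as a sketch rather than a statement about stringr's internals: the reported positions coincide with the colons in str_temp. That is consistent with "[:word:]" being read as a bracket set of the literal characters :, w, o, r, d (none of those letters occur in the string) rather than as a named class. POSIX classes normally have to sit inside a bracket expression, e.g. "[[:alnum:]]". The names colon_pos and reported are introduced here for illustration only.

```r
str_temp <- "{type: [{a: a1, timestamp: 1}, {a:a2, timestamp: 2}]}"

# Positions of every literal ":" in the string
# (fixed = TRUE switches off regex syntax entirely)
colon_pos <- as.integer(gregexpr(":", str_temp, fixed = TRUE)[[1L]])
print(colon_pos)

# Characters sitting at the positions that str_locate_all() reported
reported <- c(6L, 11L, 26L, 34L, 48L)
print(substring(str_temp, reported, reported))
```

Both printed results point at the colons, which is why the matches look like the "end points" of type, a and timestamp.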

What's the correct regex for extracting the 7 alphanumeric strings?

Here is a regex that works. It matches alphanumeric words, but not bare numbers.

((?![0-9]+)[a-zA-Z0-9]+)

http://www.rubular.com/r/euf9afdtxw

Thanks to Richard for showing how to use it in R:

regmatches(str_temp, gregexpr("((?![0-9]+)[a-zA-Z0-9]+)", str_temp, perl = TRUE))[[1L]]
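For completeness, here is a self-contained sketch that runs the call above on the string from the question. The second pattern, "[[:alpha:]][[:alnum:]]*", is not from the answer; it is an assumed alternative that puts the POSIX classes inside bracket expressions and should give the same result on this input. The variable names pattern, words, alt_pattern and alt_words are illustrative.

```r
str_temp <- "{type: [{a: a1, timestamp: 1}, {a:a2, timestamp: 2}]}"

# Pattern from the answer: a run of letters/digits whose first character
# is not a digit (the negative lookahead rejects a digit at the start)
pattern <- "((?![0-9]+)[a-zA-Z0-9]+)"
words <- regmatches(str_temp, gregexpr(pattern, str_temp, perl = TRUE))[[1L]]
print(words)
# expected: type, a, a1, timestamp, a, a2, timestamp

# Assumed alternative (not from the answer): one letter followed by any
# number of letters or digits, using POSIX classes inside brackets
alt_pattern <- "[[:alpha:]][[:alnum:]]*"
alt_words <- regmatches(str_temp, gregexpr(alt_pattern, str_temp))[[1L]]
print(identical(words, alt_words))
```

Note that neither pattern rejects a token such as 12abc outright: both skip the leading digits and return abc. That does not matter for the string in the question, but it may for other inputs.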
