binning - converting activity start and end time into binned data for multiple groups in R dplyr tidyr -


i have data looks this:

foo <- data.frame(userid = c("a","a","b","b","b"),                   activity = factor(c("x","y","z","z","x")),                   st=c(0, 20,   0, 10, 25), # start time                   et=c(20, 30, 10, 25, 30)) # end time 

and want, each user, convert activity data 5 minute time bins. result this:

result <- data.frame(userid = c("a", "b"),                          x1 = c("x", "z"),                          x2 = c("x", "z"),                          x3 = c("x", "z"),                          x4 = c("x", "z"),                          x5 = c("y", "z"),                          x6 = c("y", "x")) 

the following approach works, quite cumbersome , slow. takes 15 minutes on modest-sized dataset.

library(dplyr) library(tidyr)  lvls <- levels(foo$activity)  time_bin <- function(st, et, act) {   bins <- seq(0, 30, by=5)   tb <- as.integer(bins>=st & bins<et)*as.integer(act)   tb[tb>0] <- lvls[tb]   data.frame(tb=tb, bins=bins) }  new_foo <-    foo %>%    rowwise() %>%   do(data.frame(., time_bin(.$st, .$et, .$activity))) %>%   select(-(activity:et)) %>%   group_by(userid) %>%   subset(tb>0) %>%   spread(bins, tb) 

is there faster or more convenient way of going this?

you can try:

library(data.table) library(reshape2)  dt = setdt(foo)[,seq(min(st)+5,max(et),5),.(userid,activity)] dcast(dt, userid~v1, value.var='activity') #  userid 5 10 15 20 25 30 #1      x  x  x  x  y  y #2      b z  z  z  z  z  x 

Popular posts from this blog