binning - converting activity start and end time into binned data for multiple groups in R dplyr tidyr -
i have data looks this:
foo <- data.frame(userid = c("a","a","b","b","b"), activity = factor(c("x","y","z","z","x")), st=c(0, 20, 0, 10, 25), # start time et=c(20, 30, 10, 25, 30)) # end time
and want, each user, convert activity data 5 minute time bins. result this:
result <- data.frame(userid = c("a", "b"), x1 = c("x", "z"), x2 = c("x", "z"), x3 = c("x", "z"), x4 = c("x", "z"), x5 = c("y", "z"), x6 = c("y", "x"))
the following approach works, quite cumbersome , slow. takes 15 minutes on modest-sized dataset.
library(dplyr) library(tidyr) lvls <- levels(foo$activity) time_bin <- function(st, et, act) { bins <- seq(0, 30, by=5) tb <- as.integer(bins>=st & bins<et)*as.integer(act) tb[tb>0] <- lvls[tb] data.frame(tb=tb, bins=bins) } new_foo <- foo %>% rowwise() %>% do(data.frame(., time_bin(.$st, .$et, .$activity))) %>% select(-(activity:et)) %>% group_by(userid) %>% subset(tb>0) %>% spread(bins, tb)
is there faster or more convenient way of going this?
you can try:
library(data.table) library(reshape2) dt = setdt(foo)[,seq(min(st)+5,max(et),5),.(userid,activity)] dcast(dt, userid~v1, value.var='activity') # userid 5 10 15 20 25 30 #1 x x x x y y #2 b z z z z z x