R - dplyr join and keeping variable observations without NA


I have a for loop that allocates portfolios based on the 10% quantiles of tdata$me. The issue I'm having is that when I run the for loop, I end up with only the last year's observations allocated to portfolios. As I loop through the years, the idea is to place the portfolio allocations in portf and then join them to the larger dataset.

My question is: how can I join the two data sets without placing NA in the other, unmatched observations, so that the join instead keeps those observations as they are?

Also, is there a better way to run this for loop with dplyr? It seems like an inefficient way to allocate the portfolios, but I couldn't think of another way.

Reproducible example:

library(dplyr)

tdata <- structure(list(
  cusip = c(47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L,
            47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L),
  fyear = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
            1970L, 1970L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L),
  me = c(157, 115, 45, 19, 132, 21, 147, 191, 80, 165,
         32, 100, 44, 134, 104, 9, 183, 163, 109, 88),
  month = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
            6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 8L)),
  .Names = c("cusip", "fyear", "me", "month"),
  row.names = c(NA, 20L), class = "data.frame")

for (i in unique(tdata$fyear)) {
  # June observations for year i
  check <- filter(tdata, month == 06 & fyear == i)
  # decile breakpoints of me for that year
  per <- quantile(check$me, c(.10, .20, .30, .40, .50, .60, .70, .80, .90))
  check$portf[check$me < per[[1]]] <- "a"
  check$portf[check$me >= per[[1]] & check$me < per[[2]]] <- "b"
  check$portf[check$me >= per[[2]] & check$me < per[[3]]] <- "c"
  check$portf[check$me >= per[[3]] & check$me < per[[4]]] <- "d"
  check$portf[check$me >= per[[4]] & check$me < per[[5]]] <- "e"
  check$portf[check$me >= per[[5]] & check$me < per[[6]]] <- "f"
  check$portf[check$me >= per[[6]] & check$me < per[[7]]] <- "g"
  check$portf[check$me >= per[[7]] & check$me < per[[8]]] <- "h"
  check$portf[check$me >= per[[8]] & check$me < per[[9]]] <- "i"
  check$portf[check$me >= per[[9]]] <- "j"
  check <- select(check, cusip, fyear, portf)
  # testcrsp is rebuilt on every pass, so only the last year's portfolios survive
  testcrsp <- left_join(tdata, check, by = c("cusip", "fyear"))
}

Update:

A dplyr solution to remove the for loop. Note that I removed the length() part, because I'm not sure how to do that in dplyr without repeating the breaks code. The result is also different: it returns a data frame with only the month == 6 rows, instead of all the data with NA for the unselected months.

tdata3 <- tdata %>%
  group_by(fyear) %>%
  filter(month == 6) %>%
  mutate(portf = cut(me, labels = letters[1:10], include.lowest = TRUE,
                     breaks = (me %>% quantile(seq(0, 1, by = 0.1)) %>% unique)) %>%
           as.character) %>%
  ungroup
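If you want the dplyr version to keep every row and leave NA for the unselected months, here is a minimal sketch of one way to do it. The helper name decile_letter and the result name tdata4 are made up for illustration, the toy data is just the question's data rebuilt compactly, and it assumes every year has at least two distinct June values of me.

```r
library(dplyr)

# Same toy data as in the question, rebuilt compactly.
tdata <- data.frame(
  cusip = 47L,
  fyear = rep(c(1970L, 1971L), times = c(12, 8)),
  me    = c(157, 115, 45, 19, 132, 21, 147, 191, 80, 165, 32, 100,
            44, 134, 104, 9, 183, 163, 109, 88),
  month = c(rep(6L, 19), 8L)
)

# Hypothetical helper: decile letters for one year's June values.
# unique() guards against tied quantiles; the label count follows the breaks.
decile_letter <- function(x) {
  per <- unique(quantile(x, seq(0, 1, by = 0.1)))
  as.character(cut(x, breaks = per,
                   labels = letters[seq_len(length(per) - 1)],
                   include.lowest = TRUE))
}

tdata4 <- tdata %>%
  group_by(fyear) %>%
  # start from all-NA, then fill in only the June rows of each year
  mutate(portf = replace(rep(NA_character_, n()), month == 6,
                         decile_letter(me[month == 6]))) %>%
  ungroup()
```

Because the quantiles are computed on me[month == 6] inside each group, the non-June rows stay in the result but don't affect the breakpoints.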

Original:

Here's what I think you wanted. It doesn't use dplyr, since you don't need it to subset and loop through the years. I use cut to split the me column by its quantiles into factors labelled with letters.

tdata2 <- tdata
for (i in unique(tdata$fyear)) {
  # June rows of year i
  thisyear <- tdata[tdata$fyear == i & tdata$month == 6, ]
  # decile breakpoints, with duplicates dropped
  per <- unique(quantile(thisyear$me, seq(0, 1, by = 0.1)))
  factors <- cut(thisyear$me, breaks = per, labels = letters[1:(length(per) - 1)], include.lowest = TRUE)
  # write the letters back into the matching rows only
  tdata2$portf[tdata$fyear == i & tdata$month == 6] <- as.character(factors)
}

tdata2
#    cusip fyear  me month portf
# 1     47  1970 157     6     i
# 2     47  1970 115     6     f
# 3     47  1970  45     6     c
# 4     47  1970  19     6     a
# 5     47  1970 132     6     g
# 6     47  1970  21     6     a
# 7     47  1970 147     6     h
# 8     47  1970 191     6     j
# 9     47  1970  80     6     d
# 10    47  1970 165     6     j
# 11    47  1970  32     6     b
# 12    47  1970 100     6     e
# 13    47  1971  44     6     b
# 14    47  1971 134     6     g
# 15    47  1971 104     6     d
# 16    47  1971   9     6     a
# 17    47  1971 183     6     j
# 18    47  1971 163     6     i
# 19    47  1971 109     6     e
# 20    47  1971  88     8  <NA>

#  0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
#     a     b     c     d     e     f     g     h     i     j
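If you'd rather keep the join from the question, one option is to collect each year's allocations in a list and join once after the loop, so nothing gets overwritten. This is only a sketch: pieces and alloc are names I made up, and I join on all four columns because in this toy data every row has the same cusip, so cusip and fyear alone don't identify a row. In real data a narrower key may be enough.

```r
library(dplyr)

# Same toy data as in the question, rebuilt compactly.
tdata <- data.frame(
  cusip = 47L,
  fyear = rep(c(1970L, 1971L), times = c(12, 8)),
  me    = c(157, 115, 45, 19, 132, 21, 147, 191, 80, 165, 32, 100,
            44, 134, 104, 9, 183, 163, 109, 88),
  month = c(rep(6L, 19), 8L)
)

pieces <- list()  # one allocation table per year
for (i in unique(tdata$fyear)) {
  check <- filter(tdata, month == 6, fyear == i)
  per <- unique(quantile(check$me, seq(0, 1, by = 0.1)))
  check$portf <- as.character(cut(check$me, breaks = per,
                                  labels = letters[seq_len(length(per) - 1)],
                                  include.lowest = TRUE))
  pieces[[as.character(i)]] <- check
}

# Stack all years, then join a single time; rows without a match get NA.
alloc <- bind_rows(pieces)
testcrsp <- left_join(tdata, alloc, by = c("cusip", "fyear", "me", "month"))
```

The NA in the non-June row is then the only NA left, since every year's allocations are present when the join runs.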

Note that unique has to be used on the quantiles because (it happened before you edited the data) you may get equal quantiles, which won't be accepted as breaks for the factors. Because of this, you should use length(per) for the labels instead of directly typing 1:10.
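To see the tied-quantile problem in isolation, here is a small sketch with made-up values (x is not from the question's data). Several deciles coincide, cut refuses the raw breaks, and after unique the number of labels has to follow the number of remaining breaks.

```r
# Hypothetical values with many ties, so several deciles coincide.
x <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5)

raw_breaks <- quantile(x, seq(0, 1, by = 0.1))

# cut() stops with an error when breaks contain duplicates;
# capture the message instead of aborting.
dup_error <- tryCatch(
  {
    cut(x, breaks = raw_breaks, include.lowest = TRUE)
    NULL
  },
  error = function(e) conditionMessage(e)
)

# Drop the repeated breakpoints and size the labels to match.
per <- unique(raw_breaks)
portf <- cut(x, breaks = per,
             labels = letters[seq_len(length(per) - 1)],
             include.lowest = TRUE)
```

With fewer distinct breaks you get fewer portfolios than ten, which is why hard-coding letters[1:10] fails in that case.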

