R - dplyr join and keeping variable observations without NA
I have a for loop that allocates portfolios based on tdata$me, using 10% quantiles. The issue I'm having is that when I run the for loop, I end up with only the last observation year allocated to portfolios. As I loop through the years, the idea is to place the portfolio allocations in portf and then join them to the larger dataset.
My question is: how can I join the two data sets without placing NA in the other, unmatched observations, so that the join keeps those observations as they are?
Also, is there a better way to run the for loop with dplyr? This seems like an inefficient way to allocate portfolios, but I couldn't think of another way.
Reproducible example:

    library(dplyr)

    tdata <- structure(list(
      cusip = c(47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L,
                47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L, 47L),
      fyear = c(1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L, 1970L,
                1970L, 1970L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L, 1971L),
      me = c(157, 115, 45, 19, 132, 21, 147, 191, 80, 165,
             32, 100, 44, 134, 104, 9, 183, 163, 109, 88),
      month = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
                6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 8L)),
      .Names = c("cusip", "fyear", "me", "month"),
      row.names = c(NA, 20L), class = "data.frame")

    for (i in unique(tdata$fyear)) {
      check <- filter(tdata, month == 06 & fyear == i)
      ###
      per <- quantile(check$me, c(.10, .20, .30, .40, .50, .60, .70, .80, .90))
      check$portf[check$me < per[[1]]] <- "a"
      check$portf[check$me >= per[[1]] & check$me < per[[2]]] <- "b"
      check$portf[check$me >= per[[2]] & check$me < per[[3]]] <- "c"
      check$portf[check$me >= per[[3]] & check$me < per[[4]]] <- "d"
      check$portf[check$me >= per[[4]] & check$me < per[[5]]] <- "e"
      check$portf[check$me >= per[[5]] & check$me < per[[6]]] <- "f"
      check$portf[check$me >= per[[6]] & check$me < per[[7]]] <- "g"
      check$portf[check$me >= per[[7]] & check$me < per[[8]]] <- "h"
      check$portf[check$me >= per[[8]] & check$me < per[[9]]] <- "i"
      check$portf[check$me >= per[[9]]] <- "j"
      check <- select(check, cusip, fyear, portf)
      # testcrsp is overwritten on every iteration, so only the last year survives
      testcrsp <- left_join(tdata, check, by = c("cusip", "fyear"))
      ######
    }

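One possible way to keep every year's allocation, sketched under assumptions rather than taken from the question: collect each year's result in a list, stack the pieces with bind_rows(), and join once after the loop. The data frame toy, the median split into "low"/"high", and the names pieces, alloc, and joined are all hypothetical. The sketch also assumes that (cusip, fyear) identifies a single June row, which is not true of the sample data above, where every cusip is 47.

```r
library(dplyr)

# Hypothetical data: one June row per cusip and year, so (cusip, fyear) is a unique key
toy <- data.frame(
  cusip = rep(1:5, times = 2),
  fyear = rep(c(1970L, 1971L), each = 5),
  me    = c(10, 20, 30, 40, 50, 15, 25, 35, 45, 55),
  month = 6
)

pieces <- list()
for (i in unique(toy$fyear)) {
  check <- filter(toy, month == 6 & fyear == i)
  # A single median split stands in for the ten quantile buckets to keep this short
  per <- quantile(check$me, 0.5)
  check$portf <- ifelse(check$me < per, "low", "high")
  # Store this year's allocation instead of joining inside the loop
  pieces[[as.character(i)]] <- select(check, cusip, fyear, portf)
}

# Stack all years, then join once so no year overwrites another
alloc  <- bind_rows(pieces)
joined <- left_join(toy, alloc, by = c("cusip", "fyear"))
```

Because the join happens once, on the stacked allocations, rows from earlier years are not replaced by NA when a later year is processed.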
Update:
A dplyr solution to remove the for loop. Note that I removed the length() part because I'm not sure how to do it in dplyr without repeating the breaks code. The result is different: it returns a data frame with only the month == 6 rows, instead of the full data with NA for the unselected months.

    tdata3 <- tdata %>%
      group_by(fyear) %>%
      filter(month == 6) %>%
      mutate(portf = cut(me,
                         labels = letters[1:10],
                         include.lowest = TRUE,
                         breaks = (me %>% quantile(seq(0, 1, by = 0.1)) %>% unique)) %>%
               as.character) %>%
      ungroup

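If the other months should stay in the result with NA in portf, one possible variant is to skip the filter() and compute the breaks from the June rows only inside mutate(). This is a minimal sketch, not part of the original answer: the data frame toy and the result name toy_all are hypothetical, and it assumes the June quantiles in each year are all distinct, so that ten labels fit.

```r
library(dplyr)

# Hypothetical data: five June rows and one August row per year
toy <- data.frame(
  cusip = 1:12,
  fyear = rep(c(1970L, 1971L), each = 6),
  me    = c(10, 20, 30, 40, 50, 60, 5, 15, 25, 35, 45, 20),
  month = c(6, 6, 6, 6, 6, 8, 6, 6, 6, 6, 6, 8)
)

toy_all <- toy %>%
  group_by(fyear) %>%
  mutate(
    portf = if_else(
      month == 6,
      # Breaks come from this year's June rows only
      as.character(cut(me,
                       breaks = quantile(me[month == 6], seq(0, 1, by = 0.1)),
                       labels = letters[1:10],
                       include.lowest = TRUE)),
      # Every other month keeps its row but gets no portfolio
      NA_character_
    )
  ) %>%
  ungroup()
```

All rows of toy are kept, and only the June rows receive a letter.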
Original:
Here's what I think you wanted. It doesn't use dplyr, since you don't need it to subset and loop through the years. I use cut to split the me column by its quantiles into factors labelled with letters.

    tdata2 <- tdata
    for (i in unique(tdata$fyear)) {
      # June rows of the current year
      thisyear <- tdata[tdata$fyear == i & tdata$month == 6, ]
      # Decile breaks; unique() drops repeated quantiles
      per <- unique(quantile(thisyear$me, seq(0, 1, by = 0.1)))
      factors <- cut(thisyear$me, breaks = per,
                     labels = letters[1:(length(per) - 1)],
                     include.lowest = TRUE)
      tdata2$portf[tdata$fyear == i & tdata$month == 6] <- as.character(factors)
    }
    tdata2
    #    cusip fyear  me month portf
    # 1     47  1970 157     6     i
    # 2     47  1970 115     6     f
    # 3     47  1970  45     6     c
    # 4     47  1970  19     6     a
    # 5     47  1970 132     6     g
    # 6     47  1970  21     6     a
    # 7     47  1970 147     6     h
    # 8     47  1970 191     6     j
    # 9     47  1970  80     6     d
    # 10    47  1970 165     6     j
    # 11    47  1970  32     6     b
    # 12    47  1970 100     6     e
    # 13    47  1971  44     6     b
    # 14    47  1971 134     6     g
    # 15    47  1971 104     6     d
    # 16    47  1971   9     6     a
    # 17    47  1971 183     6     j
    # 18    47  1971 163     6     i
    # 19    47  1971 109     6     e
    # 20    47  1971  88     8  <NA>

    #   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
    #    a    b    c    d    e    f    g    h    i    j

Note that unique has to be used on the quantiles because (it happened before you edited the data) you may have equal quantiles, which won't be accepted as breaks for the factors. Because of that, you should use length(per) instead of directly typing 1:10.
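The unique() and length(per) logic can be wrapped in a small function so that the breaks are computed only once. This is a minimal sketch in base R: the function name decile_labels and its probs argument are hypothetical, and it assumes x has at least two distinct values, since otherwise only one break would remain.

```r
# Hypothetical helper: the name and signature are illustrative only
decile_labels <- function(x, probs = seq(0, 1, by = 0.1)) {
  # Drop repeated quantiles, which cut() would reject as breaks
  per <- unique(quantile(x, probs, na.rm = TRUE))
  # Use as many letters as there are intervals left after unique()
  as.character(cut(x,
                   breaks = per,
                   labels = letters[seq_len(length(per) - 1)],
                   include.lowest = TRUE))
}

# Heavily tied data: most deciles collapse onto the same value
x <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 3)
labels_x <- decile_labels(x)
```

Since the function takes a plain vector, it could also be called inside a grouped mutate(), for example mutate(portf = decile_labels(me)), which avoids repeating the breaks code.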