r - How to quantify overlap between three periods? -
i writing function calculate duration of overlap between 3 periods, having trouble in finding out how efficiently program this, can me out.
i have dataset of people have been followed on time. starting date, , time spent in study differs between participants. each participant, calculate how many days in study in specific year , in 5-year age category was. example, if in study 01-01-2000 01-06-2001, , born on 15-06-1965, contribute 166 days 30-34 year age category in 2000, 200 days in 35-39 year age category in 2000 , 151 days in 35-39 year age category in 2001, while spent 0 days in other categories.
in other words: quantify overlap between these periods:
a = entering study ending study (varies among participants, fixed value within participant)
b = begin specific year end specific year (same among participants, varies within participant)
c = entering specific 5-yr age category exiting specific 5-yr age category (varies among participants, varies within participant)
my data looks this:
dat <- data.frame(lapply( data.frame("birth"=c("1965-06-15","1960-02-01","1952-05-02"), "begin"=c("2000-01-01","2003-08-14","2007-12-05"), "end"=c("2001-06-01","2006-10-24","2012-03-01")),as.date))
thus far, came this, not know how proceed (or whether should take totally different approach)…
spec.fu <- function(years,birth,begin,end,age.cat,data){ birth <- data[,birth] start.a <- data[,begin] end.a <- data[,end] (i in years){ start.b <- as.date(paste(i,"01-01",sep="-")) end.b <- as.date(paste(i+1,"01-01",sep="-")) (j in age.cat){ start.c <- paste((as.numeric(format(birth, "%y"))+j), format(birth,"%m-%d"), sep="-") end.c <- paste((as.numeric(format(birth, "%y"))+j+5), format(birth,"%m-%d"), sep="-") result <- ????? data[,ncol(data)+?????] <- result colnames(data)[ncol(data)+?????] <- paste("fu",j,"in",i,sep="") } } return(data) }
and use this:
newdata <- spec.fu(years=2000:2001,birth="birth",begin="begin", end="end",age.cat=seq(30,35,5),data=dat)
so, in case, want make 2 (no. of age categories) * 2 (no. of years) = 4 new columns each participant, each containing no. of days has spent in study in specific category (e.g. in age category 30-34 in 2001).
hopefully able explain problem.
many in advance!
i found solution (see below). code looks rather cumbersome though, can made more efficient. advise welcome!
spec.fu <- function(years,birth,begin,end,age.cat,data){ birth <- data[,birth] start.a <- data[,begin] end.a <- data[,end] if (any(sapply(c(birth,start.a,end.a),fun=function(x) class(x)!="date"))) { stop("'birth', 'begin' , 'end' must of class 'date''") } # ifelse-function saves date class in vectors # (http://stackoverflow.com/questions/6668963) safe.ifelse <- function(cond, yes, no) { structure(ifelse(cond, yes, no), class = class(yes))} (i in years){ start.b <- rep(as.date(paste(i,"01-01",sep="-")),nrow(data)) end.b <- rep(as.date(paste(i+1,"01-01",sep="-")),nrow(data)) start.ab <- safe.ifelse((start.a <= end.b & start.b <= end.a) & start.a >= start.b, start.a, safe.ifelse((start.a <= end.b & start.b <= end.a) & start.b >= start.a, start.b, as.date("1000-01-01"))) #in latter case overlap zero, date required later on end.ab <- safe.ifelse((start.a <= end.b & start.b <= end.a) & end.a <= end.b, end.a, safe.ifelse((start.a <= end.b & start.b <= end.a) & end.b <= end.a, end.b, as.date("1000-01-01"))) (j in age.cat){ start.c <- safe.ifelse(format(birth,"%m")=="02" & format(birth, "%d")=="29", as.date(paste((as.numeric(format(birth, "%y"))+j),format(birth,"%m"), "28", sep="-")), as.date(paste((as.numeric(format(birth, "%y"))+j), format(birth,"%m-%d"), sep="-"))) end.c <- safe.ifelse(format(birth,"%m")=="02" & format(birth, "%d")=="29", as.date(paste((as.numeric(format(birth, "%y"))+j+5),format(birth,"%m"), "28", sep="-")), as.date(paste((as.numeric(format(birth, "%y"))+j+5),format(birth,"%m-%d"), sep="-"))) start.abc <- safe.ifelse((start.ab <= end.c & start.c <= end.ab) & start.ab >= start.c, start.ab, safe.ifelse((start.ab <= end.c & start.c <= end.ab) & start.c >= start.ab, start.c, as.date("1000-01-01"))) end.abc <- safe.ifelse((start.ab <= end.c & start.c <= end.ab) & end.ab <= end.c, end.ab, safe.ifelse((start.ab <= end.c & start.c <= end.ab) & end.c <= end.ab, end.c, as.date("1000-01-01"))) result <- as.numeric(difftime(end.abc,start.abc,units="days")) data <- cbind(data,result) colnames(data) <- c(colnames(data)[1:(ncol(data)-1)], paste("fu",j,"in",i,sep="")) } } return(data) }
the function can used follows:
newdata <- spec.fu(years=2000:2001,birth="birth",begin="begin", end="end",age.cat=seq(30,35,5),data=dat)
which gives following result (new columns 4:7):
> newdata birth begin end fu30in2000 fu35in2000 fu30in2001 fu35in2001 1 1965-06-15 2000-01-01 2001-06-01 166 200 0 151 2 1960-02-01 2003-08-14 2006-10-24 0 0 0 0 3 1952-05-02 2007-12-05 2012-03-01 0 0 0 0
update (august 6 2013): fixed bug in function caused na's when date of birth on leap day.
Comments
Post a Comment