r - How to quantify overlap between three periods? -

i writing function calculate duration of overlap between 3 periods, having trouble in finding out how efficiently program this, can me out.

i have dataset of people have been followed on time. starting date, , time spent in study differs between participants. each participant, calculate how many days in study in specific year , in 5-year age category was. example, if in study 01-01-2000 01-06-2001, , born on 15-06-1965, contribute 166 days 30-34 year age category in 2000, 200 days in 35-39 year age category in 2000 , 151 days in 35-39 year age category in 2001, while spent 0 days in other categories.

in other words: quantify overlap between these periods:

a = entering study ending study (varies among participants, fixed value within participant)

b = begin specific year end specific year (same among participants, varies within participant)

c = entering specific 5-yr age category exiting specific 5-yr age category (varies among participants, varies within participant)

my data looks this:

dat <- data.frame(lapply(        data.frame("birth"=c("1965-06-15","1960-02-01","1952-05-02"),                   "begin"=c("2000-01-01","2003-08-14","2007-12-05"),                   "end"=c("2001-06-01","2006-10-24","2012-03-01")),as.date))

thus far, came this, not know how proceed (or whether should take totally different approach)…

spec.fu <- function(years,birth,begin,end,age.cat,data){    birth <- data[,birth]   start.a <- data[,begin]   end.a <- data[,end]    (i in years){     start.b <- as.date(paste(i,"01-01",sep="-"))      end.b <- as.date(paste(i+1,"01-01",sep="-"))       (j in age.cat){       start.c <- paste((as.numeric(format(birth, "%y"))+j),                          format(birth,"%m-%d"), sep="-")       end.c <- paste((as.numeric(format(birth, "%y"))+j+5),                        format(birth,"%m-%d"), sep="-")        result <- ?????        data[,ncol(data)+?????] <- result       colnames(data)[ncol(data)+?????] <- paste("fu",j,"in",i,sep="")       }   }    return(data) }

and use this:

 newdata <- spec.fu(years=2000:2001,birth="birth",begin="begin",                     end="end",age.cat=seq(30,35,5),data=dat)

so, in case, want make 2 (no. of age categories) * 2 (no. of years) = 4 new columns each participant, each containing no. of days has spent in study in specific category (e.g. in age category 30-34 in 2001).

hopefully able explain problem.

many in advance!

i found solution (see below). code looks rather cumbersome though, can made more efficient. advise welcome!

spec.fu <- function(years,birth,begin,end,age.cat,data){    birth <- data[,birth]   start.a <- data[,begin]   end.a <- data[,end]    if (any(sapply(c(birth,start.a,end.a),fun=function(x) class(x)!="date"))) {     stop("'birth', 'begin' , 'end' must of class 'date''") }    # ifelse-function saves date class in vectors        # (http://stackoverflow.com/questions/6668963)   safe.ifelse <- function(cond, yes, no) {                           structure(ifelse(cond, yes, no), class = class(yes))}    (i in years){     start.b <- rep(as.date(paste(i,"01-01",sep="-")),nrow(data))     end.b <- rep(as.date(paste(i+1,"01-01",sep="-")),nrow(data))      start.ab <- safe.ifelse((start.a <= end.b & start.b <= end.a) &                                start.a >= start.b, start.a,                  safe.ifelse((start.a <= end.b & start.b <= end.a) &                                 start.b >= start.a, start.b,                                     as.date("1000-01-01")))   #in latter case overlap zero, date required later on      end.ab <- safe.ifelse((start.a <= end.b & start.b <= end.a) &                              end.a <= end.b, end.a,                safe.ifelse((start.a <= end.b & start.b <= end.a) &                              end.b <= end.a, end.b,                                   as.date("1000-01-01")))       (j in age.cat){       start.c <- safe.ifelse(format(birth,"%m")=="02" & format(birth,                              "%d")=="29",                               as.date(paste((as.numeric(format(birth,                                       "%y"))+j),format(birth,"%m"),                                      "28", sep="-")),                              as.date(paste((as.numeric(format(birth,                                       "%y"))+j), format(birth,"%m-%d"),                                       sep="-")))       end.c <- safe.ifelse(format(birth,"%m")=="02" & format(birth,                            "%d")=="29",                            as.date(paste((as.numeric(format(birth,                                     "%y"))+j+5),format(birth,"%m"),                                    "28", sep="-")),                            as.date(paste((as.numeric(format(birth,                                     "%y"))+j+5),format(birth,"%m-%d"),                                     sep="-")))       start.abc <- safe.ifelse((start.ab <= end.c & start.c <= end.ab) &                                  start.ab >= start.c, start.ab,                    safe.ifelse((start.ab <= end.c & start.c <= end.ab) &                                  start.c >= start.ab, start.c,                                        as.date("1000-01-01")))        end.abc <- safe.ifelse((start.ab <= end.c & start.c <= end.ab) &                                end.ab <= end.c, end.ab,                   safe.ifelse((start.ab <= end.c & start.c <= end.ab) &                                end.c <= end.ab, end.c,                                        as.date("1000-01-01")))        result <- as.numeric(difftime(end.abc,start.abc,units="days"))        data <- cbind(data,result)       colnames(data) <- c(colnames(data)[1:(ncol(data)-1)],                       paste("fu",j,"in",i,sep=""))       }     }    return(data) }

the function can used follows:

newdata <- spec.fu(years=2000:2001,birth="birth",begin="begin",                    end="end",age.cat=seq(30,35,5),data=dat)

which gives following result (new columns 4:7):

> newdata        birth      begin        end fu30in2000 fu35in2000 fu30in2001 fu35in2001 1 1965-06-15 2000-01-01 2001-06-01        166        200          0        151 2 1960-02-01 2003-08-14 2006-10-24          0          0          0          0 3 1952-05-02 2007-12-05 2012-03-01          0          0          0          0

update (august 6 2013): fixed bug in function caused na's when date of birth on leap day.

Search This Blog

Search

r - How to quantify overlap between three periods? -

Comments

Post a Comment

Popular posts from this blog

c++ - Creating new partition disk winapi -

Android Prevent Bluetooth Pairing Dialog -

VBA function to include CDATA -