Monday, February 18, 2019

A tricky loop in R?



I've been struggling for a few days to solve this task in R (I'm a former SAS user).



The setting/study
- Observational data. Patients with Crohn's disease. Data were collected annually during 2002–2013.
- Patients can be included in any year, and visits may be irregular from year to year.
- I know the exact day of death for each patient. VARIABLE: DEATH_YEAR
- I know the exact day of relapse (the endpoint of interest). VARIABLE: RELAPSE_YEAR




I am interested in the incidence of relapse, so I need to calculate the number of relapses each year divided by the number of individuals alive that year. The problem is that, after inclusion, individuals attend visits irregularly, but I do know whether they are alive in a given year and whether they have experienced a relapse.



I could solve this if I could create 12 new variables for each patient. Each new variable should be the calendar year and this variable should be set to '1' if the patient is alive that year and has not yet experienced the event.



Thus the problem is that I need to create 'year variables' that are set to '1' for the year of inclusion and every year thereafter, as long as the person has not died or experienced the event.



An example:
Patient X was included in 2005 and died in 2009. For him I would need the following variables: '2005', '2006', '2007', '2008' and '2009' set to '1'.
Patient Y was included in 2005 and experienced the event in 2007. For him I would need the following variables: '2005', '2006', '2007' set to '1'. (Yes, the year of event/death still needs to be set to '1'.)
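The rule in these two examples can be written down as a small sketch. The helper name `at_risk_flags` and the 2002–2012 year range are assumptions made for illustration only (the range matches the YEAR columns used further down); this is not code from the original post.

```r
# Hypothetical helper: 1 for every calendar year from inclusion through the
# year of event or death (inclusive), NA for all other years.
at_risk_flags <- function(first_year, end_year, years = 2002:2012) {
  flags <- ifelse(years >= first_year & years <= end_year, 1, NA)
  names(flags) <- paste0("YEAR", years)
  flags
}

x <- at_risk_flags(2005, 2009)  # Patient X: included 2005, died 2009
y <- at_risk_flags(2005, 2007)  # Patient Y: included 2005, event 2007

names(x)[!is.na(x)]  # years flagged for Patient X
names(y)[!is.na(y)]  # years flagged for Patient Y
```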




Here is how my data set looks:



data <- read.table(header = TRUE, text = "
patient visit first_visit relapse_year death_year
1 2003 2003 . 2010
1 2004 2003 . 2010
1 2009 2003 . 2010
2 2002 2002 2006 .
2 2006 2002 2006 .
2 2006 2002 2006 .
2 2008 2002 2006 .
2 2012 2002 2006 .
3 2004 2004 . .
3 2008 2004 . .
3 2008 2004 . .
")


Here is the DESIRED data set




desired_data <- read.table(header = TRUE, text = "
patient visit first_visit relapse_year death_year YEAR2002 YEAR2003 YEAR2004 YEAR2005 YEAR2006 YEAR2007 YEAR2008 YEAR2009 YEAR2010 YEAR2011 YEAR2012
1 2003 2003 . 2010 . 1 1 1 1 1 1 1 1 . .
1 2004 2003 . 2010 . 1 1 1 1 1 1 1 1 . .
1 2009 2003 . 2010 . 1 1 1 1 1 1 1 1 . .
2 2002 2002 2006 . 1 1 1 1 1 . . . . . .
2 2006 2002 2006 . 1 1 1 1 1 . . . . . .
2 2006 2002 2006 . 1 1 1 1 1 . . . . . .
2 2008 2002 2006 . 1 1 1 1 1 . . . . . .
2 2012 2002 2006 . 1 1 1 1 1 . . . . . .
3 2004 2004 . . . . 1 1 1 1 1 1 1 1 1
3 2008 2004 . . . . 1 1 1 1 1 1 1 1 1
3 2008 2004 . . . . 1 1 1 1 1 1 1 1 1
")


I would be extremely grateful for any advice on this!
Thanks in advance!


Answer




It's a bit hackish, but this will work. First convert every column to numeric so that each . becomes NA (R will warn that NAs were introduced by coercion, which is expected here):



data0<-data.frame(lapply(data,function(x) as.numeric(as.character(x))))
head(data0)
# patient visit first_visit relapse_year death_year
# 1 1 2003 2003 NA 2010
# 2 1 2004 2003 NA 2010
# 3 1 2009 2003 NA 2010
# 4 2 2002 2002 2006 NA
# 5 2 2006 2002 2006 NA
# 6 2 2006 2002 2006 NA
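If you control the step that reads the data, a possible alternative is to tell `read.table` that `.` marks a missing value, so no conversion is needed afterwards. This is only a sketch on a shortened copy of the example data; the name `data_na` is made up for illustration.

```r
# na.strings = "." makes read.table treat a lone dot as NA, so the year
# columns are read as numbers straight away.
data_na <- read.table(header = TRUE, na.strings = ".", text = "
patient visit first_visit relapse_year death_year
1 2003 2003 . 2010
2 2002 2002 2006 .
3 2004 2004 . .
")
str(data_na)
```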


Then substitute 2012 (or whatever the last year is) for the NA values.



data0[is.na(data0)]<-2012
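If you would rather leave the NA values in place, one option (a sketch, not part of the original answer) is to pass the last study year to `pmin` as an extra argument and set `na.rm = TRUE`. The small data frame `d` and the name `last_year` are assumptions for illustration; the values are one row per patient from the question's data.

```r
# One row per patient, with NA where there was no relapse or no death.
d <- data.frame(first_visit  = c(2003, 2002, 2004),
                relapse_year = c(NA, 2006, NA),
                death_year   = c(2010, NA, NA))
last_year <- 2012  # assumed end of follow-up

# Earliest of relapse, death and end of study; NAs are ignored.
end_year <- pmin(d$relapse_year, d$death_year, last_year, na.rm = TRUE)
end_year
# [1] 2010 2006 2012
```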


Now you can use pmin to find the last year each patient counts as active, i.e. the earliest of relapse, death and the end of the study. The last thing to do is compare each column's calendar year (column number + 2001) with each row's start and end year to create the new dataset:




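# One row per visit, one column per calendar year 2002-2012
activeYears <- matrix(0, nrow(data0), 11)
colnames(activeYears) <- 2002:2012
# row() gives each cell's row index, so these pick each row's start and end year
startYear <- data0$first_visit[row(activeYears)]
endYear <- pmin(data0$relapse_year[row(activeYears)], data0$death_year[row(activeYears)])
# col() gives each cell's column index; column 1 corresponds to 2002
colYear <- col(activeYears) + 2001
# Assigning with [] keeps the numeric matrix, so TRUE/FALSE are stored as 1/0
activeYears[] <- startYear <= colYear & endYear >= colYear
activeYears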
# 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
# [1,] 0 1 1 1 1 1 1 1 1 0 0
# [2,] 0 1 1 1 1 1 1 1 1 0 0
# [3,] 0 1 1 1 1 1 1 1 1 0 0
# [4,] 1 1 1 1 1 0 0 0 0 0 0
# [5,] 1 1 1 1 1 0 0 0 0 0 0
# [6,] 1 1 1 1 1 0 0 0 0 0 0
# [7,] 1 1 1 1 1 0 0 0 0 0 0
# [8,] 1 1 1 1 1 0 0 0 0 0 0
# [9,] 0 0 1 1 1 1 1 1 1 1 1
#[10,] 0 0 1 1 1 1 1 1 1 1 1
#[11,] 0 0 1 1 1 1 1 1 1 1 1
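To get closer to the layout asked for in the question, and to the yearly incidence itself, something along these lines might work. It is a self-contained sketch that assumes one row per patient (the visit-level rows repeat the same values); the names `pat`, `wide` and `incidence` are made up for illustration, and it uses `outer` instead of the `row`/`col` arithmetic above.

```r
# One row per patient, values taken from the question's example data.
pat <- data.frame(patient      = 1:3,
                  first_visit  = c(2003, 2002, 2004),
                  relapse_year = c(NA, 2006, NA),
                  death_year   = c(2010, NA, NA))
years <- 2002:2012
end_year <- pmin(pat$relapse_year, pat$death_year, max(years), na.rm = TRUE)

# outer() compares every patient (rows) with every calendar year (columns).
at_risk  <- outer(pat$first_visit, years, "<=") & outer(end_year, years, ">=")
relapsed <- outer(pat$relapse_year, years, "==")
relapsed[is.na(relapsed)] <- FALSE  # patients without a relapse

# 1/NA flags named YEAR2002 ... YEAR2012, as in the desired data set.
flags <- ifelse(at_risk, 1, NA)
colnames(flags) <- paste0("YEAR", years)
wide <- cbind(pat, flags)

# Relapses per year divided by the number of patients at risk that year.
incidence <- data.frame(year     = years,
                        at_risk  = colSums(at_risk),
                        relapses = colSums(relapsed))
incidence$rate <- incidence$relapses / incidence$at_risk
incidence
```

If the visit-level rows are needed as well, `wide` can be joined back to the original data on `patient`, for example with `merge`.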

