Friday, May 31, 2019

r - How to create pair-wise data.frame



I have a data.frame with sample data from multiple sites and years. I would like to create a new "pairwise" data.frame that takes creates three columns, Site_Year, Value_2008, Value_Year, where Value_Year is the value for any other year in the dataset. For example, in my sample dataset the site "Anjan" is visited in 2008, 2009, 2010. The new data.frame to have two records for this site:




Site_Year Value_2008 Value_Year
Anjan.2009 0.11 0.96
Anjan.2010 0.11 -0.25


If there is a site that was sampled before 2008 the results would look like:



Site_Year Value_2008 Value_Year
Ljungdalen.2005 0.09 -0.40

Ljungdalen.2009 0.09 0.32
Ljungdalen.2011 0.09 -0.00


Ultimately, I am going to plot these results so that I can look at changes in these (mean) values between years and test them to see if they are significantly different.



Sample data:



WW_Site_Year_Data <- structure(list(Site_Year = c("Anjan.2008", "Anjan.2009", "Anjan.2010",
"Fittjebodarna.2008", "Fittjebodarna.2009", "Flatruet.2008",

"Flatruet.2009", "Flatruet.2010", "Flatruet.2011", "Fotingen.2008",
"Fotingen.2009", "Gestvallen.2008", "Gestvallen.2009", "Glen.2008",
"Glen.2009", "Glen.2010", "Gräftåvallen.2008", "Gräftåvallen.2009",
"Hallbodarna.2008", "Hallbodarna.2009", "Ljungdalen.2005", "Ljungdalen.2008",
"Ljungdalen.2009", "Ljungdalen.2011", "Ljungris.2008", "Ljungris.2009",
"Ljungris.2010", "Mårdsund.2008", "Mårdsund.2009", "Mårdsund.2010",
"Öster_Galåbodarna.2008", "Öster_Galåbodarna.2009", "Ramundberget.2008",
"Ramundberget.2009", "Rätan.2008", "Rätan.2009", "Rätan.2010",
"Särvfjället.2008", "Särvfjället.2009", "Storulvån.2008", "Storulvån.2009",
"Storulvån.2010", "Tångböle.2005", "Tångböle.2008", "Tångböle.2009",

"Tossåsen.2008", "Tossåsen.2009", "Vålådalen.2008", "Vålådalen.2009",
"Vålådalen.2010", "Vemdalsskalet.2002", "Vemdalsskalet.2008",
"Vemdalsskalet.2009"), Site_Name = c("Anjan", "Anjan", "Anjan",
"Fittjebodarna", "Fittjebodarna", "Flatruet", "Flatruet", "Flatruet",
"Flatruet", "Fotingen", "Fotingen", "Gestvallen", "Gestvallen",
"Glen", "Glen", "Glen", "Gräftåvallen", "Gräftåvallen", "Hallbodarna",
"Hallbodarna", "Ljungdalen", "Ljungdalen", "Ljungdalen", "Ljungdalen",
"Ljungris", "Ljungris", "Ljungris", "Mårdsund", "Mårdsund", "Mårdsund",
"Öster_Galåbodarna", "Öster_Galåbodarna", "Ramundberget", "Ramundberget",
"Rätan", "Rätan", "Rätan", "Särvfjället", "Särvfjället", "Storulvån",

"Storulvån", "Storulvån", "Tångböle", "Tångböle", "Tångböle",
"Tossåsen", "Tossåsen", "Vålådalen", "Vålådalen", "Vålådalen",
"Vemdalsskalet", "Vemdalsskalet", "Vemdalsskalet"), Year = c("2008",
"2009", "2010", "2008", "2009", "2008", "2009", "2010", "2011",
"2008", "2009", "2008", "2009", "2008", "2009", "2010", "2008",
"2009", "2008", "2009", "2005", "2008", "2009", "2011", "2008",
"2009", "2010", "2008", "2009", "2010", "2008", "2009", "2008",
"2009", "2008", "2009", "2010", "2008", "2009", "2008", "2009",
"2010", "2005", "2008", "2009", "2008", "2009", "2008", "2009",
"2010", "2002", "2008", "2009"), Value = c(0.112816109860291,

0.960290707474735, -0.257326331130005, 0.216427465038733, 0.956767099330118,
0.0526510364211729, 0.588996350906268, -0.465285757216318, -0.0437472490447301,
0.535560060909972, 0.389645985829418, 0.89391173396597, 0.240894790643034,
0.469695915206932, 0.251017199176266, -0.208323019946377, 0.0849050148677196,
0.184296870311739, -0.107309010266098, 0.36349491505071, -0.396608387512831,
0.0931374765423872, 0.32514153209616, -0.00466699456138238, 0.336789804880864,
0.806764888899387, -0.13955949253251, 0.108716189818012, 0.0978498660969545,
-0.160915801270199, 0.623151244760041, 0.713080174849265, -0.279733235253308,
0.427277452192635, 0.296839613563375, 0.659552054627706, 0.356154916318252,
0.659219370597927, 0.825734934055685, 0.183577169158829, 1.31979282562961,

0.0574714990570691, -0.517636804796056, 0.478950613513035, 0.504156229919797,
0.915335741962761, 1.11129338570452, 0.205145037964442, 0.829430613619136,
-0.187573312149385, -0.646545555669656, -0.034670665219269, 1.68156971430668
)), .Names = c("Site_Year", "Site_Name", "Year", "Value"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L), class = "data.frame")

Answer




A very useful package in this situation is plyr:



ddply(WW_Site_Year_Data, .(Site_Name), transform,
Value_2008 = if (2008 %in% Year) Value[Year==2008] else NA)


If records with no Value for 2008 are excluded from the data frame you could use



ddply(WW_Site_Year_Data, .(Site_Name), transform,
Value_2008 = Value[Year==2008])



It you want to use base R, it's uglier:



unsplit(lapply(split(WW_Site_Year_Data, WW_Site_Year_Data$Site_Name),
transform, Value_2008 = if (2008 %in% Year) Value[Year==2008] else NA),
WW_Site_Year_Data$Site_Name)

No comments:

Post a Comment

plot explanation - Why did Peaches&#39; mom hang on the tree? - Movies &amp; TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...