I have a data.frame
with sample data from multiple sites
and years
. I would like to create a new "pairwise" data.frame
that takes creates three columns, Site_Year
, Value_2008
, Value_Year
, where Value_Year
is the value for any other year in the dataset. For example, in my sample dataset the site "Anjan" is visited in 2008, 2009, 2010. The new data.frame
to have two records for this site
:
Site_Year Value_2008 Value_Year
Anjan.2009 0.11 0.96
Anjan.2010 0.11 -0.25
If there is a site
that was sampled before 2008 the results would look like:
Site_Year Value_2008 Value_Year
Ljungdalen.2005 0.09 -0.40
Ljungdalen.2009 0.09 0.32
Ljungdalen.2011 0.09 -0.00
Ultimately, I am going to plot these results so that I can look at changes in these (mean) values between years and test them to see if they are significantly different.
Sample data:
WW_Site_Year_Data <- structure(list(Site_Year = c("Anjan.2008", "Anjan.2009", "Anjan.2010",
"Fittjebodarna.2008", "Fittjebodarna.2009", "Flatruet.2008",
"Flatruet.2009", "Flatruet.2010", "Flatruet.2011", "Fotingen.2008",
"Fotingen.2009", "Gestvallen.2008", "Gestvallen.2009", "Glen.2008",
"Glen.2009", "Glen.2010", "Gräftåvallen.2008", "Gräftåvallen.2009",
"Hallbodarna.2008", "Hallbodarna.2009", "Ljungdalen.2005", "Ljungdalen.2008",
"Ljungdalen.2009", "Ljungdalen.2011", "Ljungris.2008", "Ljungris.2009",
"Ljungris.2010", "Mårdsund.2008", "Mårdsund.2009", "Mårdsund.2010",
"Öster_Galåbodarna.2008", "Öster_Galåbodarna.2009", "Ramundberget.2008",
"Ramundberget.2009", "Rätan.2008", "Rätan.2009", "Rätan.2010",
"Särvfjället.2008", "Särvfjället.2009", "Storulvån.2008", "Storulvån.2009",
"Storulvån.2010", "Tångböle.2005", "Tångböle.2008", "Tångböle.2009",
"Tossåsen.2008", "Tossåsen.2009", "Vålådalen.2008", "Vålådalen.2009",
"Vålådalen.2010", "Vemdalsskalet.2002", "Vemdalsskalet.2008",
"Vemdalsskalet.2009"), Site_Name = c("Anjan", "Anjan", "Anjan",
"Fittjebodarna", "Fittjebodarna", "Flatruet", "Flatruet", "Flatruet",
"Flatruet", "Fotingen", "Fotingen", "Gestvallen", "Gestvallen",
"Glen", "Glen", "Glen", "Gräftåvallen", "Gräftåvallen", "Hallbodarna",
"Hallbodarna", "Ljungdalen", "Ljungdalen", "Ljungdalen", "Ljungdalen",
"Ljungris", "Ljungris", "Ljungris", "Mårdsund", "Mårdsund", "Mårdsund",
"Öster_Galåbodarna", "Öster_Galåbodarna", "Ramundberget", "Ramundberget",
"Rätan", "Rätan", "Rätan", "Särvfjället", "Särvfjället", "Storulvån",
"Storulvån", "Storulvån", "Tångböle", "Tångböle", "Tångböle",
"Tossåsen", "Tossåsen", "Vålådalen", "Vålådalen", "Vålådalen",
"Vemdalsskalet", "Vemdalsskalet", "Vemdalsskalet"), Year = c("2008",
"2009", "2010", "2008", "2009", "2008", "2009", "2010", "2011",
"2008", "2009", "2008", "2009", "2008", "2009", "2010", "2008",
"2009", "2008", "2009", "2005", "2008", "2009", "2011", "2008",
"2009", "2010", "2008", "2009", "2010", "2008", "2009", "2008",
"2009", "2008", "2009", "2010", "2008", "2009", "2008", "2009",
"2010", "2005", "2008", "2009", "2008", "2009", "2008", "2009",
"2010", "2002", "2008", "2009"), Value = c(0.112816109860291,
0.960290707474735, -0.257326331130005, 0.216427465038733, 0.956767099330118,
0.0526510364211729, 0.588996350906268, -0.465285757216318, -0.0437472490447301,
0.535560060909972, 0.389645985829418, 0.89391173396597, 0.240894790643034,
0.469695915206932, 0.251017199176266, -0.208323019946377, 0.0849050148677196,
0.184296870311739, -0.107309010266098, 0.36349491505071, -0.396608387512831,
0.0931374765423872, 0.32514153209616, -0.00466699456138238, 0.336789804880864,
0.806764888899387, -0.13955949253251, 0.108716189818012, 0.0978498660969545,
-0.160915801270199, 0.623151244760041, 0.713080174849265, -0.279733235253308,
0.427277452192635, 0.296839613563375, 0.659552054627706, 0.356154916318252,
0.659219370597927, 0.825734934055685, 0.183577169158829, 1.31979282562961,
0.0574714990570691, -0.517636804796056, 0.478950613513035, 0.504156229919797,
0.915335741962761, 1.11129338570452, 0.205145037964442, 0.829430613619136,
-0.187573312149385, -0.646545555669656, -0.034670665219269, 1.68156971430668
)), .Names = c("Site_Year", "Site_Name", "Year", "Value"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L), class = "data.frame")
Answer
A very useful package in this situation is plyr
:
ddply(WW_Site_Year_Data, .(Site_Name), transform,
Value_2008 = if (2008 %in% Year) Value[Year==2008] else NA)
If records with no Value for 2008 are excluded from the data frame you could use
ddply(WW_Site_Year_Data, .(Site_Name), transform,
Value_2008 = Value[Year==2008])
It you want to use base R, it's uglier:
unsplit(lapply(split(WW_Site_Year_Data, WW_Site_Year_Data$Site_Name),
transform, Value_2008 = if (2008 %in% Year) Value[Year==2008] else NA),
WW_Site_Year_Data$Site_Name)
No comments:
Post a Comment