Saturday, May 11, 2019

r - Error when consolidating like rows with plyr - what am I doing wrong?

I have a dataframe (dtetags.df) with a date column that has many duplicate dates:



dtetags.df$Date
"2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14"
"2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08"
"2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30"
"2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24"
"2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15"
"2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06"
"2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23"
"2016-05-23" "2016-05-20"


and a number of binary tag columns that show whether a post was made with that tag on that date, for example:




dtetags.df$Technology
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0"


I am trying to use ddply(dtetags.df, "Date", numcolwise(sum)), based on this question, but instead of the summed rows it returns <0 rows> (or 0-length row.names). I have tried several different ways of writing the ddply call, but I cannot get it to work.
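One thing that may be worth checking here: the tag values above print with quotes, which suggests the columns are character (or factor) rather than numeric, and numcolwise() only applies its function to numeric columns. The sketch below is an illustration under that assumption, not a confirmed diagnosis. toy.df is a made-up three-row stand-in for dtetags.df, and the check uses base R only.

```r
# Hypothetical three-row stand-in for dtetags.df (not the real data).
# The tag values are stored as character, as the quoted printout suggests.
toy.df <- data.frame(
  Date = c("2016-07-22", "2016-07-22", "2016-07-21"),
  Technology = c("0", "1", "0"),
  stringsAsFactors = FALSE
)

# Which columns are numeric? numcolwise() would only consider these.
numeric_cols <- sapply(toy.df, is.numeric)
print(numeric_cols)

# If no column is numeric, there is nothing for numcolwise(sum) to sum.
if (!any(numeric_cols)) {
  message("No numeric columns: convert the tag columns before summing.")
}
```

Running str(dtetags.df) on the real data frame would show the same information for every column at once.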



The ideal output would look like:




         Date Technology
1  2016-07-22          0
2  2016-07-21          0
3  2016-07-20          0
4  2016-07-19          0
5  2016-07-18          0
6  2016-07-15          0
7  2016-07-14          0
8  2016-07-13          0
9  2016-07-12          0
10 2016-07-11          0
11 2016-07-08          0
12 2016-07-07          0
13 2016-07-06          1
14 2016-07-05          0
15 2016-07-01          2
16 2016-06-30          1
17 2016-06-29          1
18 2016-06-28          0
19 2016-06-27          0
20 2016-06-24          1
21 2016-06-23          0
22 2016-06-22          0
23 2016-06-21          0
24 2016-06-20          0
25 2016-06-17          0
26 2016-06-16          0
27 2016-06-15          0
28 2016-06-14          1
29 2016-06-13          0
30 2016-06-10          0
31 2016-06-09          0
32 2016-06-08          0
33 2016-06-07          0
34 2016-06-06          0
35 2016-06-01          0
36 2016-05-29          0
37 2016-05-27          0
38 2016-05-26          0
39 2016-05-25          0
40 2016-05-24          0
41 2016-05-23          0
42 2016-05-20          0


Is there something obvious I am doing wrong?



Conversion from Factor to Numeric




I removed the Date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and prepended the Date column back in.
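A variation on that conversion step, sketched on made-up data: instead of removing and re-attaching the Date column, the tag columns can be converted in place with lapply(), which avoids the matrix coercion that apply() performs. toy.df and tag_cols are hypothetical names used only for this illustration.

```r
# Hypothetical stand-in for dtetags.df with a factor tag column.
toy.df <- data.frame(
  Date = c("2016-07-22", "2016-07-22", "2016-07-21"),
  Technology = factor(c("0", "1", "0")),
  stringsAsFactors = FALSE
)

# Every column except Date is treated as a tag column.
tag_cols <- setdiff(names(toy.df), "Date")

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the printed values.
toy.df[tag_cols] <- lapply(toy.df[tag_cols],
                           function(x) as.numeric(as.character(x)))

str(toy.df)
```

The Date column is left untouched, so nothing needs to be prepended afterwards.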



dput(dtetags.df)
structure(list(Date = c("2016-07-22", "2016-07-22", "2016-07-21",
"2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19",
"2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15",
"2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13",
"2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11",
"2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08",
"2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07",
"2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05",
"2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30",
"2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29",
"2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27",
"2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23",
"2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20",
"2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16",
"2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13",
"2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09",
"2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07",
"2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01",
"2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26",
"2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23",
"2016-05-23", "2016-05-20"), `Technology` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Date",
"Technology"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L))
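Once the tag columns are numeric, as in the dput output above, the per-date consolidation could also be done without plyr. The following is a minimal base R sketch using aggregate() on a made-up five-row data frame (toy.df and by_date are hypothetical names). It is offered as an alternative route to the same kind of summary, not as an explanation of the ddply result.

```r
# Hypothetical five-row stand-in for dtetags.df with numeric tag values.
toy.df <- data.frame(
  Date = c("2016-07-22", "2016-07-22", "2016-07-21", "2016-07-21", "2016-07-20"),
  Technology = c(0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Sum every non-Date column within each Date group.
by_date <- aggregate(. ~ Date, data = toy.df, FUN = sum)

# aggregate() returns groups in ascending order; sort newest first
# to match the layout of the desired output.
by_date <- by_date[order(by_date$Date, decreasing = TRUE), ]

print(by_date)
```

With several tag columns, the `. ~ Date` formula sums all of them in one call, which is the same shape of result the ddply call is aiming for.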
