Here is a dplyr-based answer using the same logic as @Floo0. This will tend to get slow as you have a larger number of groups.
First, I added the row numbers as a column to the original dataset. The calculation of CumSumB
will be done for each unique row using this approach.
library(dplyr)
dat = dat %>% mutate(row = row_number())
Then I join the dataset to itself, joining X
to Y
and by Date
. To avoid many duplicate columns with added suffixes, I selected only some of the columns for the x
dataset of the join (i.e., first dataset of left_join
).
I kept the variable row
in both datasets on purpose, so I end up with a variable called row.x
that indicates the original row number of each X
value and a variable called row.y
indicating the original row number of each Y
value.
dat %>%
left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date"))
Once that is done, the dataset just needs to be grouped by row.x
and the sum of Qty
calculated conditional on row.x
being less than or equal to row.y
.
dat %>%
left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date")) %>%
group_by(row.x) %>%
summarise(CumSumB = sum(Qty[row.y <= row.x]))
Last, this can be joined back to the original dataset. The result still contains a column representing the row number, which could be removed via select(-row)
if needed.
dat %>%
left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date")) %>%
group_by(row.x) %>%
summarise(CumSumB = sum(Qty[row.y <= row.x])) %>%
left_join(dat, ., by = c("row" = "row.x"))
X Y Date Qty CumSumA CumSumB.x row CumSumB.y
1 A B 1/1 1 1 0 1 0
2 A A 1/1 2 3 2 2 2
3 A E 1/1 2 5 2 3 2
4 B A 1/1 1 1 1 4 1
5 B B 1/1 3 4 4 5 4
6 B C 1/1 2 6 4 6 4
7 C D 1/1 2 2 2 7 2
8 C E 1/1 4 6 2 8 2
9 C A 1/1 1 7 2 9 2
10 A C 1/2 2 2 0 10 0
11 A D 1/2 3 5 0 11 0
12 A E 1/2 2 7 0 12 0
13 B A 1/2 5 5 0 13 0
14 B B 1/2 1 6 1 14 1
15 B C 1/2 2 8 1 15 1
16 C D 1/2 2 2 4 16 4
17 C E 1/2 1 1 4 17 4
18 C A 1/2 3 4 4 18 4
No comments:
Post a Comment