r - Cumulative sum based on certain conditions

Wednesday, March 21, 2018

r - Cumulative sum based on certain conditions

Here is a dplyr-based answer using the same logic as @Floo0. This will tend to get slow as you have a larger number of groups.

First, I added the row numbers as a column to the original dataset. The calculation of CumSumB will be done for each unique row using this approach.

library(dplyr)

dat = dat %>% mutate(row = row_number())

Then I join the dataset to itself, joining X to Y and by Date. To avoid many duplicate columns with added suffixes, I selected only some of the columns for the x dataset of the join (i.e., first dataset of left_join).

I kept the variable row in both datasets on purpose, so I end up with a variable called row.x that indicates the original row number of each X value and a variable called row.y indicating the original row number of each Y value.

dat %>% 
    left_join(select(dat, X, Date, Y, row), ., by  = c("X" = "Y", "Date" = "Date"))

Once that is done, the dataset just needs to be grouped by row.x and the sum of Qty calculated conditional on row.x being less than or equal to row.y.

dat %>% 
    left_join(select(dat, X, Date, Y, row), ., by  = c("X" = "Y", "Date" = "Date")) %>%

    group_by(row.x) %>%
    summarise(CumSumB = sum(Qty[row.y <= row.x]))

Last, this can be joined back to the original dataset. The result still contains a column representing the row number, which could be removed via select(-row) if needed.

dat %>% 
    left_join(select(dat, X, Date, Y, row), ., by  = c("X" = "Y", "Date" = "Date")) %>%
    group_by(row.x) %>%
    summarise(CumSumB = sum(Qty[row.y <= row.x])) %>%

    left_join(dat, ., by = c("row" = "row.x"))

   X Y Date Qty CumSumA CumSumB.x row CumSumB.y
1  A B  1/1   1       1         0   1         0
2  A A  1/1   2       3         2   2         2
3  A E  1/1   2       5         2   3         2
4  B A  1/1   1       1         1   4         1
5  B B  1/1   3       4         4   5         4
6  B C  1/1   2       6         4   6         4
7  C D  1/1   2       2         2   7         2

8  C E  1/1   4       6         2   8         2
9  C A  1/1   1       7         2   9         2
10 A C  1/2   2       2         0  10         0
11 A D  1/2   3       5         0  11         0
12 A E  1/2   2       7         0  12         0
13 B A  1/2   5       5         0  13         0
14 B B  1/2   1       6         1  14         1
15 B C  1/2   2       8         1  15         1
16 C D  1/2   2       2         4  16         4
17 C E  1/2   1       1         4  17         4

18 C A  1/2   3       4         4  18         4

Blog

Wednesday, March 21, 2018