Remove reversal transactions
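I have transaction-level data that contains some reversal transactions. A reversal is recorded as a negative amount, together with the corresponding positive amount.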
trnx_df <- data.frame(Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
                               "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
                      Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
                      Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000))
trnx_df
Date Product Amount
1 2018-01-01 A -1000
2 2018-01-01 A 1000
3 2018-01-01 A 1000
4 2018-01-01 A 1000
5 2018-01-03 B -1000
6 2018-01-03 B 1000
7 2018-01-05 B 500
8 2018-02-01 A -2000
9 2018-02-01 A 1000
10 2018-02-01 A 2000
I want to find the total amount and the maximum amount this customer spent on each product.
Using dplyr I get:
library(dplyr)
trnx_summary <- trnx_df %>%
group_by(Product) %>%
summarize(Total_amount = sum(Amount),
Max_amount = max(Amount))
trnx_summary
Product Total_amount Max_amount
1 A 3000 2000
2 B 500 1000
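The total is fine, because the negative entries cancel the positive ones, but the maximum spend comes out wrong: for product A the maximum amount should be 1000 (the 2000 and the -2000 cancel each other out).
How can I fix this? Also, is there a way to remove these reversal transactions from the data frame itself?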
trnx_df %>%                       # keep the negative (reversal) transactions, save them in dftemp
  filter(Amount < 0) %>%
  mutate(Amount = abs(Amount)) -> dftemp   # flip the sign so reversals can be matched against positive rows
trnx_df %>%                       # keep the positive transactions that have no matching reversal
  filter(Amount > 0) %>%
  anti_join(dftemp, by = c("Date", "Product", "Amount")) -> dfuniques
trnx_df %>%
  filter(Amount > 0) %>%                                        # keep positive transactions
  inner_join(dftemp, by = c("Date", "Product", "Amount")) %>%   # keep those that also appear (sign-flipped) in dftemp
  group_by(Date, Product, Amount) %>%                           # group by date, product and amount
  slice(-1) %>%   # in each Date/Product/Amount group, drop one row: the positive entry cancelled by the reversal
  full_join(dfuniques, by = c("Date", "Product", "Amount")) %>%   # add back the unmatched positives; at this point the reversal pairs are gone
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))
Product Total_Amount Max_Amount
<fctr> <dbl> <dbl>
1 A 3000 1000
2 B 500 500
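The pipeline above drops exactly one positive row per (Date, Product, Amount) combination, so it appears to assume at most one reversal per combination, and it only returns the summary. A minimal sketch that returns the cleaned data frame itself (the second part of the question) and cancels any number of reversals one-for-one, assuming a reversal always shares Date, Product and absolute amount with the charge it cancels. `remove_reversals` is a hypothetical helper name, not part of dplyr or either answer.

```r
library(dplyr)

# Sample data from the question
trnx_df <- data.frame(
  Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
           "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

# Hypothetical helper: within each Date/Product/|Amount| key, cancel negatives
# against positives one-for-one and keep only the rows left over.
remove_reversals <- function(df) {
  df %>%
    group_by(Date, Product, AmountAbs = abs(Amount)) %>%
    mutate(
      n_neg = sum(Amount < 0),         # reversals in this key
      n_pos = sum(Amount > 0),         # charges in this key
      pos_rank = cumsum(Amount > 0),   # 1, 2, ... over the positive rows
      neg_rank = cumsum(Amount < 0)    # 1, 2, ... over the negative rows
    ) %>%
    filter(
      Amount == 0 |
        (Amount > 0 & pos_rank > n_neg) |   # charges left after cancelling n_neg of them
        (Amount < 0 & neg_rank > n_pos)     # reversals with no charge left to cancel
    ) %>%
    ungroup() %>%
    select(-AmountAbs, -n_neg, -n_pos, -pos_rank, -neg_rank)
}

trnx_clean <- remove_reversals(trnx_df)
print(trnx_clean)

trnx_summary <- trnx_clean %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))
print(trnx_summary)
```

Because every removed pair sums to zero, Total_Amount is the same as summing the raw data, while Max_Amount now ignores reversed charges. Which of several identical positive rows gets dropped does not matter, since they agree on every key column.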
Using the lead() and lag() functions:
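trnx_df %>%
  group_by(Product, AmountAbs = abs(Amount)) %>%   # one group per product and absolute amount
  arrange(Product, AmountAbs, Amount) %>%          # negative amounts sort first within each group
  mutate(
    remove =
      # a positive row directly preceded by its negative counterpart
      (sign(lag(Amount, default = 0)) == -1 &
         lag(AmountAbs, default = 0) == Amount) |
      # a negative row that is followed by another row in the same group
      (sign(Amount) == -1 &
         lead(AmountAbs) == AmountAbs)) %>%
  ungroup() %>%
  filter(!remove) %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))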
# # A tibble: 2 x 3
# Product Total_Amount Max_Amount
# <fct> <dbl> <dbl>
# 1 A 3000 1000
# 2 B 500 500
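To see which rows the lead/lag rule marks, the pipeline can be stopped before filter(). This is a sketch that reuses the answer's grouping, ordering and rule unchanged; `flag_reversals` and the extra product "C" row are hypothetical, added only for illustration.

```r
library(dplyr)

# Sample data from the question
trnx_df <- data.frame(
  Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
           "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

# Hypothetical helper: the lead/lag answer's rule, returned before filter(!remove)
flag_reversals <- function(df) {
  df %>%
    group_by(Product, AmountAbs = abs(Amount)) %>%
    arrange(Product, AmountAbs, Amount) %>%
    mutate(
      remove =
        (sign(lag(Amount, default = 0)) == -1 &
           lag(AmountAbs, default = 0) == Amount) |
        (sign(Amount) == -1 &
           lead(AmountAbs) == AmountAbs)
    ) %>%
    ungroup()
}

flags <- flag_reversals(trnx_df)
print(flags)

# Hypothetical reversal with no matching charge: lead() returns NA here,
# so the flag is NA rather than TRUE or FALSE
orphan <- flag_reversals(data.frame(Date = "2018-03-01", Product = "C", Amount = -300))
print(orphan$remove)
```

Since filter() keeps only rows where the condition is TRUE, `filter(!remove)` also drops rows whose flag is NA, so an unmatched reversal would silently disappear and change that product's total. Note also that this rule groups by product and absolute amount only, so it pairs a reversal with a charge regardless of Date.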