Using a logical operator in dplyr mutate
I have a data frame that looks like this:
df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"),
                sentences = c("the cat is brown; the dog is barking; the bird is green and blue",
                              "the bird is yellow and blue",
                              "the bird is blue"),
                year = c("2010", "2012", "2001"),
                stringsAsFactors = FALSE)
df$year <- as.numeric(df$year)
> df
         animals                                     sentences year
1 cat; dog; bird  the cat is brown; the bird is green and blue 2010
2      dog; bird the dog is black; the bird is yellow and blue 2012
3           bird                              the bird is blue 2001
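For each row, I want the total number of times its animals occur in the sentences column over the previous 5 years (including the same year).
Edit
For example: the animals in row 2, dog and bird, occur 3 times in the sentences column over the previous 5 years (including the same year). In 2012: "the dog is black; the bird is yellow and blue"; in 2010: "the bird is green and blue". Total = 3.
Desired result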
# A tibble: 3 x 4
  animals        sentences                                      year   SUM
  <chr>          <chr>                                         <dbl> <int>
1 cat; dog; bird the cat is brown; the bird is green and blue   2010     2
2 dog; bird      the dog is black; the bird is yellow and blue  2012     3
3 bird           the bird is blue                               2001     1
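Attempted solution
I used the code below, taken from elsewhere, and added a logical condition,
animals[(year>=year-5) & (year<=year)]
but it does not give me the output I want. What am I doing wrong?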
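library(dplyr)
library(stringr)
library(purrr)
# Every sentence fragment from every row, regardless of year
string <- unlist(str_split(df$sentences, ";"))
df %>% rowwise %>%
  mutate(SUM = str_split(animals[(year>=year-5) & (year<=year)], "; ", simplify = T) %>%
           map( ~ str_count(string, .)) %>%
           unlist %>% sum)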
Any help would be greatly appreciated :).
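One likely reason the added condition has no effect: under rowwise(), a column name such as year refers to the current row's single value, so (year>=year-5) & (year<=year) compares each year with itself and is always TRUE. The base R sketch below illustrates the difference between that self-comparison and a comparison against a fixed reference year. The names self_test, ref and in_window are illustrative and not part of the original code.

```r
year <- c(2010, 2012, 2001)

# Comparing each year with itself: every element passes, so nothing is filtered.
self_test <- (year >= year - 5) & (year <= year)
print(self_test)

# Comparing every year against one reference year (here row 2, 2012)
# selects only the rows inside that row's 5-year window.
ref <- 2012
in_window <- (year >= ref - 5) & (year <= ref)
print(in_window)
```

To build a window per row, the full year column has to be compared against the current row's year, which means the two must be available under different names.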
Try:
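library(dplyr)
df %>%
  mutate(
    # Number of animals listed in each row
    SUM = sapply(strsplit(animals, "; "), length),
    # Sum those counts over rows whose year falls in [x - 4, x]
    SUM = sapply(year, function(x) sum(SUM[between(year, x - 5 + 1, x)])))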
This is the output:
         animals                                                        sentences year SUM
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue 2010   3
2      dog; bird                    the dog is black; the bird is yellow and blue 2018   2
3           bird                                                 the bird is blue 2001   1
Of course, for 2010 it does not match your desired output, because you did not provide the data for the preceding years.
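The code above sums how many animals each row lists, not how often they occur in the sentences. If occurrences are what is wanted, a base R sketch along the following lines may be closer to the desired result. It assumes the data as printed in the question (not the data.frame() call, which differs), a window of year - 5 through year inclusive, and plain substring matching (so "cat" would also match "category"). The helper count_in_window is a hypothetical name introduced here for illustration.

```r
# Data as printed in the question (an assumption; the data.frame() call differs).
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c("the cat is brown; the bird is green and blue",
                "the dog is black; the bird is yellow and blue",
                "the bird is blue"),
  year = c(2010, 2012, 2001),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count occurrences of row i's animals in the
# sentences of all rows whose year lies in [year_i - window, year_i].
count_in_window <- function(i, data, window = 5) {
  in_window <- data$year >= data$year[i] - window & data$year <= data$year[i]
  pool <- data$sentences[in_window]
  animals_i <- strsplit(data$animals[i], "; ", fixed = TRUE)[[1]]
  sum(vapply(animals_i, function(a) {
    # gregexpr() returns match positions per sentence, or -1 when there is none
    hits <- gregexpr(a, pool, fixed = TRUE)
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }, integer(1)))
}

df$SUM <- vapply(seq_len(nrow(df)), count_in_window, integer(1), data = df)
print(df[, c("animals", "year", "SUM")])
```

On this data the SUM column comes out as 2, 3, 1, which matches the desired result in the question. Passing the row index rather than the row's values keeps the whole year column available for the window comparison.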