In R, how to get rows that contain values in a list and create a data frame of counts
I have a data frame with the following contents:
Meal Contents
Type_1 redberries,strawberry,blackberry
Type_2 banana,apple,strawberry,
Type_3 rice,chicken
Type_4 beef,stringbeans,mashpotatoes
Type_5 banana,strawberry,berry,cantaloupe
I created a vector (one-hot) representation of the Contents column, and the new df2 is:
Meal   Contents                            Strawberry Banana Rice
Type_1 redberries,strawberry,blackberry    1          0      0
Type_2 banana,apple,strawberry,            1          1      0
Type_3 rice,chicken                        0          0      1
Type_4 beef,stringbeans,mashpotatoes       0          0      0
Type_5 banana,strawberry,berry,cantaloupe  1          1      0
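The question doesn't show how df2 was built. As one possibility, here is a minimal base-R sketch (an assumption, not necessarily the asker's method) that splits Contents on commas and adds one 0/1 indicator column per distinct item; `meals`, `items`, `vocab` and `onehot` are illustrative names.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps Contents as character
meals <- data.frame(
  Meal     = paste0("Type_", 1:5),
  Contents = c("redberries,strawberry,blackberry",
               "banana,apple,strawberry,",
               "rice,chicken",
               "beef,stringbeans,mashpotatoes",
               "banana,strawberry,berry,cantaloupe"),
  stringsAsFactors = FALSE
)

# Split each Contents string on commas; strsplit() does not emit an empty
# field for the trailing comma in "strawberry,"
items <- strsplit(meals$Contents, ",", fixed = TRUE)

# Alphabetical vocabulary of every distinct item
vocab <- sort(unique(unlist(items)))

# One row per item, one column per meal: 1 if the meal's list contains the item
onehot <- vapply(items, function(x) as.integer(vocab %in% x), integer(length(vocab)))

# Transpose so meals are rows, then name the columns after the items
onehot <- t(onehot)
colnames(onehot) <- vocab

df2 <- cbind(meals, as.data.frame(onehot))
df2
```

With this data the item columns come out in the same alphabetical order as the df2 printed in the answer below.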
I am trying to get the top 2 Contents based on their counts:
top2_v1 <- c("strawberry","banana")
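But I'm having a hard time getting back the frequency distribution of counts of the Meal types that contain the top N Contents.
Can I run a loop over top2_v1 on the df2 data frame, so that I can create another data frame that tells me the frequency of each of the top N Contents?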
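An explicit loop isn't required for the per-item frequencies. A minimal sketch, assuming each item in top2_v1 has a numeric 0/1 column of the same name in df2 (the df2 here is a hypothetical reduced version with only three item columns), applies vapply() over top2_v1 and collects the result into a data frame; `n_meals` and `top_counts` are illustrative names.

```r
# Hypothetical reduced df2: only the Meal column and three 0/1 item columns
df2 <- data.frame(
  Meal       = paste0("Type_", 1:5),
  strawberry = c(1, 1, 0, 0, 1),
  banana     = c(0, 1, 0, 0, 1),
  rice       = c(0, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)
top2_v1 <- c("strawberry", "banana")

# For each top item, count the meals whose indicator column is non-zero
n_meals <- vapply(top2_v1, function(item) sum(df2[[item]] > 0), integer(1))

# Collect the results into a data frame, one row per top item
top_counts <- data.frame(
  content    = top2_v1,
  n_meals    = unname(n_meals),
  proportion = unname(n_meals) / nrow(df2),
  stringsAsFactors = FALSE
)
top_counts
```

For this data, strawberry appears in 3 of the 5 meals and banana in 2.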
Try this (starting from df2):
df2
Meal Contents apple banana beef berry blackberry cantaloupe chicken mashpotatoes redberries rice strawberry stringbeans
1 Type_1 redberries,strawberry,blackberry 0 0 0 0 1 0 0 0 1 0 1 0
2 Type_2 banana,apple,strawberry, 1 1 0 0 0 0 0 0 0 0 1 0
3 Type_3 rice,chicken 0 0 0 0 0 0 1 0 0 1 0 0
4 Type_4 beef,stringbeans,mashpotatoes 0 0 1 0 0 0 0 1 0 0 0 1
5 Type_5 banana,strawberry,berry,cantaloupe 0 1 0 1 0 1 0 0 0 0 1 0
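n <- 2
# Names of the n most frequent Contents (column sums of the 0/1 indicator columns)
topn_v1 <- names(sort(colSums(df2[3:ncol(df2)]), decreasing=TRUE))[1:n]
# apply() turns df2 into a character matrix, hence as.character()/as.integer() before any()
indices <- apply(df2, 1, function(x) any(as.integer(as.character(x[topn_v1]))))
df2[indices,] # Meals that contain at least one of the top_n Contents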
Meal Contents apple banana beef berry blackberry cantaloupe chicken mashpotatoes redberries rice strawberry stringbeans
1 Type_1 redberries,strawberry,blackberry 0 0 0 0 1 0 0 0 1 0 1 0
2 Type_2 banana,apple,strawberry, 1 1 0 0 0 0 0 0 0 0 1 0
5 Type_5 banana,strawberry,berry,cantaloupe 0 1 0 1 0 1 0 0 0 0 1 0
table(df2[indices,]$Meal)
Type_1 Type_2 Type_3 Type_4 Type_5
1 1 0 0 1
table(df2[indices,]$Meal) / nrow(df2[indices,]) # in proportion
Type_1 Type_2 Type_3 Type_4 Type_5
0.3333333 0.3333333 0.0000000 0.0000000 0.3333333
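The same filtering can be sketched without apply(), which converts the whole data frame to a character matrix (hence the as.character()/as.integer() round trip above): rowSums() over the top-n indicator columns stays numeric. This assumes the indicator columns are numeric 0/1, and the df2 below is a hypothetical reduced version. Meal is stored as a factor, which is why table() reports zero counts for Type_3 and Type_4; with a character Meal column those entries would be dropped.

```r
# Hypothetical reduced df2; Meal is a factor, so table() keeps all five levels
df2 <- data.frame(
  Meal       = factor(paste0("Type_", 1:5)),
  strawberry = c(1, 1, 0, 0, 1),
  banana     = c(0, 1, 0, 0, 1),
  rice       = c(0, 0, 1, 0, 0),
  chicken    = c(0, 0, 1, 0, 0),
  beef       = c(0, 0, 0, 1, 0)
)

n <- 2
# Number of meals containing each item; keep the n most frequent item names
item_cols <- setdiff(names(df2), "Meal")
topn_v1 <- names(sort(colSums(df2[item_cols]), decreasing = TRUE))[seq_len(n)]

# TRUE for meals with a 1 in at least one of the top-n columns
indices <- rowSums(df2[topn_v1]) > 0

# Counts per Meal among the selected rows, then the same as proportions
meal_counts <- table(df2$Meal[indices])
meal_props  <- prop.table(meal_counts)
meal_counts
meal_props
```

prop.table() divides by the total count of selected rows, the same denominator as nrow(df2[indices,]).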
Try this:
n <- 2
# Top-n Contents by number of meals containing them
topn_v1 <- names(sort(colSums(df2[3:ncol(df2)]), decreasing=TRUE))[1:n]
# TRUE for meals that contain at least one top-n item
indices <- apply(df2, 1, function(x) any(as.integer(as.character(x[topn_v1]))))
table(df2[indices,]$Meal)                        # counts per Meal type
table(df2[indices,]$Meal) / nrow(df2[indices,])  # proportions
barplot(sort(table(df2[indices,]$Meal) / nrow(df2[indices,]), decreasing = TRUE),
        ylab='Proportions')
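Since the title asks for a data frame of counts, the table can also be converted with as.data.frame(). A sketch under similar assumptions (hypothetical reduced df2 with a factor Meal column; topn_v1 fixed to the question's top two items; `freq_df` is an illustrative name):

```r
# Hypothetical reduced df2 with a factor Meal column and 0/1 item columns
df2 <- data.frame(
  Meal       = factor(paste0("Type_", 1:5)),
  strawberry = c(1, 1, 0, 0, 1),
  banana     = c(0, 1, 0, 0, 1),
  rice       = c(0, 0, 1, 0, 0)
)
topn_v1 <- c("strawberry", "banana")
indices <- rowSums(df2[topn_v1]) > 0

# Turn the one-way table into a data frame: one row per Meal level
freq_df <- as.data.frame(table(df2$Meal[indices]),
                         responseName = "Count", stringsAsFactors = FALSE)
names(freq_df)[1] <- "Meal"

# Add proportions and sort from most to least frequent, as the barplot does
freq_df$Proportion <- freq_df$Count / sum(freq_df$Count)
freq_df <- freq_df[order(freq_df$Count, decreasing = TRUE), ]
freq_df
```

To plot it, something like barplot(freq_df$Proportion, names.arg = freq_df$Meal, ylab = "Proportions") should work.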