Column-wise corresponding values in a data frame
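Given a data frame, I need to get the corresponding values from the other columns (Sport, Mode), using Age as the base. Can anyone help with R/Python code?
In fact, it would be helpful if I could get, for age 15, Baseball 2 times and Play 2 times; for age 19, Golf 1 time and Play 1 time.
The output should be a summary of these counts, with Age as the base variable.
Further, with Sport as the base variable, there should be a similar summary for Mode. Thanks.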
df = data.frame(Age = c(15,15,16,17,18,18,19,20),
                Sport = c("Baseball","Baseball","Baseball","Baseball","Baseball","Golf","Golf","Golf"),
                Mode = c("Play","Play","Play","Watch","Watch","Play","Play","Watch"),
                stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

# count rows per Age and Sport, then spread the sports into columns
df %>%
  count(Age, Sport) %>%
  spread(Sport, n, fill = 0)
# # A tibble: 6 x 3
#     Age Baseball  Golf
# * <dbl>    <dbl> <dbl>
# 1    15        2     0
# 2    16        1     0
# 3    17        1     0
# 4    18        1     1
# 5    19        0     1
# 6    20        0     1
df %>%
  count(Age, Mode) %>%
  spread(Mode, n, fill = 0)
# # A tibble: 6 x 3
#     Age  Play Watch
# * <dbl> <dbl> <dbl>
# 1    15     2     0
# 2    16     1     0
# 3    17     0     1
# 4    18     1     1
# 5    19     1     0
# 6    20     0     1
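Since the question also asks about Python, here is a minimal sketch of the same per-age counts using only the standard library. The names `rows` and `crosstab` are made up for this illustration and do not come from the original post; in practice a data-frame library would likely be used instead.

```python
from collections import Counter

# Same eight rows as the R data frame above: (Age, Sport, Mode).
rows = [
    (15, "Baseball", "Play"),
    (15, "Baseball", "Play"),
    (16, "Baseball", "Play"),
    (17, "Baseball", "Watch"),
    (18, "Baseball", "Watch"),
    (18, "Golf", "Play"),
    (19, "Golf", "Play"),
    (20, "Golf", "Watch"),
]


def crosstab(pairs):
    """Count (key, value) pairs; return {key: {value: count}} with zeros filled in."""
    counts = Counter(pairs)
    keys = sorted({key for key, _ in counts})
    values = sorted({value for _, value in counts})
    # Counter returns 0 for combinations that never occur.
    return {key: {value: counts[(key, value)] for value in values} for key in keys}


# hypothetical variable names, one table per summarized column
age_by_sport = crosstab((age, sport) for age, sport, _ in rows)
age_by_mode = crosstab((age, mode) for age, _, mode in rows)

for age, row in age_by_sport.items():
    print(age, row)
```

Each inner dictionary plays the role of one row of the spread table, so `age_by_sport[15]` holds the Baseball and Golf counts for age 15.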
If you want to produce a single combined output, you can use this:
df = data.frame(Age = c(15,15,16,17,18,18,19,20),
                Sport = c("Baseball","Baseball","Baseball","Baseball","Baseball","Golf","Golf","Golf"),
                Mode = c("Play","Play","Play","Watch","Watch","Play","Play","Watch"),
                stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)
library(purrr)

# function that reshapes data based on a column name
# (uses Age column as an identifier/key)
# the column name arrives as a string, so it is turned into a symbol
# and unquoted with !! inside group_by() and spread()
f = function(x) {
  col = rlang::sym(x)
  df %>%
    group_by(Age, !!col) %>%
    summarise(n = n()) %>%
    spread(!!col, n, fill = 0) %>%
    ungroup()
}

names(df)[names(df) != "Age"] %>%   # get all column names (different than Age)
  map(f) %>%                        # apply function to each column name
  reduce(left_join, by = "Age")     # join datasets sequentially
# # A tibble: 6 x 5
#     Age Baseball  Golf  Play Watch
#   <dbl>    <dbl> <dbl> <dbl> <dbl>
# 1    15        2     0     2     0
# 2    16        1     0     1     0
# 3    17        1     0     0     1
# 4    18        1     1     1     1
# 5    19        0     1     1     0
# 6    20        0     1     0     1
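For the Python side, and for the second part of the question (Sport as the base variable, summarizing Mode), a rough standard-library sketch could look like the following. The helper `summarize` and the names `COLUMNS`, `DATA`, `by_age` and `by_sport` are hypothetical, and the sketch assumes level names are unique across the summarized columns, which holds for this data.

```python
from collections import Counter

COLUMNS = ("Age", "Sport", "Mode")
DATA = [
    (15, "Baseball", "Play"),
    (15, "Baseball", "Play"),
    (16, "Baseball", "Play"),
    (17, "Baseball", "Watch"),
    (18, "Baseball", "Watch"),
    (18, "Golf", "Play"),
    (19, "Golf", "Play"),
    (20, "Golf", "Watch"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def summarize(rows, key, columns):
    """For each value of `key`, count every level of each column in `columns`."""
    counts = Counter((r[key], col, r[col]) for r in rows for col in columns)
    # One output column per (column, level) pair, in column order then level order.
    levels = [
        (col, level)
        for col in columns
        for level in sorted({r[col] for r in rows})
    ]
    keys = sorted({r[key] for r in rows})
    # Assumes level names do not collide across columns (true for this data).
    return {
        k: {level: counts[(k, col, level)] for col, level in levels}
        for k in keys
    }


by_age = summarize(rows, "Age", ["Sport", "Mode"])   # single combined table
by_sport = summarize(rows, "Sport", ["Mode"])        # Sport as the base variable

for age, row in by_age.items():
    print(age, row)
print(by_sport)
```

Here `by_age[15]` is `{'Baseball': 2, 'Golf': 0, 'Play': 2, 'Watch': 0}`, matching the first row of the combined table above, and `by_sport` gives the Play and Watch counts for each sport.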