Dynamically set data frame column names from column values
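I am trying to take part of the values in a column and use it as the column name. The characters before the colon should become the column name.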
df = cbind.data.frame(
id = c(1, 2 ,3, 4, 5),
characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female", "gender: Male", "gender: Female"),
characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a", "Thing One: b", "Thing One: b"),
characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"))
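For columns 2-4, I want to strip "gender: ", "Thing One: " and "age: " from the values and use them as the names of their respective columns.
The resulting data frame would be: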
Result = cbind.data.frame(
id = c(1, 2 ,3, 4, 5),
gender = c("Female", "Male", "Female", "Male", "Female"),
`Thing One` = c("a", "a", "a", "b", "b"),
age = c("60", "45", "63", "56", "65")
)
To do this, I wrote the following function:
re_col = function(i){
new_name = str_split_fixed(i, ": ", 2)[1]
return(assign(new_name, str_split_fixed(i, ": ", 2)[,2]))
}
and applied it with:
plyr::colwise(re_col)(df)
#and
purrr::map(df, re_col)
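Neither worked.
There may well be a better approach. I originally tried to write a function that could be used as a %>% step with dplyr during data cleaning, but had no success with that either.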
A workaround using stringi:
It splits the data values of any specified columns on a regular expression pattern that you supply.
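library(stringi)

# Split the values of the selected columns on a regex: the part before the
# match becomes the column name, the part after it becomes the data.
rename.df_cols <- function(df, rgx_pattern = NULL, col_idx = NULL, ...) {
  # Clamp the index range if it runs past the last column
  if (max(col_idx) > ncol(df)) {
    col_idx <- min(col_idx):ncol(df)
  }
  for (i in col_idx) {
    parts <- stri_split_regex(df[[i]], rgx_pattern, simplify = TRUE)
    # Assumes every value in the column shares a single prefix
    col_name <- unique(parts[, 1])
    new_dat <- parts[, 2]
    colnames(df)[[i]] <- col_name
    df[[i]] <- new_dat
  }
  return(df)
}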
> df
id characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
1 1 gender: Female Thing One: a age: 60
2 2 gender: Male Thing One: a age: 45
3 3 gender: Female Thing One: a age: 63
4 4 gender: Male Thing One: b age: 56
5 5 gender: Female Thing One: b age: 65
> rename.df_cols(df = df, col_idx = 2:4, rgx_pattern = "(\\s+)?\\:(\\s+)?")
id gender Thing One age
1 1 Female a 60
2 2 Male a 45
3 3 Female a 63
4 4 Male b 56
5 5 Female b 65
Is this what you are looking for?
Edit: with a pipe:
> df %>% rename.df_cols(rgx_pattern = "(\\s+)?\\:(\\s+)?", col_idx = 2:5)
id gender Thing One age
1 1 Female a 60
2 2 Male a 45
3 3 Female a 63
4 4 Male b 56
5 5 Female b 65
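The same idea can also be sketched in base R alone, without stringi. The helper name rename_from_prefix and its cols and sep arguments are made up for this illustration. It assumes that cols holds integer column positions, that every value in a selected column contains the separator, and that all values in a column share one prefix:

```r
# Hypothetical helper (not part of the answer above): rename columns from the
# "name: value" prefix found in their values, using base R only.
rename_from_prefix <- function(df, cols, sep = ": ") {
  for (j in cols) {
    values <- as.character(df[[j]])
    # Position of the first literal occurrence of `sep` in each value
    pos <- regexpr(sep, values, fixed = TRUE)
    if (any(pos < 0)) {
      stop("column ", j, " contains values without the separator")
    }
    prefix <- unique(substr(values, 1, pos - 1))
    if (length(prefix) != 1) {
      stop("column ", j, " does not have a single common prefix")
    }
    # Keep only the text after the separator, then rename the column
    df[[j]] <- substr(values, pos + nchar(sep), nchar(values))
    names(df)[j] <- prefix
  }
  df
}

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female",
                          "gender: Male", "gender: Female"),
  characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a",
                            "Thing One: b", "Thing One: b"),
  characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"),
  stringsAsFactors = FALSE
)

out <- rename_from_prefix(df, cols = 2:4)
names(out)
```

Because the data frame is the first argument, this should also work as a pipe step, for example df %>% rename_from_prefix(cols = 2:4). The extracted values stay character, so age would still need as.integer() if numbers are wanted.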
We can gather the data frame into long format, separate the value column, and then spread the data frame back into wide format.
library(tidyverse)
df2 <- df %>%
gather(Column, Value, -id) %>%
separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
select(-Column) %>%
spread(New_Column, Value, convert = TRUE)
df2
# id age gender Thing One
# 1 1 60 Female a
# 2 2 45 Male a
# 3 3 63 Female a
# 4 4 56 Male b
# 5 5 65 Female b
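The tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(), so here is a sketch of the same pipeline with the newer functions. It assumes tidyr 1.0.0 or later. The result name df3 is only illustrative:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female",
                          "gender: Male", "gender: Female"),
  characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a",
                            "Thing One: b", "Thing One: b"),
  characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"),
  stringsAsFactors = FALSE
)

df3 <- df %>%
  # Long format: one row per (id, original column) pair
  pivot_longer(-id, names_to = "Column", values_to = "Value") %>%
  # Split "name: value" into the new column name and the actual value
  separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
  select(-Column) %>%
  # Wide format again, with the extracted names as column names
  pivot_wider(names_from = New_Column, values_from = Value)

names(df3)
```

Unlike spread(convert = TRUE), this leaves every extracted value as character. One option is to finish with type.convert(as.is = TRUE) if age should become numeric. pivot_wider() also keeps the columns in order of first appearance rather than sorting them alphabetically.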