Changing Values from Long to Wide: 1) Group_By, 2) Spread/Dcast
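I have a list of names with phone numbers. I want to group it by name and convert it from long format to wide format, with the phone numbers filled across columns. The data look like this: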
Name Phone_Number
John Doe 0123456
John Doe 0123457
John Doe 0123458
Jim Doe 0123459
Jim Doe 0123450
Jane Doe 0123451
Jill Doe NA
and the result I am after is:
Name Phone_Number1 Phone_Number2 Phone_Number3
John Doe 0123456 0123457 0123458
Jim Doe 0123459 0123450 NA
Jane Doe 0123451 NA NA
Jill Doe NA NA NA
library(dplyr)
library(tidyr)
library(data.table)
df <- data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA))
df1 <- data.frame(Name = c("John Doe","Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number1 = c("0123456", "0123459", "0123451", NA),
Phone_Number2 = c("0123457", "0123450", NA, NA),
Phone_Number3 = c("0123458", NA, NA, NA))
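I have tried a bunch of permutations, but what I'm doing wrong just isn't clicking. I'm guessing it has to do with how to specify the key/value pair correctly. The closest I have gotten is the code below: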
tidyr::spread
df %>%
group_by(Name) %>%
mutate(id = row_number()) %>%
spread(Name, Phone_Number) %>%
select(-id)
data.table::dcast
df%>%
dcast(Name + Phone_Number ~ Phone_Number, value.var = "Phone_Number")
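You were close. Instead of a row number over the whole data, add a within-group index using the helper n(), which gives the number of rows in the current group of a grouped_df, and use that index (not Name) as the key. Then spreading should go smoothly: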
df %>% group_by(Name) %>%
mutate(group_index = 1:n() %>% paste0("phone_", .)) %>%
spread(group_index, Phone_Number)
# A tibble: 4 x 4
# Groups: Name [4]
Name phone_1 phone_2 phone_3
<fctr> <fctr> <fctr> <fctr>
1 Jane Doe 0123451 <NA> <NA>
2 Jill Doe <NA> <NA> <NA>
3 Jim Doe 0123459 0123450 <NA>
4 John Doe 0123456 0123457 0123458
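In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). As a rough sketch of the same idea with the newer function (the column name group_index and the object name wide are just illustrative choices, and stringsAsFactors = FALSE is set so the columns are character rather than factor):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
           "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450",
                   "0123451", NA),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Name) %>%
  mutate(group_index = seq_len(n())) %>%  # 1, 2, ... within each Name
  ungroup() %>%
  pivot_wider(
    names_from   = group_index,      # the per-group index becomes the column key
    values_from  = Phone_Number,     # the phone numbers fill the cells
    names_prefix = "Phone_Number"    # gives Phone_Number1, Phone_Number2, ...
  )

wide
```

Unlike spread(), pivot_wider() keeps the rows in order of first appearance, so John Doe comes first here.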
Create a rowid by Name, and that is all you need:
library(dplyr)
library(tidyr)
library(data.table)
df <- setDT(data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA)))
df1 <- data.frame(Name = c("John Doe","Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number1 = c("0123456", "0123459", "0123451", NA),
Phone_Number2 = c("0123457", "0123450", NA, NA),
Phone_Number3 = c("0123458", NA, NA, NA))
df[, rowid := rowid(Name)]
dcast.data.table(df, Name ~ rowid, value.var = "Phone_Number")
Name 1 2 3
1: Jane Doe 0123451 NA NA
2: Jill Doe NA NA NA
3: Jim Doe 0123459 0123450 NA
4: John Doe 0123456 0123457 0123458
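To see why this works, it can help to look at what rowid() returns on its own. The short sketch below uses a standalone vector (the names x, ids and ids_base are only for illustration) and compares it with a base R equivalent built with ave():

```r
library(data.table)

x <- c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
       "Jane Doe", "Jill Doe")

# rowid() numbers the occurrences of each distinct value in order
ids <- rowid(x)
ids
# 1 2 3 1 2 1 1

# A base R way to build the same per-group counter
ids_base <- ave(seq_along(x), x, FUN = seq_along)
ids_base
```

Each name's first phone number gets 1, the second 2, and so on, which is exactly the column key dcast needs on the right-hand side of the formula.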
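As pointed out in the comments, there is no need to create a rowid variable for this task. It can be done like this, which makes the code more concise: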
df <- setDT(data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ),
Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA)))
dcast.data.table(df, Name ~ paste0("Phone_Number", rowid(Name)),
value.var = "Phone_Number")
Name Phone_Number1 Phone_Number2 Phone_Number3
1: Jane Doe 0123451 NA NA
2: Jill Doe NA NA NA
3: Jim Doe 0123459 0123450 NA
4: John Doe 0123456 0123457 0123458
For the sake of completeness, the rowid() function has a prefix parameter, which gives a concise solution:
library(data.table)
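# value.var is given explicitly so dcast does not have to guess the value column
dcast(setDT(df), Name ~ rowid(Name, prefix = "Phone_Number"),
value.var = "Phone_Number")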
Name Phone_Number1 Phone_Number2 Phone_Number3
1: Jane Doe 0123451 <NA> <NA>
2: Jill Doe <NA> <NA> <NA>
3: Jim Doe 0123459 0123450 <NA>
4: John Doe 0123456 0123457 0123458
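Note that dcast returns the rows sorted by Name, whereas the desired output in the question lists the names in their original order. If that order matters, one possible way to restore it is sketched below (dt, wide and wide_ordered are illustrative names, and the data are rebuilt with data.table() so the block runs on its own):

```r
library(data.table)

dt <- data.table(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe",
           "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450",
                   "0123451", NA)
)

wide <- dcast(dt, Name ~ rowid(Name, prefix = "Phone_Number"),
              value.var = "Phone_Number")

# Reorder the rows to match the first appearance of each Name in dt
wide_ordered <- wide[match(unique(dt$Name), Name)]
wide_ordered
```

The match() call looks up each name, in order of first appearance, in the sorted result and returns the row positions to pick.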