R dataframe transformation: split character observations into multiple rows, rearrange strings
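I have a data frame in which one column is filled with strings structured as follows:
Last name, first name XX, last name, first name XX, and so on. Each last name, first name combination is terminated by "XX".
I am looking to
- put each last name, first name combination on its own row;
- convert each combination to first name last name.
This would look as follows: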
example <- data.frame(id = c(1,2,3),
names = c("Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX", "Benn, Hilary XX, Sobel, Alex XX, West, Catherine XX, Doughty, Stephen XX", "Oswald, Kirsten XX, Thompson, Owen XX, Dorans, Allan XX")
)
example
#current output:
#1 1 Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX
#2 2 Benn, Hilary XX, Sobel, Alex XX, West, Catherine XX, Doughty, Stephen XX
#3 3 Oswald, Kirsten XX, Thompson, Owen XX, Dorans, Allan XX
#ideal output:
id names
1 Lloyd Russell-Moyle
1 Caroline Lucas
1 Wera Hobhouse
2 Hilary Benn
2 Alex Sobel
2 Catherine West
2 Stephen Doughty
3 Kirsten Oswald
3 Owen Thompson
3 Allan Dorans
Can anyone help me? Thanks!!
You can do this with a few functions from the tidyr package.
library(tidyr)
library(dplyr)
example %>%
  separate_rows(names, sep = "( *)XX(,*)( *)") %>% # create one row per name
  separate(names, into = c("last", "first"), sep = ", ") %>% # split each name into last and first
  unite(names, first, last, sep = " ") # recombine as "first last"
# A tibble: 10 x 2
id names
<dbl> <chr>
1 1 Lloyd Russell-Moyle
2 1 Caroline Lucas
3 1 Wera Hobhouse
4 2 Hilary Benn
5 2 Alex Sobel
6 2 Catherine West
7 2 Stephen Doughty
8 3 Kirsten Oswald
9 3 Owen Thompson
10 3 Allan Dorans
Here is a breakdown of the regular expression in the sep = argument of separate_rows():
( *) # match a sequence starting with 0 or more spaces
XX # followed by XX
(,*) # followed by 0 or more commas
( *) # followed by 0 or more spaces
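To see what the pattern matches in isolation, here is a minimal sketch that applies the same regular expression with base R's strsplit() instead of separate_rows(). The strings in variants are made up for illustration and are not from the question. The sub() call is a swapped-in base R way to reorder "Last, First", not the tidyr approach used above.

```r
# Same pattern as in the sep = argument above.
pattern <- "( *)XX(,*)( *)"

# Hypothetical strings with different spacing and commas around the XX marker.
variants <- c("a XX, b", "a XX,b", "aXX b", "a XXb")
pieces <- strsplit(variants, pattern)

# One string taken from the question's example data.
x <- "Russell-Moyle, Lloyd XX, Lucas, Caroline XX, Hobhouse, Wera XX"
parts <- strsplit(x, pattern)[[1]]

# Reorder "Last, First" into "First Last" using two capture groups.
swapped <- sub("^(.*), (.*)$", "\\2 \\1", parts)
print(swapped)
```

Base strsplit() does not return an empty piece for the XX at the very end of the string, so parts holds only the three names.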