从 R 中的字符串中提取列
Extracting columns from strings in R
我有一些复杂的案例,我自己无法解析(请参阅下面的结构和输出)。 df 的样子:
每个参与者都有一个与之关联的字符串。我想从字符串中为每个参与者创建列。我看到这可以通过 tidyverse 包的 extract()
或 separate()
来完成。但是,问题是我首先想根据 ;
分隔符为每个 ID 创建额外的行。
所以中间步骤看起来像是每个 ID 都有多行数据。例如,ID=1 有 2 个分号,因此它应该有两行数据。基本上,;
之后的每个子字符串都属于一个新行。
之后,我想按 ,
将值分隔成列。
到目前为止,我正在努力完成第一步。
mandatory_puzzles %>% mutate(Mandatory = str_split(Mandatory, pattern = ";"))
这为我提供了每个 ID 的字符列表。
当我pull
必填栏时,我看到我可以实现一个中间步骤。
我该如何从这里开始?
这是数据:
structure(list(Mandatory = c("Manipulate environment objects , 0 , 285.54 , 270.81 , 51 , 7.18 , 43.82 , 6 , 3 , 0 , 1 , 59 , 25.67449;Manipulate environment objects , 1 , 542.94 , 255 , 320.02 , 28.26 , 291.76 , 18 , 2 , 2 , 5 , 55 , 24.58493;Manipulate environment objects , 2 , 856.8 , 311.39 , 124.58 , 21.45 , 103.13 , 15 , 3 , 0 , 2 , 36 , 26.28657;Manipulate environment objects , 3 , 1129.1 , 267.08 , 58.62 , 12.65 , 45.96 , 10 , 2 , 0 , 4 , 48 , 25.64844;Manipulate environment objects , 1 , 1226.98 , 89.83 , 65.93 , 18.71 , 47.21 , 13 , 1 , 0 , 5 , 16 , 23.01501",
"Manipulate environment objects , 0 , 282.28 , 254.48 , 96.46 , 7.23 , 89.23 , 7 , 2 , 1 , 1 , 47 , 60.15876;Manipulate environment objects , 1 , 457.42 , 170.06 , 66.17 , 7.99 , 58.18 , 10 , 2 , 0 , 4 , 39 , 59.98176;Manipulate environment objects , 2 , 947.78 , 484.34 , 640.47 , 29.06 , 611.41 , 29 , 3 , 3 , 1 , 25 , 60.04829;Manipulate environment objects , 3 , 1073.8 , 122.48 , 83.37 , 7.91 , 75.46 , 9 , 2 , 0 , 3 , 23 , 59.94974",
"Manipulate environment objects , 0 , 161.66 , 147.03 , 30.98 , 5.56 , 25.42 , 6 , 2 , 0 , 1 , 60 , 59.344;Manipulate environment objects , 1 , 293.38 , 126.06 , 42.92 , 11.66 , 31.26 , 9 , 2 , 0 , 4 , 52 , 59.47927;Manipulate environment objects , 2 , 548.26 , 252.67 , 221.09 , 36.73 , 184.36 , 33 , 3 , 1 , 1 , 46 , 59.70362;Manipulate environment objects , 3 , 682.66 , 132.26 , 43.8 , 7.34 , 36.45 , 8 , 2 , 0 , 3 , 52 , 62.20168;Manipulate environment objects , 1 , 818.18 , 123.68 , 14.31 , 3.15 , 11.15 , 5 , 1 , 0 , 4 , 52 , 59.62266;Manipulate environment objects , 2 , 976.86 , 155.28 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 25 , 61.21601;Manipulate environment objects , 2 , 1008.76 , 29.8 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 13 , 59.28677;Manipulate environment objects , 2 , 1019.34 , 8.43 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 2 , 59.81207",
"Manipulate environment objects , 1 , 59.06 , 48.72 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 27 , 75.03592;Manipulate environment objects , 1 , 93.14 , 27.5 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 6 , 74.96835;Manipulate environment objects , 1 , 223.1 , 125.77 , 60.6 , 6.03 , 54.57 , 8 , 2 , 0 , 2 , 37 , 74.98416;Manipulate environment objects , 2 , 485.24 , 258 , 141.3 , 17.31 , 124 , 20 , 3 , 0 , 5 , 34 , 74.94012;Manipulate environment objects , 3 , 644.2 , 156.72 , 74.81 , 12.27 , 62.54 , 11 , 2 , 0 , 1 , 36 , 74.96394",
"Manipulate environment objects , 1 , 2002.74 , 108.46 , 24.89 , 3.86 , 21.03 , 9 , 2 , 0 , 2 , 55 , 59.97829;Manipulate environment objects , 2 , 2249.74 , 244.39 , 94.38 , 10.67 , 83.71 , 21 , 3 , 1 , 5 , 43 , 60.01971;Manipulate environment objects , 3 , 2417.84 , 165.16 , 29.56 , 6.43 , 23.13 , 12 , 2 , 0 , 1 , 53 , 59.98882;Manipulate environment objects , 3 , 2481.99 , 61.77 , 9.57 , 2.28 , 7.29 , 5 , 1 , 0 , 2 , 16 , 60.0066;Manipulate environment objects , 1 , 2614.9 , 123.53 , 25.84 , 4.89 , 20.95 , 10 , 2 , 0 , 4 , 53 , 59.98421;Manipulate environment objects , 2 , 2818.64 , 201.51 , 76.99 , 12.27 , 64.72 , 26 , 4 , 1 , 2 , 44 , 60.03096;Manipulate environment objects , 3 , 2913.61 , 92.3 , 28.49 , 4.66 , 23.83 , 11 , 2 , 0 , 4 , 16 , 60.08723;Manipulate environment objects , 1 , 3039.14 , 116.06 , 50.32 , 11.83 , 38.48 , 22 , 2 , 0 , 6 , 53 , 59.97778;Manipulate environment objects , 2 , 3057.44 , 16.34 , 8.94 , 1.72 , 7.22 , 4 , 1 , 0 , 1 , 5 , 59.97208;Manipulate environment objects , 2 , 3217.52 , 153.71 , 37.96 , 4.72 , 33.25 , 14 , 3 , 1 , 4 , 43 , 60.01913;Manipulate environment objects , 3 , 3359.48 , 140.2 , 51.85 , 8.67 , 43.18 , 18 , 2 , 1 , 6 , 50 , 60.06129;Manipulate environment objects , 1 , 3480.78 , 114.93 , 23.72 , 4.31 , 19.41 , 13 , 2 , 0 , 2 , 52 , 59.95073;Manipulate environment objects , 2 , 3638.04 , 154.73 , 38.13 , 7.31 , 30.82 , 19 , 4 , 0 , 6 , 44 , 60.00352;Manipulate environment objects , 3 , 3764.88 , 124.92 , 25.51 , 6.16 , 19.35 , 12 , 2 , 0 , 2 , 51 , 60.01723;Manipulate environment objects , 1 , 3883.16 , 112.63 , 20.66 , 4.06 , 16.6 , 9 , 2 , 0 , 4 , 54 , 59.95847;Manipulate environment objects , 2 , 4025.9 , 140.86 , 68.49 , 13.15 , 55.33 , 24 , 3 , 1 , 1 , 43 , 59.98686;Manipulate environment ..."
), ID = 1:5), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
感谢您在这里提供的任何帮助。
我们可以用 separate_rows
来根据 ;
拆分成行,然后在 ,
处拆分成 separate
的列
library(dplyr)
library(tidyr)
library(stringr)
out <- mandatory_puzzles %>%
separate_rows( Mandatory, sep = "\s*;\s*")%>%
separate(Mandatory, into = str_c("col",
seq_len(str_count(first(.$Mandatory), ",") +1)),
sep = "\s*,\s*", convert = TRUE, extra = "merge")
我有一些复杂的案例,我自己无法解析(请参阅下面的结构和输出)。 df 的样子:
每个参与者都有一个与之关联的字符串。我想从字符串中为每个参与者创建列。我看到这可以通过 tidyverse 包的 extract()
或 separate()
来完成。但是,问题是我首先想根据 ;
分隔符为每个 ID 创建额外的行。
所以中间步骤看起来像是每个 ID 都有多行数据。例如,ID=1 有 2 个分号,因此它应该有两行数据。基本上,;
之后的每个子字符串都属于一个新行。
之后,我想按 ,
将值分隔成列。
到目前为止,我正在努力完成第一步。
mandatory_puzzles %>% mutate(Mandatory = str_split(Mandatory, pattern = ";"))
这为我提供了每个 ID 的字符列表。
当我pull
必填栏时,我看到我可以实现一个中间步骤。
我该如何从这里开始?
这是数据:
structure(list(Mandatory = c("Manipulate environment objects , 0 , 285.54 , 270.81 , 51 , 7.18 , 43.82 , 6 , 3 , 0 , 1 , 59 , 25.67449;Manipulate environment objects , 1 , 542.94 , 255 , 320.02 , 28.26 , 291.76 , 18 , 2 , 2 , 5 , 55 , 24.58493;Manipulate environment objects , 2 , 856.8 , 311.39 , 124.58 , 21.45 , 103.13 , 15 , 3 , 0 , 2 , 36 , 26.28657;Manipulate environment objects , 3 , 1129.1 , 267.08 , 58.62 , 12.65 , 45.96 , 10 , 2 , 0 , 4 , 48 , 25.64844;Manipulate environment objects , 1 , 1226.98 , 89.83 , 65.93 , 18.71 , 47.21 , 13 , 1 , 0 , 5 , 16 , 23.01501",
"Manipulate environment objects , 0 , 282.28 , 254.48 , 96.46 , 7.23 , 89.23 , 7 , 2 , 1 , 1 , 47 , 60.15876;Manipulate environment objects , 1 , 457.42 , 170.06 , 66.17 , 7.99 , 58.18 , 10 , 2 , 0 , 4 , 39 , 59.98176;Manipulate environment objects , 2 , 947.78 , 484.34 , 640.47 , 29.06 , 611.41 , 29 , 3 , 3 , 1 , 25 , 60.04829;Manipulate environment objects , 3 , 1073.8 , 122.48 , 83.37 , 7.91 , 75.46 , 9 , 2 , 0 , 3 , 23 , 59.94974",
"Manipulate environment objects , 0 , 161.66 , 147.03 , 30.98 , 5.56 , 25.42 , 6 , 2 , 0 , 1 , 60 , 59.344;Manipulate environment objects , 1 , 293.38 , 126.06 , 42.92 , 11.66 , 31.26 , 9 , 2 , 0 , 4 , 52 , 59.47927;Manipulate environment objects , 2 , 548.26 , 252.67 , 221.09 , 36.73 , 184.36 , 33 , 3 , 1 , 1 , 46 , 59.70362;Manipulate environment objects , 3 , 682.66 , 132.26 , 43.8 , 7.34 , 36.45 , 8 , 2 , 0 , 3 , 52 , 62.20168;Manipulate environment objects , 1 , 818.18 , 123.68 , 14.31 , 3.15 , 11.15 , 5 , 1 , 0 , 4 , 52 , 59.62266;Manipulate environment objects , 2 , 976.86 , 155.28 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 25 , 61.21601;Manipulate environment objects , 2 , 1008.76 , 29.8 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 13 , 59.28677;Manipulate environment objects , 2 , 1019.34 , 8.43 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 2 , 59.81207",
"Manipulate environment objects , 1 , 59.06 , 48.72 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 27 , 75.03592;Manipulate environment objects , 1 , 93.14 , 27.5 , 0 , 0 , 0 , 0 , 0 , 0 , Puzzle Not Solved , 6 , 74.96835;Manipulate environment objects , 1 , 223.1 , 125.77 , 60.6 , 6.03 , 54.57 , 8 , 2 , 0 , 2 , 37 , 74.98416;Manipulate environment objects , 2 , 485.24 , 258 , 141.3 , 17.31 , 124 , 20 , 3 , 0 , 5 , 34 , 74.94012;Manipulate environment objects , 3 , 644.2 , 156.72 , 74.81 , 12.27 , 62.54 , 11 , 2 , 0 , 1 , 36 , 74.96394",
"Manipulate environment objects , 1 , 2002.74 , 108.46 , 24.89 , 3.86 , 21.03 , 9 , 2 , 0 , 2 , 55 , 59.97829;Manipulate environment objects , 2 , 2249.74 , 244.39 , 94.38 , 10.67 , 83.71 , 21 , 3 , 1 , 5 , 43 , 60.01971;Manipulate environment objects , 3 , 2417.84 , 165.16 , 29.56 , 6.43 , 23.13 , 12 , 2 , 0 , 1 , 53 , 59.98882;Manipulate environment objects , 3 , 2481.99 , 61.77 , 9.57 , 2.28 , 7.29 , 5 , 1 , 0 , 2 , 16 , 60.0066;Manipulate environment objects , 1 , 2614.9 , 123.53 , 25.84 , 4.89 , 20.95 , 10 , 2 , 0 , 4 , 53 , 59.98421;Manipulate environment objects , 2 , 2818.64 , 201.51 , 76.99 , 12.27 , 64.72 , 26 , 4 , 1 , 2 , 44 , 60.03096;Manipulate environment objects , 3 , 2913.61 , 92.3 , 28.49 , 4.66 , 23.83 , 11 , 2 , 0 , 4 , 16 , 60.08723;Manipulate environment objects , 1 , 3039.14 , 116.06 , 50.32 , 11.83 , 38.48 , 22 , 2 , 0 , 6 , 53 , 59.97778;Manipulate environment objects , 2 , 3057.44 , 16.34 , 8.94 , 1.72 , 7.22 , 4 , 1 , 0 , 1 , 5 , 59.97208;Manipulate environment objects , 2 , 3217.52 , 153.71 , 37.96 , 4.72 , 33.25 , 14 , 3 , 1 , 4 , 43 , 60.01913;Manipulate environment objects , 3 , 3359.48 , 140.2 , 51.85 , 8.67 , 43.18 , 18 , 2 , 1 , 6 , 50 , 60.06129;Manipulate environment objects , 1 , 3480.78 , 114.93 , 23.72 , 4.31 , 19.41 , 13 , 2 , 0 , 2 , 52 , 59.95073;Manipulate environment objects , 2 , 3638.04 , 154.73 , 38.13 , 7.31 , 30.82 , 19 , 4 , 0 , 6 , 44 , 60.00352;Manipulate environment objects , 3 , 3764.88 , 124.92 , 25.51 , 6.16 , 19.35 , 12 , 2 , 0 , 2 , 51 , 60.01723;Manipulate environment objects , 1 , 3883.16 , 112.63 , 20.66 , 4.06 , 16.6 , 9 , 2 , 0 , 4 , 54 , 59.95847;Manipulate environment objects , 2 , 4025.9 , 140.86 , 68.49 , 13.15 , 55.33 , 24 , 3 , 1 , 1 , 43 , 59.98686;Manipulate environment ..."
), ID = 1:5), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
感谢您在这里提供的任何帮助。
我们可以用 separate_rows
来根据 ;
拆分成行,然后在 ,
处拆分成 separate
library(dplyr)
library(tidyr)
library(stringr)
out <- mandatory_puzzles %>%
separate_rows( Mandatory, sep = "\s*;\s*")%>%
separate(Mandatory, into = str_c("col",
seq_len(str_count(first(.$Mandatory), ",") +1)),
sep = "\s*,\s*", convert = TRUE, extra = "merge")