Create new column of concatenated substrings
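Apologies for the newbie question!
I would like to use mutate, together with some combination of dplyr/stringr functions, to create a new column: extract substrings of the text in the "File" column and combine them into an "Image" column, as in the output below: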
test <- data.frame(File = c("4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif", "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"))
testoutput <- data.frame(File = c("4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif", "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"),
                         Image = c("TA4301-2A", "TA3150-3A"))
Many thanks!
Is this what you want?
library(dplyr)
test %>%
  mutate(Image = sub("^\\D*(\\d+)[^][,]+\\[\\w+,(\\w+),(\\w+)\\].+", "TA\\1-\\2\\3", File))
Output
                                                         File     Image
1 4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif TA4301-2A
2      TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif TA3150-3A
Reading the pattern from left to right:
1. Match zero or more non-digit characters from the beginning
2. Match one or more digits; set it as the first capturing group
3. Match one or more characters that are not "]", "[", or ","
4. Match the three values inside square brackets; set the last two as second and third capturing groups
5. Match remaining characters
^\D*  (\d+)  [^][,]+          \[\w+,(\w+),(\w+)\]  .+
TA    3150   Scan1_Core       [1,3,A]              [7006,42110]_component_data.tif
      4301    TMA_Scan1_Core  [1,2,A]              _[10673,40057]_component_data.tif
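
To check which text each capturing group picks up, the same pattern can be run through base R's regexec() and regmatches(), which return the full match followed by each group. This is only a minimal sketch of that idea, not part of the answer above; the names `pattern` and `groups` are made up for illustration.

```r
# Sample data from the question
test <- data.frame(
  File = c(
    "4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif",
    "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"
  ),
  stringsAsFactors = FALSE
)

# Same regular expression as in the answer; backslashes are doubled
# because this is an R string literal
pattern <- "^\\D*(\\d+)[^][,]+\\[\\w+,(\\w+),(\\w+)\\].+"

# For each string: element 1 is the full match, elements 2-4 are the
# three capturing groups (character(0) if the string does not match)
groups <- regmatches(test$File, regexec(pattern, test$File))

# Build "TA<group 1>-<group 2><group 3>"; use NA where nothing matched
test$Image <- vapply(
  groups,
  function(m) {
    if (length(m) == 4) paste0("TA", m[2], "-", m[3], m[4]) else NA_character_
  },
  character(1)
)

print(test)
```

One difference from the sub() version: sub() returns the input string unchanged when the pattern does not match, whereas this sketch yields NA, which may make unexpected file names easier to spot.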