Create a new column of concatenated substrings

Apologies for the newbie question!

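I'd like to use the mutate function, together with some combination of dplyr/stringr, to create a new column: extract substrings of the text in the "File" column and combine them into an "Image" column, so that the output looks like this: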

test<- data.frame(File= c("4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif", "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"))

testoutput<- data.frame(File= c("4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif", "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"),
                        Image = c("TA4301-2A", "TA3150-3A"))

Many thanks!

Is this what you want?

library(dplyr)

# Backslashes must be doubled inside an R string literal
test %>% 
  mutate(Image = sub("^\\D*(\\d+)[^][,]+\\[\\w+,(\\w+),(\\w+)\\].+", "TA\\1-\\2\\3", File))

Output

                                                         File     Image
1 4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif TA4301-2A
2      TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif TA3150-3A

From left to right:

1. Match zero or more non-digit characters from the beginning
2. Match one or more digits; set it as the first capturing group
3. Match one or more characters that are not "]", "[", or ","
4. Match the three values inside square brackets; set the last two as second and third capturing groups
5. Match remaining characters

^\D*  (\d+)  [^][,]+          \[\w+,  (\w+)  ,  (\w+)  \]  .+
TA    3150   Scan1_Core       [1,     3      ,  A      ]   [7006,42110]_component_data.tif
      4301   TMA_Scan1_Core   [1,     2      ,  A      ]   _[10673,40057]_component_data.tif
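
To check what each capturing group actually picks up, the steps above can be run in isolation. This is only a minimal sketch in base R, assuming the same two file names as in the question. The column labels match, digits, second and third are names chosen here for illustration and are not part of the original answer.

```r
# Same two example file names as in the question
files <- c(
  "4301 TMA_Scan1_Core[1,2,A]_[10673,40057]_component_data.tif",
  "TA3150Scan1_Core[1,3,A][7006,42110]_component_data.tif"
)

# Pattern from the answer; backslashes are doubled inside an R string
pattern <- "^\\D*(\\d+)[^][,]+\\[\\w+,(\\w+),(\\w+)\\].+"

# regexec() + regmatches() return the full match followed by each capturing group
groups <- do.call(rbind, regmatches(files, regexec(pattern, files)))
colnames(groups) <- c("match", "digits", "second", "third")  # illustrative labels

# Drop the full match and show only the three captured pieces
print(groups[, -1])

# Rebuild the Image value by hand, mirroring the "TA\\1-\\2\\3" replacement
image <- paste0("TA", groups[, "digits"], "-", groups[, "second"], groups[, "third"])
print(image)
```

Looking at the groups separately like this makes it easier to adjust the pattern if other file names deviate from the two shown here.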