Replace/substitute a list of substrings in a list of filenames in bash (with awk?)

Question

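I found that in my more than 100,000 file names, my separator _ also occurs in unexpected places and messes up the processing. I would therefore like to replace the extra _ in those file names. The files are all in one folder. I tried using the awk FILENAME variable, but I don't know how to use it to change the file name itself. Complete file names look like this, for example: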

mg.reads.per.gene_Putative@polyhydroxyalkanoic@acid@system@protein@(PHA_gran_rgn)_A1.tsv   
mg.reads.per.gene_Phage@regulatory@protein@Rha@(Phage_pRha)_A1.tsv 
...

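In general, the first and last _ are supposed to be there, and all additional ones should be replaced. Note: the additional ones are not always inside brackets. I generated a list of the problematic substrings within the file names, in a file called problems.txt: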

Putative@polyhydroxyalkanoic@acid@system@protein@(PHA_gran_rgn)
Phage@regulatory@protein@Rha@(Phage_pRha)
Phage@tail@protein@(Tail_P2_I)
Phd_YefM
pheT_bact:@phenylalanine--tRNA@ligase%2C@beta@subunit
...

and I want to use @ here as well, as an uncommon character, to get:

mg.reads.per.gene_Putative@polyhydroxyalkanoic@acid@system@protein@(PHA@gran@rgn)_A1.tsv    
mg.reads.per.gene_Phage@regulatory@protein@Rha@(Phage@pRha)_A1.tsv 
...

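How can I use this list as input to change only the file names that match a record in the list? I tried this to address the files in the folder and change part of the file name (awk pseudocode):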

for sample_files in $(find . -mindepth 1 -maxdepth 1 -type f)
do  
  awk '{if ("problem_record" ~ FILENAME); 
  gsub(/_/,/@/, substring(FILENAME))); print}' problems.txt $sample_files > $sample_files
done

But I cannot specify that I only want changes within the region covered by the "problem_record" entry. I also don't know how to specify the output.

Answer 1

Here is a pure bash solution:

#!/bin/bash

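# Loop over all files in the current directory
for i in *; do

  # Skip names with fewer than three underscores: no extra _ to replace
  [[ $i == *_*_*_* ]] || continue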

  # Extract the part before the first _
  head="${i%%_*}"

  # Get the rest of the string
  tail="${i#*_}"

  # Extract the part after the last _
  rhead="${tail##*_}"

  # Extract the "middle" portion
  rtail="${tail%_*}"

  # Substitute _ with @ in the "middle"
  fixedrtail="${rtail//_/@}"

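  # Rename files (uncomment the echo line to preview the change)
  #echo -e "Renaming \"$i\" to \"${head}_${fixedrtail}_${rhead}\""
  # Quote "$i" and use -- so odd characters and leading dashes are safe
  mv -- "$i" "${head}_${fixedrtail}_${rhead}"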
done

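This takes all files in the current directory and renames them so that every _ except the first and the last is replaced with @. It relies heavily on parameter expansion, which is described in the "Shell Parameter Expansion" section of the Bash manual.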
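To see what each expansion contributes, here is a small sketch that applies the same steps to a single name and only prints the result, without renaming anything. The function name `fix_name` and its local variable names are made up for illustration; the early return for names with fewer than three underscores is an added assumption, not something the question specifies.

```bash
#!/bin/bash

# fix_name (hypothetical helper): print NAME with every _ except the
# first and the last replaced by @. Nothing is renamed here.
fix_name() {
  local name=$1

  # Assumption: names with fewer than three underscores are left alone.
  if [[ $name != *_*_*_* ]]; then
    printf '%s\n' "$name"
    return
  fi

  local head="${name%%_*}"   # drop longest suffix matching _*  -> text before the first _
  local rest="${name#*_}"    # drop shortest prefix matching *_ -> text after the first _
  local last="${rest##*_}"   # drop longest prefix matching *_  -> text after the last _
  local middle="${rest%_*}"  # drop shortest suffix matching _* -> text between first and last _

  # ${middle//_/@} replaces every remaining _ in the middle part with @
  printf '%s\n' "${head}_${middle//_/@}_${last}"
}

fix_name 'mg.reads.per.gene_Phage@regulatory@protein@Rha@(Phage_pRha)_A1.tsv'
# prints: mg.reads.per.gene_Phage@regulatory@protein@Rha@(Phage@pRha)_A1.tsv
```

Checking a few names this way before running the real rename is cheap. Once the output looks right, something like `mv -- "$f" "$(fix_name "$f")"` inside the loop would do the actual renaming.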
