AWK print combination of string+bash variable+string

I'm trying to use awk to rename the contigs in a fasta file to the isolate ID and number the contigs from 1 to n.

Fasta file:

  >NODE_1_length_172477_cov_46.1343
  GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
  TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
  CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA

The isolate ID is a variable because I want to do this for multiple files. I've got as far as printing isolateIDnumber, but I need >isolateID_number.

    for file in /dir/*.fasta
    do
        name=$(basename "$file" .fasta)
        awk '/^>/{print "'"$name"'" ++i; next}{print}' $file > rename.fasta
    done;

This gives me:

 15AR07771
 GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
 TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
 CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA

Desired output:

 >15AR0777_1
 GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
 TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
 CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA

The question is: where should I put the strings so that it prints >15AR0777_1 instead of 15AR07771?

I tried the following variations, but none of them worked:

  awk '/^>/{print ">'"$name"'" "_" ++i; next}{print}' $file > rename.fasta
  awk '/^>/{print ">'"$name"'" _++i; next}{print}' $file > rename.fasta

Thanks!

Use awk -v awk_var="$bash_var" to pass a shell variable into the awk script. From man awk:

-v var=val
--assign var=val
       Assign the value val to the variable var, before execution of the program begins.  Such variable values are available to the
       BEGIN rule of an AWK program.
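To see -v and awk's string concatenation in isolation, here is a minimal sketch. The isolate ID 15AR0777 is taken from the question, and the two toy records are made up for illustration. In awk, adjacent expressions are simply concatenated, so ">" name "_" ++i builds the header from a literal, the -v variable, another literal and the counter:

```shell
#!/usr/bin/env bash
name="15AR0777"   # example isolate ID from the question

# -v copies the shell variable into the awk variable "name" before the program runs,
# so the awk script itself can stay in plain single quotes.
# The input records below are made-up toy data.
result=$(printf '>NODE_1\nACGT\n>NODE_2\nTTGA\n' |
  awk -v name="$name" '/^>/ { print ">" name "_" ++i; next } { print }')

printf '%s\n' "$result"
# >15AR0777_1
# ACGT
# >15AR0777_2
# TTGA
```

Since i is uninitialized (0) at the start, the pre-increment ++i yields 1 for the first header, 2 for the second, and so on.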

For example:

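for file in dir/*.fasta
do
    name=$(basename "$file" .fasta)
    # -v sets the awk variable "name" from the shell variable; "$file" is quoted for paths with spaces,
    # and each input gets its own output file so later files don't overwrite earlier ones
    awk -v name="$name" '/^>/{print ">" name "_" ++i; next}{print}' "$file" > "rename_${name}.fasta"
done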
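A quick way to try the loop end to end is to run it in a scratch directory. This is only a sketch under some assumptions: the directory comes from mktemp, the sequences are shortened made-up data, 16BX0001 is a hypothetical second isolate ID, and the output naming rename_<name>.fasta is a choice made here so that each input gets its own output file:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with two small example files (made-up data, hypothetical IDs)
workdir=$(mktemp -d)
mkdir "$workdir/dir"
printf '>NODE_1_length_172477_cov_46.1343\nGCAGGGCGCAG\n>NODE_2_length_900_cov_3.2\nTTGACCA\n' > "$workdir/dir/15AR0777.fasta"
printf '>NODE_1_length_5000_cov_10.0\nACGTACGT\n' > "$workdir/dir/16BX0001.fasta"
cd "$workdir"

for file in dir/*.fasta
do
    name=$(basename "$file" .fasta)
    # One output per input (naming chosen for this sketch) so files don't overwrite each other
    awk -v name="$name" '/^>/{print ">" name "_" ++i; next}{print}' "$file" > "rename_${name}.fasta"
done

head rename_*.fasta   # inspect the results; remove "$workdir" when done
```

Because every file is processed by a separate awk process, i starts from zero again for each file, so the contigs in each file are numbered from 1.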

Here is an all-awk version:

awk '
FNR==1 {                         # new file, close old and make name for new
    close(f)                     # close the old output file
    n=FILENAME                   # get filename of the new file
    gsub(/^.*\/|\.fasta$/,"",n)  # remove path and .fasta
    f="rename_" n ".fasta"       # new output file
    i=0                          # restart contig numbering for each file
}
/^>/ {
    $0=">" n "_" ++i             # >name_number
}
{
    print > f                    # print to output file
}' dir/*.fasta                   # process .fasta files in dir

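Given a file dir/15AR0777.fasta, the script writes its output to ./rename_15AR0777.fasta. (Your version writes every output to the same rename.fasta and doesn't even append, so each file overwrites the previous one; you'll probably want to fix that.)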
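The all-awk version relies on FNR==1 to notice that a new input file has started. The difference between NR and FNR can be seen with two throwaway files. This is a small sketch, and the file names one.txt and two.txt are hypothetical:

```shell
#!/usr/bin/env bash
# Two throwaway files (hypothetical names) in a scratch directory
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\nd\n' > "$workdir/two.txt"

# NR keeps counting across all input files, while FNR restarts at 1 in each file,
# so FNR==1 is true exactly on the first line of every non-empty file.
fnr_demo=$(awk '{ print FILENAME ": NR=" NR " FNR=" FNR }' "$workdir/one.txt" "$workdir/two.txt")
printf '%s\n' "$fnr_demo"
```

The last two lines report NR=3 FNR=1 and NR=4 FNR=2 for two.txt. One consequence for the fasta script: an empty input file has no first line, so it never triggers FNR==1 and gets no rename_ output file.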