Shell script: merging files, how to add some characters at the beginning of each line of awk output

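I'm trying to use a bash script to merge several hundred sample files, each containing species names and proportions, into one long-format file. I'd like to know how to add some characters at the beginning of each line of awk output.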

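I have the sample IDs stored in the variable $STEM. I use awk to get the species name and proportion from each file. The proportion is at the beginning of each line; the species name is at the end of each line (the 6th field; the file is tab-separated). I also want to add the sample ID ($STEM) to the beginning of every line in the output file. Here is my code: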

for file in $input_dir/*_species_abundance.txt
do
        STEM=$(basename "$file" _species_abundance.txt )
        echo "processing sample $STEM"
        awk '{print "$STEM," $1 "," $6}' FS='\t' $file >> $input_dir/merged_species_abundance.txt

done

The "$STEM," part doesn't work as expected: the output contains the literal text "$STEM" instead of the sample ID.

Do you have any suggestions on how I could modify my code? Thanks in advance!

Here is some sample input:

  0.45  124078  0       S       148633                s__Faecalibacterium prausnitzii_D
  0.35  95476   0       S       145938                s__Faecalibacterium prausnitzii_C
  0.21  57002   0       S       158191                s__Faecalibacterium prausnitzii_I
  0.18  49503   0       S       224832                s__Faecalibacterium sp900539945
  0.07  18991   0       S       157095                s__Faecalibacterium prausnitzii_G
  0.04  12007   0       S       187396                s__Faecalibacterium prausnitzii_F
...
... 

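The first number is the proportion, and the last field is the species name.

The sample IDs look like 1001, 1002, 1003...

My desired output is (comma-separated):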

1001,0.45,s__Faecalibacterium prausnitzii_D
1001,0.35,s__Faecalibacterium prausnitzii_C
1001,0.21,s__Faecalibacterium prausnitzii_I
...
1002,0.28,s__Faecalibacterium prausnitzii_D
1002,0.00,s__Faecalibacterium prausnitzii_C
1002,0.01,s__Faecalibacterium prausnitzii_I
...
1003,0.60,s__Faecalibacterium prausnitzii_D
1003,0.02,s__Faecalibacterium prausnitzii_C
1003,0.39,s__Faecalibacterium prausnitzii_I
...
...

I think this is what you're looking for:

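input_dir=mydir
for file in "$input_dir"/*_species_abundance.txt
do
    # Sample ID = file name without the _species_abundance.txt suffix
    STEM=$(basename "$file" _species_abundance.txt)
    echo "processing sample $STEM"
    # '$STEM' sits outside the single quotes, so the shell expands it before awk runs.
    # With default whitespace splitting, $1 is the proportion and $6 " " $7 the two-word species name.
    awk '{print '$STEM' "," $1 "," $6 " " $7}' "$file" >> "$input_dir/merged_species_abundance.txt"
done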

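The key to printing the value of the shell variable $STEM is to let the shell expand it by placing it outside the single quotes ('). awk then sees its value as part of the program text.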

Here is the resulting output:

processing sample 1001
processing sample 1002
processing sample 2001
processing sample 2002
$ cat mydir/merged_species_abundance.txt
1001,0.45,s__Faecalibacterium prausnitzii_D
1001,0.35,s__Faecalibacterium prausnitzii_C
1001,0.21,s__Faecalibacterium prausnitzii_I
1001,0.18,s__Faecalibacterium sp900539945
1001,0.07,s__Faecalibacterium prausnitzii_G
1001,0.04,s__Faecalibacterium prausnitzii_F
1002,0.45,s__Faecalibacterium prausnitzii_D
1002,0.35,s__Faecalibacterium prausnitzii_C
1002,0.21,s__Faecalibacterium prausnitzii_I
1002,0.18,s__Faecalibacterium sp900539945
1002,0.07,s__Faecalibacterium prausnitzii_G
1002,0.04,s__Faecalibacterium prausnitzii_F
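
One possible variation on the quoting trick above, offered as a sketch rather than as the answerer's method: pass the sample ID to awk with -v, which is awk's standard way to set a variable from the command line. Because a spliced-in value becomes part of the awk program text, an ID that is not a plain number (say, a hypothetical S1001) would be read as an uninitialized awk variable and print as an empty string; with -v the ID arrives as a string. The names stem, merged, and merged.csv, the temporary directory, and the two small demo files are assumptions made for illustration, and the demo assumes the input really is tab-separated.

```shell
#!/usr/bin/env bash
# Sketch only: file names and contents below are made-up demo data.

input_dir=$(mktemp -d)            # hypothetical working directory
merged="$input_dir/merged.csv"    # hypothetical output name, outside the input glob

# Two tiny tab-separated demo files shaped like the sample input.
printf '  0.45\t124078\t0\tS\t148633\t      s__Faecalibacterium prausnitzii_D\n' > "$input_dir/1001_species_abundance.txt"
printf '  0.35\t95476\t0\tS\t145938\t      s__Faecalibacterium prausnitzii_C\n' >> "$input_dir/1001_species_abundance.txt"
printf '  0.28\t76000\t0\tS\t148633\t      s__Faecalibacterium prausnitzii_D\n' > "$input_dir/1002_species_abundance.txt"

: > "$merged"                     # start from an empty output file

for file in "$input_dir"/*_species_abundance.txt
do
    stem=$(basename "$file" _species_abundance.txt)
    # -v hands the shell value to awk as the awk variable "stem";
    # -F'\t' splits on tabs, so the species name stays in one field ($6).
    awk -F'\t' -v OFS=',' -v stem="$stem" '
        {
            prop = $1; name = $6
            gsub(/^[ \t]+|[ \t]+$/, "", prop)   # strip padding around the proportion
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # strip indentation before the name
            print stem, prop, name
        }' "$file" >> "$merged"
done

cat "$merged"
```

The merged file is given a name that does not match *_species_abundance.txt, so rerunning the loop does not pick up the previous output as an input file.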