How to add columns with string variables in awk

Question

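On Linux, I want to use a for loop to produce a new txt file from each file in a set of input txt files: select two columns (column 1 and column 4), then add new columns 3, 4 and 5 that hold predefined string variables (taken from parts of the file name). One of the input file names is: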

E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt

For this input file, the output file I want is:

AluJb   165824  E2  DNA exon
AluSp   43328   E2  DNA exon
AluSc5  5753    E2  DNA exon

I tried:

for file in `ls E2*.txt`; do 
  treat=`echo ${file} | cut -d'_' -f1` && 
  TE=`echo ${file} | cut -d'_' -f5` && 
  region=`echo ${file} | cut -d'_' -f6` && 
  awk 'BEGIN{OFS="\t"} {print $1,$4,$3==treat,$4==TE,$5==region}' $file > ./E2_counts/${file}_tmp.txt
done

But it didn't work.

Thanks for any help!
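A likely reason the attempt fails: the awk program sits inside single quotes, so the shell never substitutes the values of `treat`, `TE` or `region` into it. awk sees them as its own variables that were never assigned, and `==` is awk's comparison operator, not assignment. The small demo below contrasts the two cases on one row of sample input (the row is copied from Answer 1); handing the shell values to awk with `-v` is the standard fix that both answers rely on.

```shell
#!/usr/bin/env bash
# Shell variable set outside awk (value taken from the question's file name).
treat=E2

# Inside single quotes the shell expands nothing, so awk reads its own variable
# named treat, which was never set and prints as an empty string.
without_v=$(echo "AluJb 0 0 165824" | awk 'BEGIN { OFS = "\t" } { print $1, $4, treat }')

# -v copies the shell value into the awk variable before the program runs.
with_v=$(echo "AluJb 0 0 165824" | awk -v treat="$treat" 'BEGIN { OFS = "\t" } { print $1, $4, treat }')

printf '%s\n' "$without_v" "$with_v"
```

The first line ends with an empty third column; only the second one carries `E2`.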

Answer 1

$ ls -1
E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt
E2_counts

$ more E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt
AluJb 0 0 165824
AluSp 0 0 43328
AluSc5 0 0 5753

$ for file in $(ls E2*.txt); do awk -v treat=$(echo ${file} | cut -d'_' -f1) -v TE=$(echo ${file} | cut -d'_' -f5) -v region=$(echo ${file} | cut -d'_' -f6) 'BEGIN{OFS="\t"} {print $1,$4,treat,TE,region}' $file > ./E2_counts/${file}_tmp.txt; done

$ more E2_counts/E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt_tmp.txt
AluJb   165824  E2      DNA     exon
AluSp   43328   E2      DNA     exon
AluSc5  5753    E2      DNA     exon
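The one-liner above runs `echo | cut` three times per file and loops over `ls` output, which splits on any whitespace in file names. A minimal alternative sketch, not part of the original answer, iterates over the glob directly and splits the name once with bash's `read`. It assumes every name follows the `TREAT_x_x_x_TE_REGION_...` layout of the example file. The first few lines only build a throwaway demo directory with the sample rows shown above; in real use you would drop them and run the loop in your data folder.

```shell
#!/usr/bin/env bash
set -eu

# Demo setup only: a scratch directory holding the sample input from Answer 1.
workdir=$(mktemp -d)
cd "$workdir"
printf 'AluJb 0 0 165824\nAluSp 0 0 43328\nAluSc5 0 0 5753\n' \
    > E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt

mkdir -p E2_counts
for file in E2*.txt; do                 # glob instead of parsing `ls` output
    [ -e "$file" ] || continue          # no match: the pattern stays literal, skip it
    # Split the name on "_" once and keep fields 1, 5 and 6 (assumed layout:
    # TREAT_x_x_x_TE_REGION_...); "_" throws the other fields away.
    IFS=_ read -r treat _ _ _ TE region _ <<< "$file"
    awk -v treat="$treat" -v TE="$TE" -v region="$region" \
        'BEGIN { OFS = "\t" } { print $1, $4, treat, TE, region }' \
        "$file" > "E2_counts/${file}_tmp.txt"
done

cat "E2_counts/E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt_tmp.txt"
```

The `IFS=_` prefix applies only to that one `read` call, so word splitting elsewhere in the script is unaffected.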

Answer 2

Please try the following:

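awk -v OFS="\t" '
FNR==1 {                                     # executed once at the start of each input file
    if (out) close(out)                      # close the previous file's output
    split(FILENAME, a, "_")                  # split the filename into array "a" on "_"
    out = "E2_counts/" FILENAME "_tmp.txt"   # one output file per input (directory must exist)
}
{
    print $1, $4, a[1], a[5], a[6] > out     # columns 1 and 4, then fields 1, 5, 6 of the filename
}' E2*.txt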
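To see why Answer 2 picks `a[1]`, `a[5]` and `a[6]`, the snippet below prints every piece that awk's `split()` produces for the example file name. `split()` numbers the pieces from 1, the same way `cut -d'_' -f N` counts fields in the question's loop, so the two approaches pick the same parts of the name.

```shell
#!/usr/bin/env bash
# List each element split() creates from the question's file name.
parts=$(awk 'BEGIN {
    n = split("E2_NCAPG_r1_UCSC_DNA_exon_fraction_counts.txt", a, "_")
    for (i = 1; i <= n; i++) print i, a[i]
}')
printf '%s\n' "$parts"
```

Element 1 is `E2`, element 5 is `DNA` and element 6 is `exon`, matching the extra columns in the desired output.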
