AWK: print a combination of string + bash variable + string
I am trying to use awk to rename the contigs in a fasta file to the isolate ID and to number the contigs from 1 to n.
Fasta file:
>NODE_1_length_172477_cov_46.1343
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
The isolate ID is a variable because I want to do this for multiple files. I have got as far as printing isolateIDnumber, but I need >isolateID_number
for file in /dir/*.fasta
do
name=$(basename "$file" .fasta)
awk '/^>/{print "'"$name"'" ++i; next}{print}' $file > rename.fasta
done;
This gives me:
15AR07771
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
Desired output:
>15AR0777_1
GCAGGGCGCAGTTTTTGGAGGCTTGGCAAACCCGTGAGGGAAATTTGGCAGGCAAAATTT
TGGCGGTCGTGCCGAAAAAAGCGGAGGCGATTTCAAATAAATTGTTTTTCACACATCATC
CCAAGCGGCAGACGGAGTTTGCAGTCGGACAAATCAGGCAAGGGCGCGCAGAGTAAGTCA
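The question is: where should I put the strings so that it prints >15AR0777_1 instead of 15AR07771?
I tried the following variations, among others, but none of them worked: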
awk '/^>/{print ">'"$name"'" "_" ++i; next}{print}' $file > rename.fasta
awk '/^>/{print ">'"$name"'" _++i; next}{print}' $file > rename.fasta
Thanks!
Use awk -v awk_var="$bash_var" to pass a shell variable into the awk script. From man awk:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the
BEGIN rule of an AWK program.
That is:
for file in dir/*.fasta
do
name=$(basename "$file" .fasta)
awk -v name="$name" '/^>/{print ">" name "_" ++i; next}{print}' "$file" > rename.fasta
done
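The loop above still sends every file's output to the same rename.fasta, so each iteration overwrites the previous one. A minimal sketch of one way around that, assuming you want one output file per input: the function name rename_contigs, the rename_ prefix and the out/ directory are hypothetical choices, not part of the original commands.

```shell
# Hypothetical helper: rename the headers of one FASTA file.
# $1 = input .fasta file, $2 = existing output directory (assumed layout).
rename_contigs() {
    name=$(basename "$1" .fasta)                # isolate ID taken from the file name
    awk -v name="$name" '
        /^>/ { print ">" name "_" ++i; next }   # header lines become >name_1, >name_2, ...
        { print }                               # sequence lines pass through unchanged
    ' "$1" > "$2/rename_$name.fasta"
}

# Usage sketch (directories are assumptions):
# mkdir -p out
# for file in dir/*.fasta; do rename_contigs "$file" out; done
```

Because awk is started once per file, the counter i starts from 1 again for every input file.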
Here is an all-awk version of it:
awk '
FNR==1 {                           # new file, close old and make name for new
    close(f)                       # close the old output file
    n=FILENAME                     # get filename of the new file
    gsub(/^.*\/|\.fasta$/,"",n)    # remove path and .fasta
    f="rename_" n ".fasta"         # new output file
    i=0                            # restart the contig numbering for each file
}
/^>/ {
    $0=">" n "_" ++i               # >name_number
}
{
    print > f                      # print to output file
}' dir/*.fasta                     # process .fasta files in dir
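If there is a file dir/15AR07771.fasta, the script will generate the file ./rename_15AR07771.fasta for it. (Your version writes the output for every file to rename.fasta, without even appending, so you may want to fix that.)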
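To see how the gsub call in the all-awk version derives the isolate name from FILENAME, here is a small sketch that applies the same regular expression to a path passed in with -v. The function name strip_name and the variable path are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: apply the same gsub as the all-awk version to one path.
strip_name() {
    awk -v path="$1" 'BEGIN {
        n = path
        gsub(/^.*\/|\.fasta$/, "", n)   # drop everything up to the last "/" and a trailing ".fasta"
        print n
    }'
}

strip_name dir/15AR07771.fasta   # prints 15AR07771
```

The first alternative removes the directory part and the second removes the extension, so a single gsub handles both ends of the path.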