How to run a bash script in a loop
I wrote a bash script that extracts substrings using two input files and saves them to an output file, as follows:
Input file 1
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Input file 2
gene1 10 20
gene2 40 50
genen x y
My script
>output_file
cat input_file2 | while read row; do
echo $row > temp
geneName=`awk '{print $1}' temp`
startPos=`awk '{print $2}' temp`
endPos=`awk '{print $3}' temp`
length=$(expr $endPos - $startPos)
for i in temp; do
echo ">${geneName}" >> genes_fasta
awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' input_file1 >> output_file
done
done
How can I make it loop over multiple strings in input file 1?
The new input file looks like this:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotypen...
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
I would like a separate output file for each genotype, with the genotype name as the file name.
Thank you!
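Could you please try the following? I am assuming that the lines starting with > in your Input_file1 should be compared with the first column of Input_file2 (the samples are confusing, so this is written based on the OP's attempt).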
awk '
FNR==NR{
  start_point[$1]=$2
  end_point[$1]=$3
  next
}
/^>/{
  sub(/^>/,"")
  val=$0
  next
}
{
  print val ORS substr($0,start_point[val],end_point[val])
  val=""
}
' Input_file2 Input_file1
Explanation: the same code as above, with comments added.
awk '                        ##Starting awk program from here.
FNR==NR{                     ##Condition FNR==NR is TRUE only while the first file on the command line, Input_file2, is being read.
  start_point[$1]=$2         ##Creating an array named start_point, indexed by $1 of the current line, with value $2.
  end_point[$1]=$3           ##Creating an array named end_point, indexed by $1 of the current line, with value $3.
  next                       ##next skips all further statements for this line.
}
/^>/{                        ##If a line starts with > then do the following.
  sub(/^>/,"")               ##Removing the leading > from the line.
  val=$0                     ##Creating a variable val whose value is $0, the current line.
  next                       ##next skips all further statements for this line.
}
{
  print val ORS substr($0,start_point[val],end_point[val])   ##Printing val, a newline (ORS), and the substring of the current line starting at start_point[val]; end_point[val] is passed as the third argument of substr, which awk treats as a length.
  val=""                     ##Nullifying variable val here.
}
' Input_file2 Input_file1    ##Mentioning Input_file names here.
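The FNR==NR test used above is a common awk idiom for handling two files in one pass: NR counts records across all input, while FNR restarts at 1 for every file, so the two are equal only while the first file is being read. Below is a minimal sketch of just that idiom; the file names first.txt and second.txt are made up for the demonstration and are not part of the question.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first.txt"
printf 'c\nd\n' > "$tmp/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file,
# so FNR==NR holds only while the first file is being read.
result=$(awk 'FNR==NR { print "first:  " $0; next }
                      { print "second: " $0 }' "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One caveat worth knowing: if the first file is empty, FNR==NR is also true for the second file, so the idiom assumes a non-empty first file.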
If I understand correctly, would you please try the following:
awk '
FNR==NR {
    name[NR] = $1
    start[NR] = $2
    len[NR] = $3 - $2
    count = NR
    next
}
/^>/ {
    sub(/^>/,"")
    genotype=$0
    next
}
{
    for (i = 1; i <= count; i++) {
        print ">" name[i] > genotype
        print substr($0, start[i], len[i]) >> genotype
    }
    close(genotype)
}' input_file2 input_file1
input_file1:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotype3
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Input_file2:
gene1 10 20
gene2 40 50
gene3 20 25
[Results]
genotype1:
>gene1
aaaaaaaaaa
>gene2
aaaaaaaaaa
>gene3
aaaaa
genotype2:
>gene1
bbbbbbbbbb
>gene2
bbbbbbbbbb
>gene3
bbbbb
genotype3:
>gene1
nnnnnnnnnn
>gene2
nnnnnnnnnn
>gene3
nnnnn
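The lengths in these results follow from how awk's substr(s, start, length) works: the third argument is a length, not an end position, so "gene1 10 20" with a length of 20 - 10 yields the 10 characters at positions 10 through 19. The short sketch below shows this on an alphabet string chosen only for illustration. Whether the end coordinate should be inclusive is not stated in the question, so the second variant (length plus 1) is an assumption you may or may not need.

```shell
#!/bin/sh
line=abcdefghijklmnopqrstuvwxyz

# substr(s, start, length): 10 characters beginning at position 10 (positions 10..19).
exclusive=$(printf '%s\n' "$line" | awk -v s=10 -v e=20 '{ print substr($0, s, e - s) }')

# Assumption: if the end coordinate is meant to be inclusive, add 1 to the length.
inclusive=$(printf '%s\n' "$line" | awk -v s=10 -v e=20 '{ print substr($0, s, e - s + 1) }')

printf '%s\n%s\n' "$exclusive" "$inclusive"
```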
[Edit]
If you want to store the output files in a different directory, please try the following:
dir="./outdir"    # directory name to store the output files
                  # you can modify the name as you want
mkdir -p "$dir"
awk -v dir="$dir" '
FNR==NR {
    name[NR] = $1
    start[NR] = $2
    len[NR] = $3 - $2
    count = NR
    next
}
/^>/ {
    sub(/^>/,"")
    genotype=$0
    next
}
{
    for (i = 1; i <= count; i++) {
        print ">" name[i] > dir"/"genotype
        print substr($0, start[i], len[i]) >> dir"/"genotype
    }
    close(dir"/"genotype)
}' input_file2 input_file1
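- The first two lines are executed in bash; they define the destination directory and create it with mkdir.
- The directory name is then passed to awk via the -v option.
Hope this helps.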
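For comparison with the original script, the same per-genotype loop can also be sketched in plain shell. This is only an illustration under stated assumptions: every record in input_file1 is exactly one header line followed by one sequence line, the sample sequences are shortened, and all files live in a scratch directory created by mktemp (the variable name work is made up here). The substring is taken with cut, using positions start through end - 1 to match the length of end - start used above.

```shell
#!/bin/sh
# Scratch directory and shortened sample data (assumptions for this sketch).
work=$(mktemp -d)
cat > "$work/input_file1" <<'EOF'
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
EOF
cat > "$work/input_file2" <<'EOF'
gene1 10 20
gene2 40 50
EOF

# Outer loop: one pass per genotype (a header line, then its sequence line).
while read -r header && read -r sequence; do
    genotype=${header#>}              # strip the leading ">"
    : > "$work/$genotype"             # start from an empty output file
    # Inner loop: read splits each row into its three fields directly,
    # so no temp file or per-field awk call is needed.
    while read -r geneName startPos endPos; do
        printf '>%s\n' "$geneName" >> "$work/$genotype"
        # Characters startPos..endPos-1, i.e. a length of endPos - startPos.
        printf '%s\n' "$sequence" | cut -c "${startPos}-$((endPos - 1))" >> "$work/$genotype"
    done < "$work/input_file2"
done < "$work/input_file1"

cat "$work/genotype1"
```

This starts one cut process per gene and genotype, so for large inputs the single awk pass shown above is likely the better choice.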