Remove characters with pattern from a tab-delimited file
I have saved tab-delimited files in a format like this:
NODE_1_length_59711_cov_84.026979_g0_i0_1 | 12.8 |
NODE_1_length_59711_cov_84.026979_g0_i0_2 | 18.9 |
NODE_2_length_59711_cov_84.026979_g0_i0_1 | 14.3 |
NODE_2_length_59711_cov_84.026979_g0_i0_2 | 16.1 |
NODE_165433_length_59711_cov_84.026979_g0_i0_1 | 29 |
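I want to remove all characters from the "l" of "length" through the last "_" (the leading "NODE_" is dropped as well), so that I get output like this from multiple files: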
1_1 | 12.8 |
1_2 | 18.9 |
2_1 | 14.3 |
2_2 | 16.1 |
165433_1 | 29 |
Using GNU awk:
awk -F "\t" 'BEGIN { OFS = FS } { $1 = gensub(/^NODE_([[:digit:]]+)_.*_([[:digit:]]+)$/, "\\1_\\2", 1, $1); print }' file
Explanation:
awk -F "\t" 'BEGIN { OFS = FS }  # Split fields on tabs and write tabs on output as well
{
$1 = gensub(/^NODE_([[:digit:]]+)_.*_([[:digit:]]+)$/, "\\1_\\2", 1, $1)  # Capture the number after "NODE_" and the number after the last "_" of the first field, then replace the field with the first capture, a "_" and the second capture
print  # Print the rewritten first field, followed by a tab and the second field
}' file
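Since gensub is specific to GNU awk, a portable variant may be useful on systems that ship another awk. The following is only a sketch: the function name shorten_ids is hypothetical, and it assumes every ID contains the literal text "_length_", as in the sample data.

```shell
# Hypothetical helper: shorten the first tab-separated field with POSIX awk only.
# Assumes IDs look like NODE_<n>_length_..._<m>, as in the sample data.
shorten_ids() {
  awk -F '\t' 'BEGIN { OFS = FS } {
    sub(/^NODE_/, "", $1)        # drop the leading "NODE_" prefix
    sub(/_length_.*_/, "_", $1)  # greedy match: collapse everything through the last "_"
    print                        # remaining fields are printed unchanged
  }' "$@"
}
```

It reads the named files, or standard input when no file is given, for example: shorten_ids file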
With sed (demo):
echo 'NODE_165433_length_59711_cov_84.026979_g0_i0_1' | sed -E 's/^NODE_([0-9]+)_.*_([0-9]+)/\1_\2/'
Output:
165433_1
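The question mentions multiple files, so the same sed substitution could be wrapped in a loop. This is a minimal sketch, not part of the original answers: the function name shorten_files and the ".short" output suffix are assumptions chosen for illustration, and the originals are left untouched.

```shell
# Hypothetical helper: write a shortened copy of each input file next to it.
# The ".short" suffix is an arbitrary choice for this sketch.
shorten_files() {
  for f in "$@"; do
    # Keep the number after "NODE_" and the number after the last "_" of the ID.
    sed -E 's/^NODE_([0-9]+)_.*_([0-9]+)/\1_\2/' "$f" > "$f.short"
  done
}
```

Usage might look like: shorten_files sample1.tsv sample2.tsv (file names here are placeholders).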