Replacing the positive and negative numbers in a column under certain conditions
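I have a text file with four columns of numbers.
I want to keep or modify column 4 according to the following rules:
- if it is in the range -2.100552742679913983e-02 to -1.249582196275393240e-02 - no change
- if it is in the range 1.381056887718538353e-04 to 2.346095085764924595e-04 - no change
- otherwise, if negative - generate a new random number in the range -1.8445493471994996071e-03 to -1.145493471994996071e-03
- otherwise, if positive - generate a new random number in the range 1.531056887718538353e-06 to 1.956056887718538353e-06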
Sample input:
$ cat input
> > >
2.60000038 2.99699998 -0.00000000 -1.249582196275393240e-02
2.70000076 2.99699998 -0.00000000 -2.296202816069126129e-02
2.80000114 2.99699998 -0.00000000 -2.527230263998111234e-02
2.89999962 2.99699998 -0.00000000 -2.100552742679913983e-02
> > >
2.89999962 2.99699998 -0.00000000 -2.150552742679913983e-01
2.89999962 2.99699998 -5.00000000 -2.190552742679913983e-01
2.89999962 2.99699998 -900000000 -2.190552742679913983e-03
> > >
0.500000000 2.99699998 -1.14950405E-09 1.381056887718538353e-04
0.600000381 2.99699998 -1.66670497E-10 2.346095085764924595e-04
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-04
0.800000763 2.99699998 -9.37441375E-11 1.126244050537546566e-04
0.700000763 2.99699998 -9.37441375E-11 1.136244050537546566e-04
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-03
> > >
Example of negative number replacement:
# from
2.89999962 2.99699998 -0.00000000 -2.150552742679913983e-01
2.89999962 2.99699998 -5.00000000 -2.190552742679913983e-01
2.89999962 2.99699998 -900000000 -2.190552742679913983e-03
# to
2.89999962 2.99699998 -0.00000000 -1.149552522674912181e-03
2.89999962 2.99699998 -5.00000000 -1.141552612675913281e-03
2.89999962 2.99699998 -900000000 -1.346552142676911382e-03
Example of positive number replacement:
# from
0.800000763 2.99699998 -9.37441375E-11 1.126244050537546566e-04
0.700000763 2.99699998 -9.37441375E-11 1.136244050537546566e-04
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-03
# to
0.800000763 2.99699998 -9.37441375E-11 1.561056887718538353e-06
0.700000763 2.99699998 -9.37441375E-11 1.621056887718538353e-06
0.700000763 2.99699998 -9.37441375E-11 1.506244050537546566e-06
Expected result for the whole input:
> > >
2.60000038 2.99699998 -0.00000000 -1.249582196275393240e-02 # no change
2.70000076 2.99699998 -0.00000000 -2.296202816069126129e-02 # no change
2.80000114 2.99699998 -0.00000000 -2.527230263998111234e-02 # no change
2.89999962 2.99699998 -0.00000000 -2.100552742679913983e-02 # no change
> > >
2.89999962 2.99699998 -0.00000000 -1.149552522674912181e-03 # new value
2.89999962 2.99699998 -5.00000000 -1.141552612675913281e-03 # new value
2.89999962 2.99699998 -900000000 -1.346552142676911382e-03 # new value
> > >
0.500000000 2.99699998 -1.14950405E-09 1.381056887718538353e-04 # no change
0.600000381 2.99699998 -1.66670497E-10 2.346095085764924595e-04 # no change
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-04 # no change
0.800000763 2.99699998 -9.37441375E-11 1.561056887718538353e-06 # new value
0.700000763 2.99699998 -9.37441375E-11 1.621056887718538353e-06 # new value
0.700000763 2.99699998 -9.37441375E-11 1.506244050537546566e-06 # new value
> > >
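Note: comments added for clarity.
I tried:
awk '($4 >= -1.249582196275393240e-02 && $4 <= -2.100552742679913983e-02){print $1,$2,$3,$4}' input
awk '($4 >= 1.381056887718538353e-04 && $4 <= 2.346095085764924595e-04){print $1,$2,$3,$4}' input
However, I don't know how to implement the rules that generate a new random number. I hope an expert can help me. Thanks in advance.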
In OP's current code, the test for negative values within the range needs its test values swapped, because -2.100552742679913983e-02 is less than -1.249582196275393240e-02.
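To illustrate the point about bound order, here is a minimal sketch. The helper name `in_range` and the two sample values are made up for the demonstration and are not part of the original code. The range test can only match when the numerically smaller (more negative) bound is used as the lower limit.

```shell
# in_range: hypothetical helper; reads one number per line and reports
# whether it lies between the two negative test bounds.
in_range() {
    awk -v lo=-2.100552742679913983e-02 -v hi=-1.249582196275393240e-02 '
    {
        v = $1 + 0          # force a numeric comparison
        # lo is the more negative bound, so it goes on the ">=" side
        print $1, ((v >= lo && v <= hi) ? "in range" : "out of range")
    }'
}

# Illustrative values only: the first lies inside the range, the second below it.
printf '%s\n' -1.5e-02 -3.0e-02 | in_range
```

With the bounds the other way round, as in the first command in the question, no value can satisfy both comparisons at once, so nothing is ever printed.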
We can roll all of the tests and modifications into a single awk script:
awk -v seed="$RANDOM" '                   # if available, $SRANDOM should ensure more randomness
BEGIN { srand(seed)                       # initialize the awk random number generator
        # test ranges
        test_neg_min = -2.100552742679913983e-02
        test_neg_max = -1.249582196275393240e-02
        test_pos_min =  1.381056887718538353e-04
        test_pos_max =  2.346095085764924595e-04
        # new ranges
        new_neg_min = -1.8445493471994996071e-03
        new_neg_max = -1.145493471994996071e-03
        new_pos_min =  1.531056887718538353e-06
        new_pos_max =  1.956056887718538353e-06
      }
!/^>/ { # if line does not start with ">" then ...
        # if field #4 is negative and outside our test range, replace field #4 with a new negative random number
        if ( $4 < 0 && ! ( $4 >= test_neg_min && $4 <= test_neg_max ) )
           $4 = sprintf("%.18e", ( new_neg_min + rand() * ( new_neg_max - new_neg_min ) ) )
        # if field #4 is positive and outside our test range, replace field #4 with a new positive random number
        else
        if ( $4 >= 0 && ! ( $4 >= test_pos_min && $4 <= test_pos_max ) )
           $4 = sprintf("%.18e", ( new_pos_min + rand() * ( new_pos_max - new_pos_min ) ) )
      }
1       # print current line
' input
This generates:
> > >
2.60000038 2.99699998 -0.00000000 -1.249582196275393240e-02
2.70000076 2.99699998 -0.00000000 -1.161336798144420573e-03
2.80000114 2.99699998 -0.00000000 -1.574493366149551160e-03
2.89999962 2.99699998 -0.00000000 -2.100552742679913983e-02
> > >
2.89999962 2.99699998 -0.00000000 -1.482455058873062602e-03
2.89999962 2.99699998 -5.00000000 -1.534007533960116132e-03
2.89999962 2.99699998 -900000000 -1.324348394828239262e-03
> > >
0.500000000 2.99699998 -1.14950405E-09 1.381056887718538353e-04
0.600000381 2.99699998 -1.66670497E-10 2.346095085764924595e-04
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-04
0.800000763 2.99699998 -9.37441375E-11 1.947523358949625806e-06
0.700000763 2.99699998 -9.37441375E-11 1.612522816477175151e-06
0.700000763 2.99699998 -9.37441375E-11 1.780935388104584015e-06
> > >
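The replacement values above come from scaling rand() into the target interval. As a rough sketch of just that step (the function name `rand_between` and the fixed seed 7 are assumptions made for this illustration, not part of the script above):

```shell
# rand_between MIN MAX SEED: hypothetical helper that prints one
# pseudo-random value drawn from the interval starting at MIN and ending at MAX.
rand_between() {
    awk -v min="$1" -v max="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        # rand() returns a value in [0, 1), so min + rand() * (max - min)
        # starts at min and stays below max (up to floating-point rounding)
        printf "%.18e\n", min + rand() * (max - min)
    }'
}

# New negative range from the question, with an arbitrary fixed seed.
rand_between -1.8445493471994996071e-03 -1.145493471994996071e-03 7
```

The same formula works for the positive range; only min and max change.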
I think the numbers you're using carry more digits than awk can reliably compare, unless you use a GNU awk built with MPFR for Arbitrary Precision Arithmetic. For example, you require awk to be able to test whether a number like -0.01249582196275393241 is less than or equal to -0.01249582196275393240. That's a comparison to 20 decimal places, when I'm pretty sure the maximum precision that can be stored without MPFR is 17 digits, and above about 15 you're already losing accuracy (e.g. see https://unix.stackexchange.com/a/568750/133219).
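A quick way to see the effect is the sketch below, which assumes an awk that stores numbers as IEEE 754 double-precision values (as gawk does without -M). The helper name `same_double` and the test values are made up for the demonstration.

```shell
# same_double A B: hypothetical helper that reports whether awk treats
# the two decimal strings as the same number.
same_double() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # adding 0 forces both operands to be compared as numbers
        print ((a + 0 == b + 0) ? "equal" : "different")
    }'
}

same_double 1.00000000000000001 1   # differ only at the 17th decimal place
same_double 1.001 1                 # differ at the 3rd decimal place
```

With double-precision arithmetic the first call prints "equal", because the extra digit is lost when the string is converted to a number, while the second prints "different".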
With Arbitrary Precision Arithmetic, you would do something like this using GNU awk:
$ cat tst.awk
BEGIN {
    PREC = 50
    CONVFMT = OFMT = "%.18e"
    neg_acc_beg = -2.100552742679913983e-02
    neg_acc_end = -1.249582196275393240e-02
    pos_acc_beg =  1.381056887718538353e-04
    pos_acc_end =  2.346095085764924595e-04
    neg_chg_min = -1.8445493471994996071e-03
    neg_chg_max = -1.145493471994996071e-03
    pos_chg_min =  1.531056887718538353e-06
    pos_chg_max =  1.956056887718538353e-06
    if (seed == "") { srand() }
    else            { srand(seed) }
}
!( /^>/ ||
   ((neg_acc_beg <= $4) && ($4 <= neg_acc_end)) ||
   ((pos_acc_beg <= $4) && ($4 <= pos_acc_end)) \
 ) {
    if ( $4 < 0 ) {
        min = neg_chg_min
        max = neg_chg_max
    }
    else {
        min = pos_chg_min
        max = pos_chg_max
    }
    $4 = min + rand()*(max-min) " # random"
}
{ print }
$ awk -M -f tst.awk file
> > >
2.60000038 2.99699998 -0.00000000 -1.249582196275393240e-02
2.70000076 2.99699998 -0.00000000 -1.745496810369265642e-03 # random
2.80000114 2.99699998 -0.00000000 -1.573904107632305080e-03 # random
2.89999962 2.99699998 -0.00000000 -1.598592869722831322e-03 # random
> > >
2.89999962 2.99699998 -0.00000000 -1.800124621621710871e-03 # random
2.89999962 2.99699998 -5.00000000 -1.699769735752932817e-03 # random
2.89999962 2.99699998 -900000000 -1.342199391079360385e-03 # random
> > >
0.500000000 2.99699998 -1.14950405E-09 1.381056887718538353e-04
0.600000381 2.99699998 -1.66670497E-10 1.588483317773829457e-06 # random
0.700000763 2.99699998 -9.37441375E-11 2.136244050537546566e-04
0.800000763 2.99699998 -9.37441375E-11 1.891636587004151809e-06 # random
0.700000763 2.99699998 -9.37441375E-11 1.826667420598640320e-06 # random
0.700000763 2.99699998 -9.37441375E-11 1.616694144436782760e-06 # random
> > >
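The "# random" text is obviously just there to highlight which lines were changed; remove it once you're done testing.
It doesn't produce the expected output in your question because I don't understand the expected output in your question, so you may need to debug it a bit, but it shows you the idea and structure of such a script. In particular, read up on Arbitrary Precision Arithmetic and how to set PREC in the gawk manual, as it's not something I use often and I'm not sure whether 50 is a good value.
For more information on generating random numbers with rand() and srand(), see https://www.gnu.org/software/gawk/manual/gawk.html#Numeric-Functions.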
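As a small illustration of why both scripts accept a seed, here is a sketch (the function name `three_rands` and the seed value 42 are arbitrary choices for this example): seeding srand() with the same value reproduces the same sequence from rand(), which is handy while testing.

```shell
# three_rands SEED: hypothetical helper that prints three values from rand()
# after seeding the generator with SEED.
three_rands() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)                       # same seed, same sequence
        for (i = 1; i <= 3; i++) print rand()
    }'
}

three_rands 42
three_rands 42    # repeats the three numbers printed by the first call
```

Calling srand() with no argument seeds from the current time instead, so repeated runs within the same second can produce identical sequences.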