Regular expression: ignore lines beginning with these characters
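How can I write a regular expression that ignores lines beginning with an "empty character" (whitespace), #, or a letter? Below is a sample of my data. I need to match only the lines that begin with a number (negative or positive):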
0.000000 1.2712052472 0.8899021956 22.2458 265.2511402076 322.1539247218 -13.6281 -130.986 0.155342 0.889755 phaet_000227
0.000000 1.2712052462 0.8899021922 22.2458 265.2511430964 322.1539209801 -13.6281 -130.986 0.155342 0.889755 phaet_000090
0.000000 1.2712052476 0.8899022047 22.2458 265.2511396341 322.1539260295 -13.6281 -130.986 0.155342 0.889755 phaet_000111
0.000000 1.2712052465 0.8899022229 22.2458 265.2511497521 322.1539197205 -13.6281 -130.986 0.155342 0.889755 phaet_000059
Nplanets 9 Nparticles 500: alive 509/509 ejected 0 rmin 0 rmax 0
Full close app checks 0/0 (0.000000%) BS fails 0
Close apps 1 bounces 0 accretions 0 Max n/step 0
Simulation time 0 going to -100000.
Real time 1 s Force 0 s (0.00 %) Coll 0 s (0.00 %)
E&L 0 s (0.00 %) Kep 0 s (0.00 %)
CPU time 0.037627 s Force 0 s (0.00 %) Coll 0 s (0.00 %)
E&L 0 s (0.00 %) Kep 0 s (0.00 %)
# Nplanets 9 Nparticles 500: alive 509/509 ejected 0 rmin 0 rmax 0
# Full close app checks 0/0 (0.000000%) BS fails 0
# Close apps 1 bounces 0 accretions 0 Max n/step 0
# Simulation time 0 going to -100000.
# Real time 1 s Force 0 s (0.00 %) Coll 0 s (0.00 %)
# E&L 0 s (0.00 %) Kep 0 s (0.00 %)
# CPU time 0.037627 s Force 0 s (0.00 %) Coll 0 s (0.00 %)
# E&L 0 s (0.00 %) Kep 0 s (0.00 %)
Output step 1 at t=-10 going to -100000
-10.000000 1.2713031501 0.8900442847 22.1802 265.4033924020 322.0041354013 -5.32091 -102.357 0.155286 0.88482 phaet_000065
-10.000000 1.2713031508 0.8900443093 22.1802 265.4033954804 322.0041360861 -5.32091 -102.357 0.155286 0.88482 phaet_000299
-10.000000 1.2713031483 0.8900442977 22.1802 265.4033839221 322.0041469420 -5.32092 -102.357 0.155286 0.88482 phaet_000102
-10.000000 1.2713031486 0.8900442931 22.1802 265.4033724632 322.0041581369 -5.32092 -102.357 0.155286 0.884821 phaet_000371
-10.000000 1.2713031463 0.8900442910 22.1802 265.4033772870 322.0041532421 -5.32093 -102.357 0.155286 0.884821 phaet_000019
I would like to end up with:
0.000000 1.2712052472 0.8899021956 22.2458 265.2511402076 322.1539247218 -13.6281 -130.986 0.155342 0.889755 phaet_000227
0.000000 1.2712052462 0.8899021922 22.2458 265.2511430964 322.1539209801 -13.6281 -130.986 0.155342 0.889755 phaet_000090
0.000000 1.2712052476 0.8899022047 22.2458 265.2511396341 322.1539260295 -13.6281 -130.986 0.155342 0.889755 phaet_000111
0.000000 1.2712052465 0.8899022229 22.2458 265.2511497521 322.1539197205 -13.6281 -130.986 0.155342 0.889755 phaet_000059
-10.000000 1.2713031501 0.8900442847 22.1802 265.4033924020 322.0041354013 -5.32091 -102.357 0.155286 0.88482 phaet_000065
-10.000000 1.2713031508 0.8900443093 22.1802 265.4033954804 322.0041360861 -5.32091 -102.357 0.155286 0.88482 phaet_000299
-10.000000 1.2713031483 0.8900442977 22.1802 265.4033839221 322.0041469420 -5.32092 -102.357 0.155286 0.88482 phaet_000102
-10.000000 1.2713031486 0.8900442931 22.1802 265.4033724632 322.0041581369 -5.32092 -102.357 0.155286 0.884821 phaet_000371
-10.000000 1.2713031463 0.8900442910 22.1802 265.4033772870 322.0041532421 -5.32093 -102.357 0.155286 0.884821 phaet_000019
So I tried grep as follows:
grep -v '^[a-z,A-Z,\s,\#]' file1.dat > file2.dat
This removes the lines that begin with a letter or "#", but the lines that begin with whitespace are still there, i.e. I cannot remove:
E&L 0 s (0.00 %) Kep 0 s (0.00 %)
E&L 0 s (0.00 %) Kep 0 s (0.00 %)
Note the whitespace before "E&L".
Any idea how to get rid of these?
Those two lines are not eliminated because of their leading whitespace.
You can strip that whitespace first:
sed "s/^[ \t]*//" file1.dat > file3.dat
Then filter the resulting file with grep:
grep -v '^[a-z,A-Z,\s,\#]' file3.dat > file2.dat
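The two steps can also be chained in a single pipeline, which avoids the intermediate file3.dat. The following is a minimal sketch rather than a definitive recipe: the shortened sample rows and the three-space indent before "E&L" are assumptions made for illustration, and [[:space:]] is used in place of [ \t] because \t inside a bracket expression is only understood by some sed implementations.

```shell
# Hypothetical, shortened sample; the real file has more columns per row.
workdir=$(mktemp -d)
printf '%s\n' \
  '0.000000 1.2712052472 phaet_000227' \
  'Nplanets 9 Nparticles 500: alive 509/509' \
  '   E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
  '# CPU time 0.037627 s' \
  '-10.000000 1.2713031501 phaet_000065' > "$workdir/file1.dat"

# Step 1: strip leading whitespace from every line.
# Step 2: drop every line that now starts with a letter or '#'.
sed 's/^[[:space:]]*//' "$workdir/file1.dat" \
  | grep -v '^[a-zA-Z#]' > "$workdir/file2.dat"

cat "$workdir/file2.dat"
```

Note that this variant also strips leading whitespace from any data line that is kept, which is harmless for whitespace-separated columns.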
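In grep, [\s,\#] matches a backslash, the letter s, a comma, or a hash. (A backslash has no special meaning inside a bracket expression, and a comma is never special.) That is why lines beginning with whitespace are not matched. The simplest way to match whitespace is the [:space:] character class, so your regular expression becomes: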
^[a-zA-Z#[:space:]]
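Put into the original command, the corrected bracket expression might be used as in the sketch below. The sample rows are shortened, hypothetical stand-ins for the real data, and the indentation of the "E&L" line is an assumption (the real file may use tabs or a different number of spaces).

```shell
# Hypothetical, shortened sample; the real file has more columns per row.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '0.000000 1.2712052472 phaet_000227' \
  'Nplanets 9 Nparticles 500: alive 509/509' \
  '   E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
  '# CPU time 0.037627 s' \
  '-10.000000 1.2713031501 phaet_000065' > "$tmpdir/file1.dat"

# -v inverts the match: drop every line whose first character is
# a letter, '#', or whitespace (space or tab).
grep -v '^[a-zA-Z#[:space:]]' "$tmpdir/file1.dat" > "$tmpdir/file2.dat"

cat "$tmpdir/file2.dat"
```

Because the pattern requires a first character, completely empty lines are not matched and would therefore be kept by grep -v.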
You can also search positively for the lines that begin with a number:
^-\?[[:digit:]]\+\.[[:digit:]]\+
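The \? and \+ operators in that pattern are GNU extensions to basic regular expressions. As a sketch, the same pattern can be written in extended syntax with grep -E, where ? and + need no backslashes. The sample rows below are shortened, hypothetical stand-ins for the real data, and the indent before "E&L" is an assumption.

```shell
# Hypothetical, shortened sample; the real file has more columns per row.
sample=$(mktemp)
printf '%s\n' \
  '0.000000 1.2712052472 phaet_000227' \
  'Output step 1 at t=-10 going to -100000' \
  '   E&L 0 s (0.00 %) Kep 0 s (0.00 %)' \
  '# Real time 1 s' \
  '-10.000000 1.2713031501 phaet_000065' > "$sample"

# Keep only lines that start with an optional minus sign, one or more
# digits, a literal dot, and one or more digits.
numeric_lines=$(grep -E '^-?[[:digit:]]+\.[[:digit:]]+' "$sample")

printf '%s\n' "$numeric_lines"
```

Matching what you want to keep, rather than listing everything to discard, means new kinds of header lines are excluded without having to extend the pattern.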