AWK: how to extract a block of text between two "\\" without regard to line breaks

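I want to extract a specific region from a large block of text by setting the field separator to "\\", but I keep running into a problem: my text also contains single "\" characters, and they seem to interfere with the extraction.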

Input:

1\GINC-R1430\FOpt\RB3LYP-31G(d,p)\C11H8\ROOT-Jan-2015\0\\#N b3l
 yp/6-31G** opt freq=noraman test Maxdisk=1Gb\\3\\0,1\C,-2.6997011275,0
 .2415237678,0.5867242856\C,-0.844160292,1.6395735777,-0.4268479833\C,-
 1.9760161741,1.2551936894,0.1361541401\C,-2.3923087914,-1.0358860734,-
 0.0557643955\C,0.3235980425,0.7875682734,-0.1356859882\C,-1.1093142432
 ,-1.3685423936,-0.3602591004\C,0.1496925203,-0.6332454104,-0.151244509
 2\H,-3.3806331312,0.2996137801,1.4332335206\H,-0.7633170455,2.45988827
 32,-1.1373018124\H,1.7187287121,2.4104501712,0.0387394407\H,-3.1756548
 236,-1.7742599934,-0.224548871\H,-0.9560852099,-2.3752668104,-0.747558
 6451\C,1.6076580336,1.3296735593,0.0442342156\C,2.5669578833,-0.875832
 9525,0.1864536297\H,3.4305876714,-1.5230597241,0.3068386649\C,1.309289
 0866,-1.4290100931,-0.0026907826\H,1.2013201753,-2.5103156986,-0.02627
 39389\C,2.7201916294,0.5158561201,0.2083031485\H,3.7045180838,0.956653
 9373,0.3361669809\\Version=ES64L-G09RevD.01\State=1-A\HF=-423.9087698\
 RMSD=8.508e-09\RMSF=5.945e-05\Dipole=0.3132737,-0.297812,-0.0202519\Qu
 adrupole=2.0644665,1.7222772,-3.7867437,1.9108337,-0.4477432,-0.303338
1\PG=C01 [X(C11H8)]\\@

The output I am looking for:

0,1\C,-2.6997011275,0
 .2415237678,0.5867242856\C,-0.844160292,1.6395735777,-0.4268479833\C,-
 1.9760161741,1.2551936894,0.1361541401\C,-2.3923087914,-1.0358860734,-
 0.0557643955\C,0.3235980425,0.7875682734,-0.1356859882\C,-1.1093142432
 ,-1.3685423936,-0.3602591004\C,0.1496925203,-0.6332454104,-0.151244509
 2\H,-3.3806331312,0.2996137801,1.4332335206\H,-0.7633170455,2.45988827
 32,-1.1373018124\H,1.7187287121,2.4104501712,0.0387394407\H,-3.1756548
 236,-1.7742599934,-0.224548871\H,-0.9560852099,-2.3752668104,-0.747558
 6451\C,1.6076580336,1.3296735593,0.0442342156\C,2.5669578833,-0.875832
 9525,0.1864536297\H,3.4305876714,-1.5230597241,0.3068386649\C,1.309289
 0866,-1.4290100931,-0.0026907826\H,1.2013201753,-2.5103156986,-0.02627
 39389\C,2.7201916294,0.5158561201,0.2083031485\H,3.7045180838,0.956653
 9373,0.3361669809

The best I have managed so far is a simple:

awk 'BEGIN { FS = "\\" } ; {print $SELECTED AREA}'

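If the field separator could be set to "\\" without the single "\" characters being treated as separators, the selected area would be $4.

Does anyone know how to do this?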

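You need all eight backslashes to get what you want.

awk -F '\\\\\\\\' '{print $4}'

That is because you double them once to get a literal backslash in a string, and then double them again to get a literal backslash in the regular expression. Two literal backslashes therefore take eight.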
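As a quick check of the escaping layers, here is a minimal sketch on a made-up one-line sample. The `sample` string and the variable names are hypothetical and are not the asker's data; the sketch assumes an awk that applies string escape processing to the `-F` argument, which is the behavior this answer relies on.

```shell
# Hypothetical sample: three parts separated by "\\", with a lone "\" inside the middle one.
sample='head\\0,1\C,-2.69\\tail'

# Four backslashes: after string processing awk is left with the regex \\,
# which matches ONE literal backslash, so every backslash splits a field.
four=$(printf '%s\n' "$sample" | awk -F '\\\\' '{print NF}')

# Eight backslashes: after string processing awk is left with the regex \\\\,
# which matches TWO consecutive literal backslashes.
eight=$(printf '%s\n' "$sample" | awk -F '\\\\\\\\' '{print NF}')
field=$(printf '%s\n' "$sample" | awk -F '\\\\\\\\' '{print $2}')

printf 'four backslashes:  NF=%s\n' "$four"
printf 'eight backslashes: NF=%s, second field=%s\n' "$eight" "$field"
```

With eight backslashes the lone "\" stays inside the second field instead of cutting it in two.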

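Incidentally, this is a really poor choice of field separator.

To get the correct output, you also need to set the record separator to the empty string, like this:

awk -F'\\\\\\\\' '{print $4}' RS= file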
0,1\C,-2.6997011275,0
 .2415237678,0.5867242856\C,-0.844160292,1.6395735777,-0.4268479833\C,-
 1.9760161741,1.2551936894,0.1361541401\C,-2.3923087914,-1.0358860734,-
 0.0557643955\C,0.3235980425,0.7875682734,-0.1356859882\C,-1.1093142432
 ,-1.3685423936,-0.3602591004\C,0.1496925203,-0.6332454104,-0.151244509
 2\H,-3.3806331312,0.2996137801,1.4332335206\H,-0.7633170455,2.45988827
 32,-1.1373018124\H,1.7187287121,2.4104501712,0.0387394407\H,-3.1756548
 236,-1.7742599934,-0.224548871\H,-0.9560852099,-2.3752668104,-0.747558
 6451\C,1.6076580336,1.3296735593,0.0442342156\C,2.5669578833,-0.875832
 9525,0.1864536297\H,3.4305876714,-1.5230597241,0.3068386649\C,1.309289
 0866,-1.4290100931,-0.0026907826\H,1.2013201753,-2.5103156986,-0.02627
 39389\C,2.7201916294,0.5158561201,0.2083031485\H,3.7045180838,0.956653
 9373,0.3361669809

You may need GNU awk to set the record separator to the empty string.
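If depending on how a particular awk handles an empty RS is a concern, one alternative sketch is to join the lines yourself and split the result with split(). This is not the approach from the answers here, just an illustration under stated assumptions: the temporary file and its shortened contents are hypothetical stand-ins for the real archive block, and the wanted region is assumed to be the fourth part, as in the question.

```shell
# Hypothetical, shortened sample with the same wrapped layout as the input above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1\GINC\FOpt\0\\#N opt fr
 eq\\title\\0,1\C,-2.69,0
 .24\H,1.0,2.0\\Version=X\\@
EOF

# Collect all lines into one buffer (keeping the line breaks), then split it
# on two literal backslashes. A regex literal has only one escaping layer,
# so four backslashes are enough here.
block=$(awk '
    { buf = (NR == 1 ? "" : buf "\n") $0 }
    END {
        split(buf, part, /\\\\/)
        print part[4]
    }
' "$sample")

printf '%s\n' "$block"
rm -f "$sample"
```

Because the line breaks are kept in the buffer, the fourth part is printed with its original wrapping, lone "\" characters included.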

OK, thanks to Ed Morton, Jotne and tripleee, I got it. By setting RS, I can now get the correct output with:

awk 'BEGIN {FS="\\\\\\\\"; RS="\n\n";} {print $4}'

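Since my text contains no blank lines (two consecutive newlines), awk now treats the whole block as a single record. I had never considered RS before, because I mostly work on parsing tables. Thanks.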
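To see the effect of RS in isolation, here is a small sketch that only counts records. The three-line block is a made-up example, and the sketch uses RS="" (POSIX paragraph mode) rather than "\n\n", on the assumption that a multi-character RS is not supported by every awk.

```shell
# Hypothetical three-line block with no blank lines in it.
block=$(printf '%s\n' 'first line' ' second line' ' third line')

# Default RS: every line is its own record.
per_line=$(printf '%s\n' "$block" | awk 'END { print NR }')

# RS="" (paragraph mode): a record ends only at a blank line.
per_block=$(printf '%s\n' "$block" | awk 'BEGIN { RS = "" } END { print NR }')

printf 'default RS: %s records\n' "$per_line"
printf 'RS="":      %s record\n' "$per_block"
```

Once the whole block is one record, the field separator is applied across the line breaks instead of line by line.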