Using grep/awk to extract info from a file

Question

I have a file named info.txt that contains information like this:

=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release

I want to extract two parts.

The Version number, 1.2.0. For that I can use this:

grep 'Version:' info.txt | cut -d\   -f3
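Aside: cut splits on single spaces, so -f3 returns the version only when exactly two spaces follow Version: (with one space the version is field 2). A minimal sketch that does not depend on the spacing, assuming the info.txt layout shown above; get_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Recreate the sample file from the question in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/info.txt" <<'EOF'
=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release
EOF

# Hypothetical helper: print the token that follows "Version:".
# awk's default field splitting treats any run of blanks as one separator,
# so "Version: 1.2.0" and "Version:  1.2.0" both give $2 == "1.2.0".
get_version() {
  awk '$1 == "Version:" { print $2; exit }' "$1"
}

get_version "$tmp/info.txt"    # prints 1.2.0
```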

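I want to extract the information under the changelog heading that matches that version. For example, since Version is 1.2.0, the following text (the text under the 1.2.0 heading in the changelog) would be extracted: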

* Mac release
* Security fix

Is there a way to do this with grep or awk?

Answer 1

You can build a simple state machine in awk to select the relevant lines:

        /^=== /        { s = 3; next }
s==3 && $1=="Version:" { v = $2 }
        /^== /         { s = 2; d = ($2=="Changelog") }
   d && /^= /          { s = 1; p = ($2==v); next }
   p && $0             { print }

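s stores the state:
- s==3 : section
- s==2 : subsection
- s==1 : sub-subsection (not really needed)
d stores another state: inside the changelog section
p stores another state: the current lines are printable
v holds the version once it has been found
p && $0 : output non-empty printable lines

Store it in a file (say "script") and call it as awk -f script info.txt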
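An end-to-end sketch of that usage, run in a temporary directory. It also runs the same script on a copy whose Version line is rewritten to 1.0.0 (the copy, info-1.0.0.txt, is made up for illustration) to show that the state machine picks the changelog entry by version, not by position:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/info.txt" <<'EOF'
=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release
EOF

# The state machine from the answer, saved as "script".
cat > "$tmp/script" <<'EOF'
        /^=== /        { s = 3; next }
s==3 && $1=="Version:" { v = $2 }
        /^== /         { s = 2; d = ($2=="Changelog") }
   d && /^= /          { s = 1; p = ($2==v); next }
   p && $0             { print }
EOF

awk -f "$tmp/script" "$tmp/info.txt"
# * Mac release
# * Security fix

# Same script on a copy whose Version line says 1.0.0.
sed 's/^Version: .*/Version: 1.0.0/' "$tmp/info.txt" > "$tmp/info-1.0.0.txt"
awk -f "$tmp/script" "$tmp/info-1.0.0.txt"
# * Windows release
```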

Answer 2

Here is an example that uses neither grep nor awk: bash itself can match regular expressions with [[ ... =~ ... ]].

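#!/bin/bash
# IFS= and -r keep each line exactly as written (no trimming, no backslash processing)
while IFS= read -r line
do
  if [[ $line =~ ^Version:\ (.+)$ ]]; then
    ver=${BASH_REMATCH[1]}
  elif [[ $line =~ ^=\ ([0-9.]+)\ = ]]; then
    cur=${BASH_REMATCH[1]}
  elif [[ -n "$cur" && "$ver" = "$cur" && -n "$line" ]]; then
    # non-empty line under the heading whose version matches Version:
    echo "$line"
  fi
done < info.txt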
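The heading branch relies on how [[ string =~ regex ]] fills the BASH_REMATCH array: element 0 is the whole match and element 1 is the first parenthesized group. A small sketch of just that step; heading_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version from a "= X.Y.Z =" changelog heading,
# or print nothing and return non-zero if the line is not such a heading.
heading_version() {
  if [[ $1 =~ ^=\ ([0-9.]+)\ = ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
  else
    return 1
  fi
}

heading_version '= 1.2.0 ='                                     # prints 1.2.0
heading_version '== Changelog ==' || echo 'not a version heading'
```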

Answer 3

Using your shown samples, please try the following awk code.

awk -v RS="" '
/Version:/{
  ver=$NF
  print $NF
  next
}
/Changelog/{
  found1=1
  next
}
found1 && $0 ~ ver{
  sub(/^[^\n]*\n*/,"")
  print
  found1=""
}
'  Input_file
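This answer and the next one rely on awk's paragraph mode: with RS set to the empty string, each blank-line-separated block is one record. A quick sketch to see how the sample splits into records, assuming FS is set to a newline so that $1 is the first line of each paragraph:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/info.txt" <<'EOF'
=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release
EOF

# Print the record number and the first line of every paragraph.
awk -v RS= -v FS='\n' '{ print NR ": " $1 }' "$tmp/info.txt"
# 1: === Some Unique Headline ===
# 2: == Changelog ==
# 3: = 1.2.0 =
# 4: = 1.0.0 =
```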

Answer 4

awk in paragraph mode:

awk -v RS= -v ORS='\n\n' -v FS='\n' -v OFS='\n' '/Version/ {split($0,a,/:/);vn = a[2]} $1 ~ vn {print $2,$3}' file
* Mac release
* Security fix

More readable:

awk -v RS= -v ORS='\n\n' -v FS='\n' -v OFS='\n' '
        /Version/ {split($0,a,/:/);vn = a[2]}
        $1 ~ vn {print $2,$3}
' file
* Mac release
* Security fix

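vn is the version number, which you get with the split() function.
If the first line of a paragraph ($1) matches vn, print only $2 and $3 from that paragraph: $1 ~ vn {print $2,$3}.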
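Note that split() keeps the space after the colon, so vn is " 1.2.0", and since vn is used as a dynamic regex its dots match any character. The match still works because the heading "= 1.2.0 =" contains " 1.2.0". A small sketch applying the same split() call to the first paragraph of the sample (written out as a string literal for illustration; split_demo is a hypothetical name):

```shell
#!/usr/bin/env bash
split_demo() {
  awk 'BEGIN {
    rec = "=== Some Unique Headline ===\nVersion: 1.2.0"   # first paragraph of info.txt
    n = split(rec, a, /:/)                                 # same call as in the answer
    printf "%d pieces, a[2] = [%s]\n", n, a[2]             # shows the leading space
    if ("= 1.2.0 =" ~ a[2]) print "= 1.2.0 = matches"
    if ("= 1.0.0 =" !~ a[2]) print "= 1.0.0 = does not match"
  }'
}

split_demo
# 2 pieces, a[2] = [ 1.2.0]
# = 1.2.0 = matches
# = 1.0.0 = does not match
```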

Answer 5

# get the version; maybe use `-f2` instead of the `-f3` in your question's command
ver=$(grep 'Version:' info.txt | cut -d' ' -f2)

# get the heading that matches that version
ver_line=$(grep -v 'Version:' info.txt | grep "$ver")

# `-z` makes grep treat the whole file as one big line;
# cat -v and the last grep strip the blank lines and the non-printable NUL (shown as ^@)
grep -Pzo '(?<='"$ver_line"')[^=]*(?==)' info.txt | cat -v | grep -Ev "^$|^\^"

grep option reference:

-P, --perl-regexp Interpret the pattern as a Perl-compatible regular expression (PCRE).

-z, --null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.

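Without the final grep you get the output below, ending in ^@, because with -z grep terminates each output line with an ASCII NUL character, which cat -v displays as ^@: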

# grep -Pzo '(?<='"$ver_line"')[^=]*(?==)' info.txt | cat -v

* Mac release
* Security fix

^@
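If you prefer to delete the NUL byte directly rather than making it visible with cat -v and filtering it out, a sketch that keeps the answer's PCRE look-around but pipes through tr -d '\0' and then drops the blank lines. It assumes GNU grep built with PCRE support (-P); changelog_entry is a hypothetical helper name:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/info.txt" <<'EOF'
=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release
EOF

changelog_entry() {
  local file=$1 ver ver_line
  ver=$(grep 'Version:' "$file" | cut -d' ' -f2)         # 1.2.0 for the sample
  ver_line=$(grep -v 'Version:' "$file" | grep "$ver")  # "= 1.2.0 ="
  # Same look-behind/look-ahead as the answer; tr deletes the NUL written
  # because of -z, and grep -v '^$' removes the surrounding blank lines.
  grep -Pzo '(?<='"$ver_line"')[^=]*(?==)' "$file" | tr -d '\0' | grep -v '^$'
}

changelog_entry "$tmp/info.txt"
# * Mac release
# * Security fix
```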

Answer 6

Using awk:

cat test.awk
#!/bin/awk -f

/Version/{              # If match for Version is found
        print $2        # Print column 2
        next
}
/Changelog/{            # If match for Changelog is found
        getline         # Skip a line
        getline         # Skip another line
        nextL=NR + 2    # Print the two lines that follow the second getline
        next
}
NR <= nextL             # Print lines after the Changelog heading until NR reaches nextL

As a one-liner, it looks like:

awk ' /Version/ {print $2; next} /Changelog/ {getline; getline; nextL=NR+2; next} NR <= nextL ' $file

This prints the following:

awk -f test.awk $file
1.2.0
* Mac release
* Security fix
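The script above assumes the entry is exactly two lines long (nextL=NR + 2). A hedged variant of the same getline idea that instead prints until the first blank line after the heading; like the original, it prints the first changelog entry rather than looking up the version. first_entry and inlog are illustrative names:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/info.txt" <<'EOF'
=== Some Unique Headline ===
Version: 1.2.0

== Changelog ==

= 1.2.0 =
* Mac release
* Security fix

= 1.0.0 =
* Windows release
EOF

first_entry() {
  awk '
  /Version/    { print $2; next }                    # the version number
  /Changelog/  { getline; getline; inlog = 1; next } # skip the blank line and the heading
  inlog && !NF { exit }                              # a blank line ends the entry
  inlog                                              # print the entry lines
  ' "$1"
}

first_entry "$tmp/info.txt"
# 1.2.0
# * Mac release
# * Security fix
```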

Answer 7

To do exactly what you asked:

$ awk '$1=="Version:"{ver="= "$2" ="} !NF{f=0} f; $0==ver{f=1}' file
* Mac release
* Security fix

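But since the current version will be the first changelog entry, you don't need to read the version number first. All you really need is:

$ awk '$1=="*"{f=1} f&&!NF{exit} f' file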
* Mac release
* Security fix
