How to extract phone number and PIN from each text line

Question

Sample text from the log file:

2021/08/29 10:25:37 20210202GL1 Message Params [userid:user1] [timestamp:20210829] [from:TEST] [to:0214736848] [text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]
2021/08/27 00:03:18 20210202GL2 Message Params [userid:user1] [timestamp:20210827] [from:TEST] [to:0214736457] [text:You requested for Pin reset. Your Customer ID: 0214736457 and PIN: 6193]
2021/08/27 10:25:16 Thank you for joining our service; Your ID is 0214736849 and PIN is 5949

The other wording and formatting may change, but the ID and PIN parts stay the same.

Expected output for each line:

0214736848#4581
0214736457#6193
0214736849#5949

Below is what I tried using bash, but so far it just extracts every numeric value:

while read p; do
  NUM=''
  counter=1
  text=$(echo "$p" | grep -o -E '[0-9]+')

  for line in $text
  do
    if [ "$counter" -eq 1 ]  # if is equal to 1
    then
      NUM+="$line"    # concatenate string
    else
      NUM+="#$line"   # concatenate string
    fi
    let counter++     # increment counter
  done

  printf "$NUM\n"
done < logfile.log

The current output, however, is not what I expected:

2021#08#29#00#03#18#20210202#2#1#20210826#0214736457#0214736457#6193
2021#08#27#10#25#37#20210202#1#1#20210825#0214736848#0214736848#4581
2021#08#27#10#25#16#0214736849#5949

Answer 1

Using sed capture groups, you can do:

sed 's/.* Your Customer ID: \([0-9]*\) and PIN: \([0-9]*\).*/\1#\2/g' file.txt
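One caveat with a plain substitution is that sed still prints lines that do not match, unchanged. A minimal sketch, assuming you would rather drop such lines, combines -n with the p flag so only substituted lines are printed. The function name extract_sed and the sample lines are illustrative, not from the question.

```shell
# Hypothetical helper: print ID#PIN only for lines that match, drop the rest.
extract_sed() {
  sed -n 's/.* Your Customer ID: \([0-9]*\) and PIN: \([0-9]*\).*/\1#\2/p' "$@"
}

# Sample input: one matching line and one unrelated line (made up for the demo).
printf '%s\n' \
  '2021/08/29 10:25:37 [to:0214736848] [text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]' \
  '2021/08/29 10:26:00 some unrelated log entry' |
  extract_sed
```

Called without arguments the helper reads standard input; with a file name, as in extract_sed logfile.log, it reads that file.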

Answer 2

Using bash and a regular expression:

while IFS='] ' read -r line; do
  [[ "$line" =~ ID:\ ([^\ ]+).*PIN:\ ([^\ ]+)] ]]
  echo "${BASH_REMATCH[1]}#${BASH_REMATCH[2]}"
done <file

Output:

0214736848#4581
0214736457#6193
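The loop above targets the bracketed "ID: ... PIN: ...]" layout. A possible bash variant, sketched here under the assumption that the separator is either ":" or " is" as in the three sample lines, keeps the regex in a variable and prints only when the line matches. The names extract_bash and re are illustrative.

```shell
# Requires bash (uses [[ =~ ]] and BASH_REMATCH).
extract_bash() {
  local line
  # Groups 1 and 3 capture the separator, groups 2 and 4 the digits.
  local re='ID(:| is) ([0-9]+) and PIN(:| is) ([0-9]+)'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s#%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]}"
    fi
  done
}

# Sample lines shortened from the question.
printf '%s\n' \
  '[text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]' \
  'Thank you for joining our service; Your ID is 0214736849 and PIN is 5949' |
  extract_bash
```

Guarding the printf with the match test means lines without an ID/PIN pair produce no output rather than a bare "#".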

Answer 3

Another variant using gawk with 2 capture groups, each matching 1 or more digits:

awk '
match($0, /ID: ([0-9]+) and PIN: ([0-9]+)/, m) {
  print m[1]"#"m[2]
}
' file

Output

0214736848#4581
0214736457#6193

For the updated question, if you want a more precise match, you can match either : or is; the values are then in capture groups 2 and 4.

awk '
match($0, /ID(:| is) ([0-9]+) and PIN(:| is) ([0-9]+)/, m) {
  print m[2]"#"m[4]
}
' file

Output

0214736848#4581
0214736457#6193
0214736849#5949
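The three-argument form of match() is a gawk extension. For other awk implementations, one possible sketch uses the two-argument match() with RSTART and RLENGTH, then splits the matched text on non-digits. The function name extract_awk and the array name part are illustrative.

```shell
# Hypothetical helper using only POSIX awk features.
extract_awk() {
  awk '
    # Locate the "ID<sep>digits and PIN<sep>digits" stretch of the line.
    match($0, /ID(:| is) [0-9]+ and PIN(:| is) [0-9]+/) {
      s = substr($0, RSTART, RLENGTH)
      # Split on runs of non-digits: part[1] is empty (text before the ID),
      # part[2] is the ID and part[3] is the PIN.
      split(s, part, /[^0-9]+/)
      print part[2] "#" part[3]
    }
  ' "$@"
}

# Sample lines shortened from the question.
printf '%s\n' \
  '[text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]' \
  'Thank you for joining our service; Your ID is 0214736849 and PIN is 5949' |
  extract_awk
```

Restricting the split to the matched substring keeps the dates and timestamps earlier in the line from interfering.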

Answer 4

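With your shown samples, please try the following awk code. It can be done easily by using several field separators. Simple explanation: set "Customer ID: " OR " and PIN: " OR "]$" as the field separators, then print only the 2nd and 3rd fields joined by #, as per the OP's required output.

awk -v FS='Customer ID: | and PIN: |]$' '{print $2"#"$3}' Input_file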
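To see why fields 2 and 3 are the ones to print, it can help to dump every field that this separator produces. The sketch below assumes the same FS as the answer; the helper name show_fields and the shortened sample line are made up for illustration.

```shell
# Hypothetical helper: print each field with its index, using the same FS.
show_fields() {
  awk -v FS='Customer ID: | and PIN: |]$' '{
    for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i
  }' "$@"
}

# Shortened sample line from the question.
printf '%s\n' 'You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 4581]' |
  show_fields
```

Everything before "Customer ID: " lands in field 1, so the ID and the PIN end up in fields 2 and 3.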

Answer 5

Given the updated input in your question, using any sed in any shell on every Unix box:

$ sed 's/.* ID[: ][^0-9]*\([0-9]*\).* PIN[: ][^0-9]*\([0-9]*\).*/\1#\2/' file
0214736848#4581
0214736457#6193
0214736849#5949

Original answer:

Using any awk in any shell on every Unix box:

$ awk -v OFS='#' '{print $(NF-3), $NF+0}' file
0214736848#4581
0214736457#6193
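One thing to watch with the awk version: adding 0 to the last field converts it to a number, which removes the trailing ] but would also drop leading zeros from a PIN such as 0042. A minimal sketch that strips the bracket as text instead, assuming the same field layout as the first two sample lines; the helper name extract_keep_zeros and the sample PIN are made up.

```shell
# Hypothetical helper: same field positions, but the PIN stays a string.
extract_keep_zeros() {
  awk -v OFS='#' '{
    pin = $NF
    sub(/[^0-9]+$/, "", pin)   # remove the trailing "]" without numeric conversion
    print $(NF-3), pin
  }' "$@"
}

# Made-up sample line with a PIN that starts with zeros.
printf '%s\n' '[text:You requested for Pin reset. Your Customer ID: 0214736848 and PIN: 0042]' |
  extract_keep_zeros
```

The ID is not affected either way, because it is printed as the original field text and never goes through arithmetic.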
