Extracting a string into a variable using a regex in bash
I have a string like this:
Return-Path: bT.41aywtru20=krja5b54hplm=k29fsc7grl@fake.link.com
Received-SPF: pass (fake.link.com: Sender is authorized to use 'bt.41aywtru20=krja5b54hplm=k29fsc7grl@fake.link.com' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="bt.41aywtru20=krja5b54hplm=k29fsc7grl@fake.link.com"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
for <fake@fake.com>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
Date:To:From:Reply-To:Sender:List-Unsubscribe;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=fake.com; i=@fakelink.com; q=dns/txt; s=s575655;
t=1629812840; h=from : subject : to : message-id : date;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
(TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <tomtest@fakelink.com>)
id 1mIWls-TRjyEC-AK for fake@fake.com; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <fakename@fakelink.com>)
id 1mIWlr-9EFPsz-U0 for fake@fake.com; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: fake@fake2.com
To: fake@fake.com
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <E1mIWlr-9EFPsz-U0@message-id.smtpcorp.com>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
to <abuse-report@smtp2go.com>
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--
This is stored in a variable called $emailText.
I am trying to use a regex to pull the "From" section out of the text:
From: fake@fake2.com
My regex skills are not very strong, but in my tests the pattern looks like this: (?<=From: ).*.
When I try to pull the text out, though, I cannot get the regex to work:
echo [[ $emailText =~ (?<=From: ).*. ]]
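The bash =~ operator uses POSIX extended regular expressions, which do not support lookbehind or lookahead assertions.
It is much easier to use awk with a non-regex approach here:
awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"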
fake@fake2.com
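Since the question asks for the value in a variable, the awk output can be captured with command substitution. This is only a minimal sketch: the message is a shortened stand-in for the one in the question, and the variable name sender is made up for illustration.

```bash
#!/usr/bin/env bash
# Shortened stand-in for the message in the question (assumption).
emailText=$'MIME-Version: 1.0\nFrom: fake@fake2.com\nTo: fake@fake.com\n\nThis is a text message'

# With ': ' as the separator, $1 is the header name and $2 its value.
# exit stops at the first hit, so later "From: " lines are ignored.
sender=$(awk -F ': ' '$1 == "From" { print $2; exit }' <<< "$emailText")

echo "$sender"
```

The here-string (<<<) is a bash feature, so this needs bash rather than a plain POSIX sh.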
If the line should contain an email address, you can match it with awk first (no need for the unsupported lookarounds):
awk 'match($0, /^From: [^[:space:]@]+@[^[:space:]@]+$/) {
print $2
}' <<< "$emailText"
Output:
fake@fake2.com
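Assuming you want just the email terminus, here is a quick and dirty Awk script. It stops at the first blank line, which marks the end of the headers.
awk '/^$/ { exit 1 }
/^From: .* <[^<>@]+@[^<>]+>/ {
split($0, g, /[<>]/); print g[2]; exit }
/^From: / { print $2; exit }' file.eml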
This should work for all of these cases:
From: Real Name <real.name@example.com>
From: "Name, Real" <outlook@torture.example.com>
From: terminus@example.com
From: terminus@example.com (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <real.name@example.com>
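The last example in particular should convince you that you will need to do more work if you also want the correspondent's full name in normalized form.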
With bash:
[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"
Output:
fake@fake2.com
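The pattern above matches the first "From: " anywhere in the text, including inside another header or the body. A stricter variant might look like the sketch below. It assumes LF line endings, a header block that ends at the first blank line, and a header spelled exactly "From:". The function name get_from and the sample message are hypothetical.

```bash
#!/usr/bin/env bash
# get_from is a hypothetical helper name, not part of the original answers.
get_from() {
  local text=$1 nl=$'\n'
  local headers=${text%%"$nl$nl"*}     # keep only the header block
  local re="(^|$nl)From: ([^$nl]*)"    # From: at the start of a line
  [[ $headers =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
}

# Sample with a decoy inside another header and one in the body.
emailText=$'X-Note: From: decoy@example.com\nFrom: fake@fake2.com\nTo: fake@fake.com\n\nFrom: body@example.com'
from=$(get_from "$emailText")
echo "$from"
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ avoids the quoting pitfalls of writing it inline.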
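With your shown samples, please try the following awk code. Simple explanation: check whether the first field is From: and, if it is, print the second field of that line.
awk '$1=="From:"{print $2}' Input_file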
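Second solution: if there is only one From: entry in the whole file, try the following. It uses exit to quit right after printing the matching line, which avoids reading the rest of Input_file unnecessarily.
awk '$1=="From:"{print $2;exit}' Input_file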
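Header field names in mail messages are case-insensitive, so a message may carry "from:" or "FROM:" instead. A possible variation on the command above, given only as a sketch, lowercases the first field before comparing and stops at the blank line that ends the headers. The sample message and the variable names msg and addr are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample: a lowercase header name, plus a body line that
# would otherwise look like a From: header.
msg=$'MIME-Version: 1.0\nfrom: fake@fake2.com\nTo: fake@fake.com\n\nFrom: body@example.com'

addr=$(printf '%s\n' "$msg" | awk '
  /^$/ { exit }                               # blank line: headers are over
  tolower($1) == "from:" { print $2; exit }   # compare the name in lowercase
')
echo "$addr"
```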