Python IMAPLIB HTML body parsing
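So basically I have this script that runs continuously: when a new email with specific text in the subject arrives in the inbox, it grabs information from that email. I have only managed to get it to pull the subject, but no matter what I try I can't get it to retrieve the body of the email. I believe the body is in HTML, so I tried using BeautifulSoup to parse it, but that doesn't work at all. Please help!!! :( Here is what I have so far: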
    import email
    import imaplib
    from bs4 import BeautifulSoup
    import time
    import sys

    username = 'xxx.xxx@xxx.xx'
    password = 'xxxxxx'

    mail = imaplib.IMAP4_SSL('imap-mail.outlook.com')
    (retcode, capabilities) = mail.login(username, password)
    mail.list()

    n = 0
    while True:
        mail.select('inbox')
        (retcode, messages) = mail.search(None, 'UNSEEN', '(SUBJECT "xxxxxxx-")', '(FROM "xx.xx@xxxx.xx")')
        if retcode == 'OK':
            for num in messages[0].split():
                n = n + 1
                print('Processing Email ' + str(n))
                typ, data = mail.fetch(num, '(RFC822)')
                for response_part in data:
                    if isinstance(response_part, tuple):
                        original = email.message_from_bytes(response_part[1])
                        print("Subject: " + original['Subject'])
                typ, data = mail.store(num, '+FLAGS', '\\Seen')
        time.sleep(120)
Comment: The "body" returned by imap.fetch is usually bytes, not a string, so passing it to email.message_from_string throws an exception.
Change to:
msg = email.message_from_bytes(body)
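To illustrate the comment above, here is a minimal sketch of the difference between the two parsers. The `raw` bytes are a made-up stand-in for what `imap.fetch` would hand back in `data[0][1]`; nothing here talks to a real server.

```python
import email

# Hypothetical raw message, standing in for data[0][1] from imap.fetch
raw = b"Subject: Test\r\n\r\nHello"

# message_from_string expects str, so bytes input fails
try:
    email.message_from_string(raw)
    outcome = "parsed"
except TypeError:
    outcome = "TypeError"

# message_from_bytes accepts the bytes as they come off the wire
msg = email.message_from_bytes(raw)
subject = msg["Subject"]

print(outcome, subject)
```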
Question: I can't get it to get the body of the email
For example:
    import email, imaplib

    username = 'xxx.xxx@xxx.xx'
    password = 'xxxxxx'

    imap = imaplib.IMAP4_SSL('imap-mail.outlook.com')
    imap.login(username, password)
    imap.select("inbox")

    resp, items = imap.search(None, "(UNSEEN)")
    for n, num in enumerate(items[0].split(), 1):
        resp, data = imap.fetch(num, '(RFC822)')
        # data[0] is a tuple; its second element is the raw message as bytes
        body = data[0][1]
        msg = email.message_from_bytes(body)
        # Decodes the transfer encoding; returns None for multipart messages
        content = msg.get_payload(decode=True)
        print("Message content[{}]:{}".format(n, content))
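One caveat with the example above: HTML mail is usually sent as a multipart message, and for those `get_payload(decode=True)` on the top-level message returns `None`. A rough sketch of one way to handle that is to walk the parts and pick the first `text/html` one. The helper name `extract_html_body` and the sample message are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import email
from email.message import EmailMessage


def extract_html_body(msg):
    """Return the first non-attachment text/html part as str, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        return payload.decode(charset, errors="replace")
    return None


# Build a sample multipart/alternative message instead of contacting a server
sample = EmailMessage()
sample["Subject"] = "Demo"
sample.set_content("plain text fallback")
sample.add_alternative("<html><body><p>Hello</p></body></html>", subtype="html")
raw = sample.as_bytes()  # plays the role of data[0][1]

msg = email.message_from_bytes(raw)
html = extract_html_body(msg)
print(html)
```

The resulting string is what you would then hand to BeautifulSoup, for example `BeautifulSoup(html, 'html.parser')`, if you want to pull text or tags out of it.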