How can I extract multiple email IDs and phone numbers from a single text file with Python?
Hello, I have a large text file that contains a mix of information. I want to extract only the email IDs and phone numbers, using a Python program or tool.
HTTP/1.1 200 OK
{"id":"269","first_name":"N S","last_name":"","balance":"0","phonecode":null,"mobile":null,"email":"wand412@gmail.com","verified":"0","password":""}
HTTP/1.1 200 OK
{"id":"303","first_name":"Devi","last_name":"Baruah","balance":"0","phonecode":null,"mobile":null,"email":"dxxxxxx@yahoo.com","verified":"0","password":""}
HTTP/1.1 200 OK
{"id":"306","first_name":"Rashmi","last_name":"Kumari","balance":"24","phonecode":"91","mobile":"9xxxxxxx","email":"xxxxxxx7@gmail.com","verified":"1","password":"xxxx"}
HTTP/1.1 200 OK
{"id":"308","first_name":"ashwini","last_name":"gokhale","balance":"7","phonecode":"1","mobile":"61xxxx","email":"axxxx@gmail.com","verified":"1","password":"xxxxxxx"}
HTTP/1.1 200 OK
{"id":"307","first_name":"Rama","last_name":"De","balance":"0","phonecode":"91","mobile":"73xxxxxx","email":"dexxxx@gmail.com","verified":"1","password":"xxxx"}
If your file is named test.txt, you can use the following snippet to parse the JSON parts of the file, one line at a time:
import json

items = []
with open("test.txt") as file_handle:
    for line in file_handle:
        try:
            if item := json.loads(line):
                items.append(item)
        except json.decoder.JSONDecodeError:
            # Not JSON (for example the "HTTP/1.1 200 OK" lines), so skip it.
            pass

# 'items' is a list of dictionaries that contain each user's details.
# If you want to extract the IDs, email addresses and phone numbers into separate lists, one way to do it is:
ids = [item.get("id") for item in items]
email = [item.get("email") for item in items]
mobile = [item.get("mobile") for item in items]
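The three lists above still contain the null entries from the sample. As a minimal sketch of one way to keep each email together with its phone number, the helper below returns (email, phone) pairs. The name extract_contacts is hypothetical, and prefixing "phonecode" to "mobile" as a calling code is an assumption that the sample data does not confirm.

```python
import json


def extract_contacts(lines):
    """Return (email, phone) tuples from the lines that hold a JSON object."""
    contacts = []
    for line in lines:
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. the "HTTP/1.1 200 OK" status lines
        if not isinstance(item, dict):
            continue
        email = item.get("email")
        mobile = item.get("mobile")
        code = item.get("phonecode")
        # Assumption: "phonecode" is a calling code that prefixes "mobile".
        phone = f"+{code}{mobile}" if mobile and code else mobile
        if email or phone:
            contacts.append((email, phone))
    return contacts


sample = [
    'HTTP/1.1 200 OK',
    '{"id":"269","phonecode":null,"mobile":null,"email":"wand412@gmail.com"}',
    'HTTP/1.1 200 OK',
    '{"id":"306","phonecode":"91","mobile":"9xxxxxxx","email":"xxxxxxx7@gmail.com"}',
]
print(extract_contacts(sample))
```

With the real file, an open file handle can be passed directly, since iterating over it yields lines: extract_contacts(open("test.txt")).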
This looks like a log from a web server. If possible, try to start from a cleaner file.
Anyway:
import json

mandatory_keys = ['email', 'mobile']
file_str = []
out = []
with open('test') as fd:
    # Keep only the lines that look like JSON objects.
    file_str = [x.rstrip('\n') for x in fd.readlines() if x.startswith('{')]
for j_str in file_str:
    try:
        j = json.loads(j_str)
        assert [x for x in mandatory_keys if x in j.keys()] == mandatory_keys, 'missing mandatory_keys'
        out.append({k: v for k, v in j.items() if k in mandatory_keys})
    except (json.JSONDecodeError, AssertionError) as exc:
        raise ValueError('Something wrong with the json') from exc
print(out)
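You may also want to use a JSON schema validator such as 'jsonschema' to replace the assert line there and get a clear error message.
By changing the mandatory_keys list, you can easily update the output.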
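To illustrate what a clearer error message could look like, here is a standard-library-only sketch that swaps the assert for an explicit check that names the missing keys. It does not use jsonschema, and the helper name pick_keys is hypothetical.

```python
import json

MANDATORY_KEYS = ["email", "mobile"]


def pick_keys(json_line, keys=MANDATORY_KEYS):
    """Parse one JSON line and return only the requested keys."""
    try:
        record = json.loads(json_line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    # Report exactly which keys are absent instead of a generic failure.
    missing = [key for key in keys if key not in record]
    if missing:
        raise ValueError(f"missing mandatory keys: {missing}")
    return {key: record[key] for key in keys}


print(pick_keys('{"id":"307","mobile":"73xxxxxx","email":"dexxxx@gmail.com"}'))
```

Passing a different keys list, for example ["id", "email"], changes the output in the same way as editing mandatory_keys in the answer above.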