How to compare two lists by filtering and sorting "repeated" values
I have the following act2.txt email activity file:
2021-04-02//email@example.com//Enhance your presentation skills in 15 minutes//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-16//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
2021-04-01//email@example.com//Enhance your presentation skills in 15 minutes//Delivered
2021-04-09//email@example.com//we are here to help you improve your skills//Delivered
2021-04-12//email@example.com//(1st meeting) here is our recorded presentation skills webinar//Delivered
2021-04-13//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
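I want to track email activity per customer: I count the emails delivered and the emails opened, then compute the open rate.
I generated two lists, one for delivered emails and one for opened emails: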
from pprint import pprint

# Read the file with activities separated by //
afile = "act2.txt"
with open(afile, "r") as afile_read:
    lines = afile_read.readlines()

activityList = []
for activities in lines:
    # Each record is date//customer email//email title//action
    activity = activities.split("//")
    date = activity[0]
    customer_email = activity[1]
    email_title = activity[2]
    action = activity[3]
    # Strip trailing whitespace (including the newline) from every field
    stripped_line = [s.rstrip() for s in activity]
    activityList.append(stripped_line)
# print(activityList)

stripped_email = 'email@example.com'
email_actions = [x for x in activityList if stripped_email in x[1]]
delivered = [x for x in email_actions if 'Delivered' in x]
Opened = [x for x in email_actions if 'Open' in x]
delcount = len(delivered)
opencount = len(Opened)
try:
    Open_rate = opencount / delcount * 100
except ZeroDivisionError:
    Open_rate = 0
print(stripped_email, ",", delcount, ",", opencount, ",", Open_rate, "%")
pprint(delivered)
pprint(Opened)
Delivered list:
[['2021-04-11',
'email@example.com',
'Enroll in the presentations skills - FREE WEBINAR',
'Delivered'],
['2021-04-11',
'email@example.com',
'Enroll in the presentations skills - FREE WEBINAR',
'Delivered'],
['2021-04-11',
'email@example.com',
'Enroll in the presentations skills - FREE WEBINAR',
'Delivered'],
['2021-04-16',
'email@example.com',
'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
'Delivered'],
['2021-04-01',
'email@example.com',
'Enhance your presentation skills in 15 minutes',
'Delivered'],
['2021-04-09',
'email@example.com',
'we are here to help you improve your skills',
'Delivered'],
['2021-04-12',
'email@example.com',
'(1st meeting) here is our recorded presentation skills webinar',
'Delivered'],
['2021-04-13',
'email@example.com',
'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
'Delivered']]
Opened list:
[['2021-04-02',
'email@example.com',
'Enhance your presentation skills in 15 minutes',
'Open'],
['2021-04-11',
'email@example.com',
'Enroll in the presentations skills - FREE WEBINAR',
'Open']]
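I want to compare the two lists and generate a third list (the combined activity), filtered by email subject: if a subject appears in both the delivered list and the opened list, it should count as a single activity. However, subjects can repeat, for example an email that was delivered 3 times but opened only once. I am still learning Python, so I have not been able to work out the right logic.
Edit for clarity:
If an email is found in the opened list (filtering by title), the same title should be removed from the delivered list up to the last date, and a new list containing the combined activity should be generated.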
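You need to think about this problem differently: you are not combining lists.
If an email was opened, it was also received. That means your opened list already is your combined list.
Once you realize that, all that remains is to copy the emails that were never opened into a result list of unopened emails.
Go through the opened list and copy each subject into a set. Then go through the received emails and check whether each subject is in that set: if it is, do nothing; if it is not, copy the email to the unopened list.
A very simple piece of code: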
# Here `opened` and `received` stand for the question's `Opened` and `delivered` lists.
opened_subjects = set()
unopened = []
for email in opened:
    opened_subjects.add(email[2])

unopened_subjects = set()
for email in received:
    # Keep an email only if its subject was never opened and not already kept
    if all(email[2] not in subj_set
           for subj_set in (opened_subjects, unopened_subjects)):
        unopened.append(email)
        unopened_subjects.add(email[2])

print('Both received and opened:', opened)
print('Unopened emails:', unopened)
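To try the loop above end to end, here is a minimal self-contained sketch. It assumes the sample records from the question are inlined as a string instead of being read from act2.txt, and the helper names parse_activity and split_unopened are made up for illustration; they are not part of the original answer.

```python
# Sample data copied from the question (inlined here instead of reading act2.txt).
RAW = """\
2021-04-02//email@example.com//Enhance your presentation skills in 15 minutes//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-16//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
2021-04-01//email@example.com//Enhance your presentation skills in 15 minutes//Delivered
2021-04-09//email@example.com//we are here to help you improve your skills//Delivered
2021-04-12//email@example.com//(1st meeting) here is our recorded presentation skills webinar//Delivered
2021-04-13//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
"""


def parse_activity(text):
    """Split each non-empty line on '//' into [date, email, subject, action]."""
    return [[field.strip() for field in line.split("//")]
            for line in text.splitlines() if line.strip()]


def split_unopened(opened, received):
    """Return the first delivered record of every subject that was never opened."""
    opened_subjects = {email[2] for email in opened}
    unopened_subjects = set()
    unopened = []
    for email in received:
        subject = email[2]
        if subject not in opened_subjects and subject not in unopened_subjects:
            unopened.append(email)
            unopened_subjects.add(subject)
    return unopened


records = parse_activity(RAW)
opened = [r for r in records if r[3] == "Open"]
received = [r for r in records if r[3] == "Delivered"]
unopened = split_unopened(opened, received)

print("Both received and opened:", [r[2] for r in opened])
print("Unopened emails:", [r[2] for r in unopened])
```

With the sample data, two subjects end up in the opened list and three distinct subjects in the unopened list; the second "YOU ARE INVITED TO THIS PROGRAMMING EVENT" record is skipped because its subject was already copied.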
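A side note:
Each set is there for a different reason. The first set, opened_subjects, exists because a set can hold only unique items, which is exactly what is needed here. The second set, unopened_subjects, is there because checking whether an item is in a set is faster than checking whether it is in a list; since I check before adding to the set anyway, its ability to store only unique items is not needed.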
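As a sketch of the point about membership tests: because both sets are only ever used for `in` checks, they could also be merged into a single set seeded with the opened subjects. The function name first_unopened, the variable seen, and the small sample lists below are illustrative assumptions, not part of the original answer.

```python
def first_unopened(opened, received):
    """Keep the first received record per subject whose subject was never opened."""
    # Seed the set with opened subjects so those are skipped from the start.
    seen = {email[2] for email in opened}
    unopened = []
    for email in received:
        if email[2] not in seen:   # set lookup is O(1) on average
            unopened.append(email)
            seen.add(email[2])     # also blocks later repeats of this subject
    return unopened


# Hypothetical sample records in the same [date, email, subject, action] layout.
opened = [["2021-04-02", "email@example.com", "Subject A", "Open"]]
received = [
    ["2021-04-01", "email@example.com", "Subject A", "Delivered"],
    ["2021-04-03", "email@example.com", "Subject B", "Delivered"],
    ["2021-04-05", "email@example.com", "Subject B", "Delivered"],
]
result = first_unopened(opened, received)
print(result)
```

The trade-off is that after the loop the single set no longer tells opened subjects apart from unopened ones, so keeping two sets is the better choice if that distinction is needed later.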