How to convert R code to equivalent Python syntax using a pandas DataFrame?
Suppose we have the following code in R. What is the equivalent syntax/method using a pandas DataFrame in Python?
```r
network_tickets <- contains(comcast_data$CustomerComplaint, match = 'network', ignore.case = T)
internet_tickets <- contains(comcast_data$CustomerComplaint, match = 'internet', ignore.case = T)
billing_tickets <- contains(comcast_data$CustomerComplaint, match = 'bill', ignore.case = T)
email_tickets <- contains(comcast_data$CustomerComplaint, match = 'email', ignore.case = T)
charges_ticket <- contains(comcast_data$CustomerComplaint, match = 'charge', ignore.case = T)
comcast_data$ComplaintType[internet_tickets] <- "Internet"
comcast_data$ComplaintType[network_tickets] <- "Network"
comcast_data$ComplaintType[billing_tickets] <- "Billing"
comcast_data$ComplaintType[email_tickets] <- "Email"
comcast_data$ComplaintType[charges_ticket] <- "Charges"
comcast_data$ComplaintType[-c(internet_tickets, network_tickets, billing_tickets, charges_ticket, email_tickets)] <- "Others"
```
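A literal pandas transcription of this R snippet could look like the sketch below. It is an illustration only: the `comcast_data` DataFrame and its sample complaints are invented, and it assumes `CustomerComplaint` holds plain text. Each R `contains(...)` call becomes a boolean mask from `Series.str.contains(..., case=False)`, each R assignment becomes a `.loc[mask, "ComplaintType"]` assignment run in the same order (so later labels overwrite earlier ones, as in R), and `~` on the combined mask plays the role of R's negative index for "Others".

```python
import pandas as pd

# Hypothetical sample data; the column name mirrors comcast_data$CustomerComplaint in the R code
comcast_data = pd.DataFrame({
    "CustomerComplaint": [
        "Internet speed is slow",
        "Network outage and billing error",
        "Unexpected charges on my bill",
        "Email not working",
        "Rude staff",
    ]
})
text = comcast_data["CustomerComplaint"]

# One boolean mask per keyword; case=False mirrors ignore.case = T,
# na=False treats missing text as "no match"
network_tickets = text.str.contains("network", case=False, na=False)
internet_tickets = text.str.contains("internet", case=False, na=False)
billing_tickets = text.str.contains("bill", case=False, na=False)
email_tickets = text.str.contains("email", case=False, na=False)
charges_ticket = text.str.contains("charge", case=False, na=False)

# Create the column first (all None), then assign in the R order so later labels win
comcast_data["ComplaintType"] = None
comcast_data.loc[internet_tickets, "ComplaintType"] = "Internet"
comcast_data.loc[network_tickets, "ComplaintType"] = "Network"
comcast_data.loc[billing_tickets, "ComplaintType"] = "Billing"
comcast_data.loc[email_tickets, "ComplaintType"] = "Email"
comcast_data.loc[charges_ticket, "ComplaintType"] = "Charges"

# Rows matched by none of the masks (R's negative index) become "Others"
any_match = internet_tickets | network_tickets | billing_tickets | charges_ticket | email_tickets
comcast_data.loc[~any_match, "ComplaintType"] = "Others"
print(comcast_data)
```

The second row mentions both "Network" and "billing", so it ends up as "Billing" because that assignment runs later, just as in the R code.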
I can convert the first operation to Python like this:
```python
network_tickets = df.ComplaintDescription.str.contains('network', regex=True, case=False)
```
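However, I'm stuck on the next part: using such a mask to assign a value like "Internet" to a new DataFrame column (ComplaintType). In R this seems to take just one line, but I'm not sure how to do the same in one line of Python. I tried the following, and each attempt raises an error: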
a) `df['ComplaintType'].apply(internet_tickets) = "Internet"`
b) `df['ComplaintType'] = df.apply(internet_tickets)`
c) `df['ComplaintType'] = internet_tickets.apply("Internet")`
I think we could first create the new column in the DataFrame:
```python
df['ComplaintType'] = internet_tickets
```
but I'm not sure what the next step is.
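One possible next step for that idea, sketched under assumptions (hypothetical sample rows, only the Internet category handled): once the boolean mask is stored as a column, `Series.map` can translate each `True`/`False` into a label via a dict.

```python
import pandas as pd

# Hypothetical sample rows; ComplaintDescription is the column name used in the question
df = pd.DataFrame({"ComplaintDescription": ["No internet", "Billing issue", "Internet down"]})
internet_tickets = df["ComplaintDescription"].str.contains("internet", case=False, na=False)

# Step 1 (the idea above): store the boolean mask as the new column
df["ComplaintType"] = internet_tickets

# Step 2: translate True/False into labels
df["ComplaintType"] = df["ComplaintType"].map({True: "Internet", False: "Others"})
print(df)
```

This only scales to one category per pass; with several keywords, a `.loc` assignment per mask is easier to extend.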
Use `Series.str.contains` with `DataFrame.loc` to set the values from a list:
```python
import pandas as pd

df = pd.DataFrame(data={"ComplaintDescription": ["BiLLing is super", "email", "new"]})
# Each label is also used as the (case-insensitive) search substring
L = ["Internet", "Network", "Billing", "Email", "Charges"]
for val in L:
    df.loc[df['ComplaintDescription'].str.contains(val, case=False), 'ComplaintType'] = val
# Rows that matched no label are still NaN; label them 'Others'
df['ComplaintType'] = df['ComplaintType'].fillna('Others')
print(df)
```
```
  ComplaintDescription ComplaintType
0     BiLLing is super       Billing
1                email         Email
2                  new        Others
```
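The loop above uses each label as its own search text, so `'Billing'` and `'Charges'` will not match complaints that only say "bill" or "charge" the way the R patterns do. A sketch that keeps the R patterns and R's precedence, using `numpy.select` instead of a loop (the sample rows are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (the last one matches both 'bill' and 'charge')
df = pd.DataFrame(data={"ComplaintDescription": ["BiLLing is super", "email", "new", "Extra charges on my bill"]})
text = df["ComplaintDescription"]

# (pattern, label) pairs in the same order as the R code
rules = [("internet", "Internet"), ("network", "Network"), ("bill", "Billing"),
         ("email", "Email"), ("charge", "Charges")]

# np.select picks the FIRST condition that is True, so the list is reversed
# to reproduce R's "last assignment wins" behavior
conditions = [text.str.contains(pat, case=False, na=False) for pat, _ in reversed(rules)]
choices = [label for _, label in reversed(rules)]
df["ComplaintType"] = np.select(conditions, choices, default="Others")
print(df)
```

Keeping the rules in one list means adding a category is a one-line change, and `default="Others"` replaces the separate `fillna` step.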
Edit:
If you need to set each value individually:
```python
df.loc[df['ComplaintDescription'].str.contains('internet', case=False), 'ComplaintType'] = "Internet"
```
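The same single-category assignment can also be written as a one-line column update with `Series.mask`, which replaces values where the condition is `True` and keeps the rest. A minimal sketch, assuming the column already exists with a default label (sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({"ComplaintDescription": ["Internet down", "Network issue", "new"]})
df["ComplaintType"] = "Others"  # default label for every row

# Boolean mask, case-insensitive; missing text counts as no match
internet_tickets = df["ComplaintDescription"].str.contains("internet", case=False, na=False)

# Replace the label where the mask is True, keep the existing value elsewhere
df["ComplaintType"] = df["ComplaintType"].mask(internet_tickets, "Internet")
print(df)
```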