Extract value from a list of lists and add to new column
I have a dataframe in which one column holds a list of lists (here, a list of tuples) with address information.
My data:
import pandas as pd
data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]
df = pd.DataFrame(data, columns = ['Location', 'Address_Info'])
This creates a dataframe that looks like this:
Location Address_Info
0 location1 [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]
1 location2 [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]
2 location3 [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]
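I need to find the inner list that contains the value 'Number', then add the number from that list to the dataframe in a new column.
The resulting dataframe should look like this: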
Location Address_Info Number
0 location1 [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')] 123
1 location2 [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')] NaN
2 location3 [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')] 987
One problem I run into is that some rows of 'Address_Info' have no list containing 'Number'.
You can use a list comprehension and the str accessor:
df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number']).str[0]
Output:
0 123.0
1 NaN
2 987.0
To save it in a new column:
df['Number'] = (df['Address_Info']
                .apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
                .str[0]
               )
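Because rows without a 'Number' entry produce NaN, the new column is shown as floats (123.0, 987.0) in the output above. If whole numbers matter, one option is pandas' nullable integer dtype. The following is a minimal sketch under that assumption; the intermediate names `numbers` and `first` are only illustrative and not part of the original answer.

```python
import pandas as pd

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# Collect every value tagged 'Number' in each row (empty list if none).
numbers = df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number'])

# .str[0] takes the first element of each list and yields NaN for empty lists.
first = numbers.str[0]

# Assumption: the house numbers are whole numbers, so the nullable
# 'Int64' dtype can hold them alongside missing values.
df['Number'] = pd.to_numeric(first).astype('Int64')
print(df[['Location', 'Number']])
```

With this dtype the missing entry is displayed as `<NA>` rather than NaN.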
NB. If you expect several numbers per row, you can omit .str[0], and you will get a list of numbers (empty if there are none):
df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
Output:
0 [123]
1 []
2 [987]
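When keeping the full lists, it can help to check how many 'Number' entries each row actually has before deciding whether `.str[0]` is safe. A small sketch of that check follows; the names `counts` and `missing` are hypothetical helpers introduced here for illustration.

```python
import pandas as pd

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# One list of numbers per row, possibly empty.
numbers = df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number'])

# .str.len() also works on lists: it gives the number of matches per row.
counts = numbers.str.len()

# Locations that have no 'Number' entry at all.
missing = df.loc[counts == 0, 'Location'].tolist()
print(counts.tolist(), missing)
```

Rows where the count is greater than 1 would lose data under `.str[0]`, since only the first match is kept.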
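Prepare the data before creating the DF:
def get_number(lst):
    # Return the value of the first ('value', 'Number') pair, or None if there is none
    for x in lst:
        if x[1] == 'Number':
            return x[0]
    return None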
data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]
for entry in data:
    # Append the extracted number (or None) as a third element of each row
    entry.append(get_number(entry[1]))
print(data)
# now you can create the DF
Output:
[['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')], 123], ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')], None], ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')], 987]]
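The final step, creating the DF from the prepared rows, is not shown in the answer. A minimal sketch of how it might look is given below. It assumes the same column names as in the question plus 'Number', and it builds a new `rows` list (a name chosen here for illustration) instead of mutating `data` in place.

```python
import pandas as pd

def get_number(lst):
    # Return the value of the first ('value', 'Number') pair, or None if absent.
    for x in lst:
        if x[1] == 'Number':
            return x[0]
    return None

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]

# Build three-element rows without modifying the original list.
rows = [[loc, info, get_number(info)] for loc, info in data]

df = pd.DataFrame(rows, columns=['Location', 'Address_Info', 'Number'])
print(df[['Location', 'Number']])
```

The None returned for location2 shows up as a missing value in the 'Number' column.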
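Explode the lists into rows, expand the tuples into columns, then keep only the rows whose key is Number. The result keeps the original index, so the assignment aligns with df and rows without a Number get NaN.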
df['Number'] = df['Address_Info'].explode() \
                                 .apply(pd.Series) \
                                 .rename(columns={0: 'value', 1: 'key'}) \
                                 .query('key == "Number"')['value']
>>> df
Location Address_Info Number
0 location1 [(123, Number), (Main, Street), (New York, City)] 123
1 location2 [(Broadway, Street), (New York, City), (11111,... NaN
2 location3 [(987, Number), (Grand, Street), (Chicago, Cit... 987