Extract value from a list of lists and add to new column

I have a DataFrame in which one column holds a list of lists (here, a list of tuples) with address information.

My data:

import pandas as pd

data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]

df = pd.DataFrame(data, columns = ['Location', 'Address_Info'])

This creates a DataFrame that looks like this:

    Location    Address_Info
0   location1   [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]
1   location2   [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]
2   location3   [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]

I need to find the inner item that contains the value 'Number', then add the number from that item to the DataFrame as a new column.

The resulting DataFrame should look like this:

    Location    Address_Info                                                                 Number
0   location1   [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]                  123
1   location2   [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]               NaN
2   location3   [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]  987

One problem I am running into is that some rows in 'Address_Info' have no item containing 'Number'.

You can use a list comprehension and the str accessor:

df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number']).str[0]

Output:

0    123.0
1      NaN
2    987.0

To save it in a new column:

df['Number'] = (df['Address_Info']
                  .apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
                  .str[0]
               )

NB. If you expect several numbers per row, you can omit .str[0]; you will then get a list of numbers (empty if there are none):

df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number'])

Output:

0    [123]
1       []
2    [987]
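
The first output above shows the numbers as floats (123.0), which is how pandas represents a numeric column that contains NaN. If whole numbers are preferred, one possible follow-up is to convert the result to pandas' nullable integer dtype. This is a minimal sketch rather than part of the original answer; the intermediate name `numbers` is introduced here purely for illustration.

```python
import pandas as pd

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# Same extraction as above: first 'Number' value per row, NaN when there is none.
numbers = (df['Address_Info']
             .apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
             .str[0])

# to_numeric yields a float column with NaN; 'Int64' (capital I) is the
# nullable integer dtype, which stores the missing entry as <NA>.
df['Number'] = pd.to_numeric(numbers).astype('Int64')
```

This assumes every 'Number' value is a whole number; non-integer values would make the cast to 'Int64' fail.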

Prepare the data before creating the DataFrame:

def get_number(lst):
    for x in lst:
        if x[1] == 'Number':
            return x[0]
    return None

data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]
for entry in data:
    entry.append(get_number(entry[1]))
print(data)
# now you can create the DF 

Output:

[['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')], 123], ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')], None], ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')], 987]]
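
To round this off, the prepared list can be passed to the DataFrame constructor with a third column name. The sketch below repeats the helper so that it runs on its own; the column name 'Number' is taken from the question, the rest is the code shown above.

```python
import pandas as pd

def get_number(lst):
    # Return the value of the first ('...', 'Number') pair, or None if absent.
    for x in lst:
        if x[1] == 'Number':
            return x[0]
    return None

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]

# Append the extracted number (or None) to every row before building the frame.
for entry in data:
    entry.append(get_number(entry[1]))

# Each inner list now has three items, so three column names are needed.
df = pd.DataFrame(data, columns=['Location', 'Address_Info', 'Number'])
```

Rows without a 'Number' entry end up with a missing value in the new column.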

Explode the list into rows, then expand the tuples into columns, and keep only the rows whose key is 'Number'.

df['Number'] = df['Address_Info'].explode() \
                                 .apply(pd.Series) \
                                 .rename(columns={0: 'value', 1: 'key'}) \
                                 .query('key == "Number"')['value']
>>> df
    Location                                       Address_Info Number
0  location1  [(123, Number), (Main, Street), (New York, City)]    123
1  location2  [(Broadway, Street), (New York, City), (11111,...    NaN
2  location3  [(987, Number), (Grand, Street), (Chicago, Cit...    987
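
The same explode idea can also be written step by step, which makes the index alignment easier to follow. This is only a sketch under stated assumptions: the names `exploded` and `pairs` are illustrative, the tuples are expanded with the DataFrame constructor instead of apply(pd.Series), every row is assumed to hold at least one tuple, and each row is assumed to have at most one 'Number' entry.

```python
import pandas as pd

data = [['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
        ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
        ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]]]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# One row per (value, key) tuple; the original row label is repeated in the index.
exploded = df['Address_Info'].explode()

# Split the tuples into two columns while keeping that repeated index.
pairs = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['value', 'key'])

# Keep only the 'Number' rows. Assignment aligns on the index, so rows
# without a 'Number' entry (location2 here) are filled with NaN.
df['Number'] = pairs.loc[pairs['key'] == 'Number', 'value']
```

If a row could contain several 'Number' entries, the filtered index would contain duplicates and the assignment would need extra handling, for example grouping by the index first.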