Apply function to create multiple columns in pandas

Question

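As the title says, I'm looking for a way to apply a function to each row of my DataFrame and create multiple new columns from a single column.

What I mean is: I have a df containing different city names. Now I want to create several new columns holding the population size, the country, the founding date, and more.

I have an API that returns this information as JSON. So far, so good. What I'm currently doing is looping over the df with a for loop, making the API call, and then setting the columns with iloc... which is neither clean nor efficient.

I'd like to know whether an apply/transform function could produce a similar result.

My current solution: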

for index, row in data.iterrows():
    print("--------------------->" + str(index))
    try:
        infos = cityInfo.download(row["city"], row["zip"])
    except Exception:
        # stop at the first failed API call
        break
    if len(infos) == 0:
        print("City not found!")
    else:
        # index is a row label, so write through .loc instead of chained indexing
        data.loc[index, "pop"] = infos["population"]
        data.loc[index, "country"] = infos["country"]
        data.loc[index, "founding"] = infos["foundingDate"]

I'd be glad for any help or hints.

Answer 1

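Don't expect good answers to a question that involves an API nobody here has insight into. Calling the API on every iteration of the loop is probably the slowest part. If your API doesn't allow downloading all the information in a single call, you probably can't improve performance. Moreover, apply/transform (over rows) will do basically the same thing, calling the API [no. of rows] times. Applied over columns, these methods take one column at a time, not two together.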
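To illustrate the point about the number of calls, here is a minimal sketch that counts how often a row-wise apply invokes the lookup. The function `fake_download` and the sample rows are hypothetical stand-ins, not the asker's real API, and the exact count can differ on very old pandas versions.

```python
import pandas as pd

calls = []  # records every (city, zip) pair the stand-in API is asked for


def fake_download(city, zip_code):
    # Hypothetical stand-in for the real API call; it only logs the request
    calls.append((city, zip_code))
    return {"population": "unknown"}


data = pd.DataFrame(
    {"city": ["london", "birmingham", "leeds"], "zip": ["NE1", "B", "LS1"]}
)

# Row-wise apply: the lambda (and therefore the lookup) runs for each row
data["pop"] = data.apply(
    lambda row: fake_download(row["city"], row["zip"])["population"], axis=1
)

print(len(calls), "calls for", len(data), "rows")
```

So switching from a for loop to apply changes the syntax, not the number of requests sent.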

Answer 2

From the pandas documentation:

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

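So, first of all, I would iterate over the index and reference the DataFrame directly. (I haven't written in your try-except statement, but add it if you need it.)

I'm assuming your API does something like this: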
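Iterating over the index and writing to the DataFrame directly could look roughly like the sketch below. The `download` function and its sample record are hypothetical placeholders for `cityInfo.download`, assumed here to return a dict that is empty when nothing is found.

```python
import pandas as pd


def download(city, zip_code):
    # Hypothetical stand-in for cityInfo.download; returns {} when nothing is found
    known = {
        ("london", "NE1"): {
            "population": "100000",
            "country": "UK",
            "foundingDate": "ages ago",
        }
    }
    return known.get((city, zip_code), {})


data = pd.DataFrame({"city": ["london", "atlantis"], "zip": ["NE1", "X"]})

# Create the target columns up front so the loop only fills existing cells
for col in ("pop", "country", "founding"):
    data[col] = None

# Loop over the index labels and write through .loc on the DataFrame itself,
# rather than modifying the row objects handed out by iterrows()
for idx in data.index:
    infos = download(data.loc[idx, "city"], data.loc[idx, "zip"])
    if len(infos) == 0:
        print("City not found!")
        continue
    data.loc[idx, "pop"] = infos["population"]
    data.loc[idx, "country"] = infos["country"]
    data.loc[idx, "founding"] = infos["foundingDate"]

print(data)
```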

import pandas as pd

class cityInfoAPIDummy:
    def __init__(self):
        self.data = pd.DataFrame(columns=["city","zip","population","country","foundingDate"],data=[["london","NE1","100000","UK","ages ago"],["birmingham","B","50000","UK","less long ago"]])

    def download(self,city,z):
        # combine both conditions into a single boolean mask
        return self.data[(self.data["city"]==city) & (self.data["zip"]==z)]

Code that does what you want using the apply() function looks like this:

def get_info(city,z,field):
    cityInfo=cityInfoAPIDummy()

    infos = cityInfo.download(city,z)

    if len(infos)==0:
        print("City not found")
        return ""
    else:
        return infos[field].values[0]

if __name__ == "__main__":
    # assuming some dummy data in the format you were after
    data = pd.DataFrame(columns=["city","zip"],data=[["london","NE1"],["birmingham","B"]])
    data["pop"] = data.apply( lambda x: get_info(x["city"],x["zip"],"population") ,axis=1)
    data["country"] = data.apply(lambda x: get_info(x["city"],x["zip"],"country"),axis=1)
    data["founding"] = data.apply(lambda x: get_info(x["city"],x["zip"],"foundingDate"),axis=1)

    print(data)

I hope this helps, but if the API is a custom one, it's hard to know what it actually gives you.
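The apply() code above looks each row up three times, once per new column. As a possible variation (not part of the original answer), the function passed to apply can return a Series, so that one lookup per row fills all the new columns. The `fetch_city_info` function and its data are hypothetical stand-ins for the real API.

```python
import pandas as pd


def fetch_city_info(city, zip_code):
    # Hypothetical stand-in for the API; returns {} for unknown cities
    lookup = {
        ("london", "NE1"): {
            "population": "100000",
            "country": "UK",
            "foundingDate": "ages ago",
        },
        ("birmingham", "B"): {
            "population": "50000",
            "country": "UK",
            "foundingDate": "less long ago",
        },
    }
    return lookup.get((city, zip_code), {})


def lookup_row(row):
    # One lookup per row; the returned Series becomes one row of new columns
    infos = fetch_city_info(row["city"], row["zip"])
    return pd.Series(
        {
            "pop": infos.get("population"),
            "country": infos.get("country"),
            "founding": infos.get("foundingDate"),
        }
    )


data = pd.DataFrame(
    {"city": ["london", "birmingham", "atlantis"], "zip": ["NE1", "B", "X"]}
)

# apply(axis=1) with a Series return value yields a DataFrame of new columns,
# which is then joined back onto the original rows by index
new_cols = data.apply(lookup_row, axis=1)
data = data.join(new_cols)

print(data)
```

Rows the stand-in does not know about end up with missing values in the new columns instead of empty strings.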
