How to access individual tuple values stored in a Series?

Question

I have a DataFrame in which every cell contains a tuple.

import pandas as pd
inp = [[(11,110), (12,120)], 
       [(13,130), (14,140), (15,150)]]
df = pd.DataFrame(inp)

for index, row in df.iterrows():
    print(row)

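I want to access each element while iterating row by row. As you can see, iterrows() returns each row as a Series of tuples, not their individual values. For example, it gives me (11, 110) ... (15, 150). I want to split them into single integers.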

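The desired result should let me access the individual values of each tuple by index, row by row. For example, while iterating over the rows, I could get 11, 12, 13, 14, 15 from index [0] and 110, 120, 130, 140, 150 from index [1].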

Is it possible to do this within iterrows()?

Thanks in advance!

Answer 1

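First, use DataFrame.iterrows() only as a last resort. DataFrames are optimized for vectorized operations on whole columns at once, not for row-by-row operations. If you must iterate, consider using DataFrame.itertuples() instead, because it preserves the data type of each column and runs much faster.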
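As a minimal sketch of that advice applied to the frame from the question, the loop below runs itertuples() over the original ragged data and separates the first and second value of each tuple. The names firsts and seconds are made up for illustration, and the isinstance check is just one assumed way to skip the padding that fills the shorter row.

```python
import pandas as pd

inp = [[(11, 110), (12, 120)],
       [(13, 130), (14, 140), (15, 150)]]
df = pd.DataFrame(inp)

firsts, seconds = [], []  # hypothetical accumulators, not part of the question
for row in df.itertuples(index=False):
    # The shorter row is padded with a missing value; keep only real tuples.
    cells = [cell for cell in row if isinstance(cell, tuple)]
    firsts.extend(cell[0] for cell in cells)
    seconds.extend(cell[1] for cell in cells)

print(firsts)
print(seconds)
```

This gets at the individual values without reshaping anything, but it is still a Python-level loop, so it does not benefit from vectorization.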

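Second, in Pandas (and in computing generally) it is very important to structure your data appropriately for the task at hand. Your current solution has persons along the index and time points as columns. As your example shows, this makes for a wide, ragged matrix that may contain many NaNs. It sounds like you want to store four data elements for each cell of your DataFrame: person, time, x, and y. Consider using four columns instead of one column per time point, like so: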

import pandas as pd
inp = [[(11,110), (12,120)], 
       [(13,130), (14,140), (15,150)]]
df = pd.DataFrame(inp)  # ragged and wide--not ideal for Pandas

df2 = df.stack().dropna()  # now each element is indexed by a MultiIndex (person and time); dropna() discards the padding cell in case stack() keeps it
df2.index.rename(["person", "time"], inplace=True)  # to be explicit

df3 = pd.DataFrame(df2.tolist(), index=df2.index)  # now each row is a person/time and there are two columns for x and y
df3.reset_index(inplace=True)  # not strictly necessary
df3.rename(columns={0: "x", 1: "y"}, inplace=True)  # to be explicit

for row in df3.itertuples():  # using itertuples instead of iterrows
    print(row)
# Pandas(Index=0, person=0, time=0, x=11, y=110)
# Pandas(Index=1, person=0, time=1, x=12, y=120)
# Pandas(Index=2, person=1, time=0, x=13, y=130)
# Pandas(Index=3, person=1, time=1, x=14, y=140)
# Pandas(Index=4, person=1, time=2, x=15, y=150)

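Take a look at how I split the tuples. Of course, if you are able to control how the data is constructed, you don't need this kind of manipulation at all; just create the DataFrame with the appropriate structure in the first place.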
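For illustration, here is a sketch of what building the long layout directly could look like. The records list and the name long_df are assumptions: they stand in for whatever source actually produces the (person, time, x, y) observations.

```python
import pandas as pd

# Hypothetical raw records: one (person, time, x, y) entry per observation.
records = [
    (0, 0, 11, 110),
    (0, 1, 12, 120),
    (1, 0, 13, 130),
    (1, 1, 14, 140),
    (1, 2, 15, 150),
]

# One row per observation, one column per field; no stacking or tuple splitting needed.
long_df = pd.DataFrame(records, columns=["person", "time", "x", "y"])
print(long_df)
```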

Now you can treat df3["x"] and df3["y"] as pandas.Series objects for whatever you need to do:

for x in df3["x"]:
    print(x)
# 11
# 12
# 13
# 14
# 15

for y in df3["y"]:
    print(y)
# 110
# 120
# 130
# 140
# 150

print(df3["x"] * df3["y"]/5 + 1)
# 0    243.0
# 1    289.0
# 2    339.0
# 3    393.0
# 4    451.0
# dtype: float64
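If the values are still needed per original row, grouping the long layout by person is one possible way to get them back. This is only a sketch: df3 is rebuilt here so the snippet runs on its own, and xs_by_person and ys_by_person are names invented for this example.

```python
import pandas as pd

# Same long layout as df3 above, rebuilt so this snippet is self-contained.
df3 = pd.DataFrame({
    "person": [0, 0, 1, 1, 1],
    "time": [0, 1, 0, 1, 2],
    "x": [11, 12, 13, 14, 15],
    "y": [110, 120, 130, 140, 150],
})

# Collect the x and y values belonging to each person (each original row).
xs_by_person = {}
ys_by_person = {}
for person, group in df3.groupby("person"):
    xs_by_person[person] = group["x"].tolist()
    ys_by_person[person] = group["y"].tolist()

print(xs_by_person)
print(ys_by_person)
```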

python

tuples

series

dataframe