Getting a Correlation Column Based on Two Columns with List Values
I have the following dataset:
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [[10, 11, 12], [13, 14, 15]],
                   'B': [[17, 18, 12], [21, 22, 13]]})
df
              A             B
0  [10, 11, 12]  [17, 18, 12]
1  [13, 14, 15]  [21, 22, 13]
Now I want to create a new column, Correlation, from columns A and B using scipy.stats.pearsonr. This is what I am trying:
# Compute the Pearson correlation between the two lists in a row
def correlation(row):
    # pearsonr returns (coefficient, p-value); only the coefficient is needed
    r, p_value = stats.pearsonr(row['A'], row['B'])
    return r

# Apply the function row by row
df['Correlation'] = df.apply(correlation, axis=1)
df
              A             B  Correlation
0  [10, 11, 12]  [17, 18, 12]    -0.777714
1  [13, 14, 15]  [21, 22, 13]    -0.810885
The script above is not ideal if I have many columns. Is it possible to use stats.pearsonr directly in a lambda and get the same result?
Any suggestions would be appreciated. Thanks!
I would recommend using zip with a for loop (here, a list comprehension):
df['out'] = [stats.pearsonr(x, y)[0] for x, y in zip(df.A, df.B)]
df
Out[163]:
              A             B       out
0  [10, 11, 12]  [17, 18, 12] -0.777714
1  [13, 14, 15]  [21, 22, 13] -0.810885
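For completeness, here is a minimal sketch of the lambda form the question asks about, plus one possible way to extend the zip approach to several column pairs. The helper add_correlations and the '<left>_<right>_corr' column naming are hypothetical, made up for illustration; they are not part of pandas or SciPy.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [[10, 11, 12], [13, 14, 15]],
                   'B': [[17, 18, 12], [21, 22, 13]]})

# Lambda form of the original apply call: indexing with [0] keeps the
# correlation coefficient and drops the p-value.
df['Correlation'] = df.apply(
    lambda row: stats.pearsonr(row['A'], row['B'])[0], axis=1
)


def add_correlations(frame, pairs):
    """Hypothetical helper: add one correlation column per (left, right) pair."""
    for left, right in pairs:
        # Same zip-based list comprehension as above, applied to each pair.
        frame[f'{left}_{right}_corr'] = [
            stats.pearsonr(x, y)[0] for x, y in zip(frame[left], frame[right])
        ]
    return frame


df = add_correlations(df, [('A', 'B')])
print(df)
```

Both forms should give the same values as the apply version in the question. The zip version avoids building a Series for every row, so it is likely to be faster on large frames, though that has not been measured here.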