Correlation Matrix: Extract Variables with High R Values
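How can I get output that lists only the variables whose correlations have an absolute value greater than 0.7?
I would like output similar to this:
four: one, three
one: three
Thanks for your time!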
Code
```python
import pandas as pd

x = {'one': [1, 2, 3, 4], 'two': [3, 5, 7, 5], 'three': [2, 3, 4, 9], 'four': [4, 3, 1, 0]}
y = pd.DataFrame(x)
print(y.corr())
```
Output
```
           four       one     three       two
four   1.000000 -0.989949 -0.880830 -0.670820
one   -0.989949  1.000000  0.913500  0.632456
three -0.880830  0.913500  1.000000  0.262613
two   -0.670820  0.632456  0.262613  1.000000
```
If you just want to print the result, this works:
```python
col_names = y.corr().columns.values
# items() replaces iteritems(), which was removed in pandas 2.0
for col, row in (y.corr().abs() > 0.7).items():
    print(col, col_names[row.values])
```
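Note that this works, but it can be slow, because `items()` (called `iteritems()` before its removal in pandas 2.0) builds a separate Series for every column of the mask.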
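A sketch of the same mask-based idea that also produces the format asked for in the question. It reads the boolean mask as a NumPy array instead of iterating over Series, keeps only the upper triangle (`np.triu` with `k=1`) so each pair is listed once and self-correlations are dropped, and sorts the columns alphabetically so the order matches the matrix printed above (recent pandas versions keep the dict's insertion order instead). `high_corr_pairs` is a hypothetical helper name; the 0.7 threshold comes from the question.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.7) -> dict:
    """Map each variable to the later variables whose |r| exceeds threshold.

    Hypothetical helper: columns are sorted alphabetically so the result
    follows the ordering of the matrix printed in the question.
    """
    corr = df[sorted(df.columns)].corr().abs()
    # Upper triangle without the diagonal: each pair once, no self-pairs.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    mask = (corr > threshold).to_numpy() & upper
    result = {}
    for i, name in enumerate(corr.index):
        partners = list(corr.columns[mask[i]])
        if partners:  # skip variables with no strong partner
            result[name] = partners
    return result


x = {'one': [1, 2, 3, 4], 'two': [3, 5, 7, 5], 'three': [2, 3, 4, 9], 'four': [4, 3, 1, 0]}
y = pd.DataFrame(x)
for var, partners in high_corr_pairs(y).items():
    print(f"{var}: {', '.join(partners)}")
```

With the sample data this prints `four: one, three` followed by `one: three`.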
This works for me:
```python
corr = y.corr().unstack().reset_index()  # one row per (var1, var2) pair
corr.columns = ['var1', 'var2', 'corr']  # give the columns readable names
print(corr[corr['corr'].abs() > 0.7])    # keep pairs with |r| above 0.7
```
You can also exclude each variable's correlation with itself (corr = 1) by changing the last line to:
```python
print(corr[(corr['corr'].abs() > 0.7) & (corr['var1'] != corr['var2'])])
```
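Even with self-pairs removed, every pair still appears twice, once as (var1, var2) and once mirrored. A sketch that keeps one copy of each by requiring `var1 < var2` (plain string comparison, which also drops the self-pairs) and then collapses the rows into the `var: partner, partner` format from the question. It names the columns with `rename_axis` and `reset_index(name=...)` instead of assigning `corr.columns`; that is a stylistic choice, not part of the original answer, and the names `pairs` and `summary` are illustrative.

```python
import pandas as pd

x = {'one': [1, 2, 3, 4], 'two': [3, 5, 7, 5], 'three': [2, 3, 4, 9], 'four': [4, 3, 1, 0]}
y = pd.DataFrame(x)

# One row per (var1, var2) combination; rename_axis names the two index levels
# and reset_index(name=...) names the column holding the correlation values.
corr = (
    y.corr()
    .unstack()
    .rename_axis(['var1', 'var2'])
    .reset_index(name='corr')
)

# var1 < var2 compares the names as strings: it removes self-pairs (var1 == var2)
# and keeps only one of each mirrored pair such as (four, one) / (one, four).
pairs = corr[(corr['corr'].abs() > 0.7) & (corr['var1'] < corr['var2'])]
print(pairs)

# Collapse into "var: partner, partner" lines; groupby sorts the var1 keys.
summary = pairs.groupby('var1')['var2'].agg(', '.join)
for var, partners in summary.items():
    print(f"{var}: {partners}")
```

Because the tie-break is alphabetical, which name ends up in `var1` depends only on spelling, not on column order in the DataFrame.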