Concatenating two 1-column DataFrames doesn't return both columns
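I'm using Python 3.6 and I'm a newbie, so thanks in advance for your patience.
I have a function that calculates the differences between 3 points. It should then take those 'differences' and concatenate them with another DataFrame called labels. k and length are integers. I expected the resulting DataFrame to have two columns, but it only has one.
Sample code: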
def distance(df1,df2,labels,k,length):
    total_dist = 0
    for i in range(length):
        dist_dif = df1.iloc[:,i] - df2.iloc[:,i]
        sq_dist = dist_dif ** 2
        root_dist = sq_dist ** 0.5
        total_dist = total_dist + root_dist
    return total_dist
    distance_df = pd.concat([total_dist, labels], axis=1)
    distance_df.sort(ascending=False, axis=1, inplace=True)
    top_knn = distance_df[:k]
    return top_knn.value_counts().index.values[0]
Sample data:
d1 = {'Z_Norm_Age': [1.20, 2.58,2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24,0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': [0, 1,1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)
I'd like the data to look like this:
   total_dist  labels
0    1.715349       0
1    2.872991       1
2    4.344087       1
But it looks like this:
0 1.715349
1 4.344087
2 2.872991
dtype: float64
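The output doesn't do the following:
1. Return the labels column data
2. Sort the data in descending order
I'd appreciate it if someone could point me in the right direction.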
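Given two DataFrames, df1-df2 performs the subtraction element-wise. Use abs() to take the absolute value of that difference, and finally sum over each row. That explains the first statement in the function below; the other lines are similar to your code.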
import numpy as np
import pandas as pd

def calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels):
    diff = np.sum(np.abs(df1-df2), axis=1)  # np.sum(..., axis=1) sums the rows
    diff.name = 'total_abs_distance'  # Not really necessary, but just to refer to it later
    diff = pd.concat([diff, labels], axis=1)
    diff.sort_values(by='total_abs_distance', axis=0, ascending=True, inplace=True)
    return diff
So for your sample data:
d1 = {'Z_Norm_Age': [1.20, 2.58,2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24,0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': ['a', 'b', 'c']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)
calc_abs_distance_between_rows_then_add_labels_and_sort(df1, df2, labels)
we get, hopefully, what you wanted:
   total_abs_distance Survived
0                1.71        a
2                3.87        c
1                4.34        b
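As for why the function in the question returns a single column: the first `return total_dist` exits the function before `pd.concat` is ever reached, so the Series comes back unsorted and without labels. Below is a minimal sketch of that function with the early return removed. Several things here are assumptions rather than part of the original: the `length` argument is dropped, the function returns both the sorted DataFrame and the majority label, and the rows are sorted ascending (nearest first), which seems to be what a k-nearest-neighbours vote needs. `sort_values` is used because `DataFrame.sort` is not available in current pandas versions.

```python
import pandas as pd

def distance(df1, df2, labels, k):
    # Row-wise sum of absolute differences; equivalent to summing
    # (x ** 2) ** 0.5 column by column as in the question.
    total_dist = (df1 - df2).abs().sum(axis=1)
    total_dist.name = 'total_dist'
    # The early `return total_dist` is gone, so the concat now runs.
    distance_df = pd.concat([total_dist, labels], axis=1)
    # Sort rows (not columns) by distance. Ascending order is an
    # assumption here: it puts the nearest rows first.
    distance_df = distance_df.sort_values(by='total_dist', ascending=True)
    # Majority label among the k nearest rows.
    label_col = labels.columns[0]
    top_label = distance_df[label_col].iloc[:k].value_counts().index[0]
    return distance_df, top_label

d1 = {'Z_Norm_Age': [1.20, 2.58, 2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24, 0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
lbl = {'Survived': [0, 1, 1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
labels = pd.DataFrame(data=lbl)

distance_df, top_label = distance(df1, df2, labels, k=3)
print(distance_df)  # two columns: total_dist and Survived
print(top_label)    # most common label among the k nearest rows
```

If you really do want the largest distances first, pass `ascending=False` to `sort_values` instead.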
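A few notes:
- Do you really want the L1 norm? If you want the L2 norm (Euclidean distance), replace the first statement in the function above with np.sqrt(np.sum(np.square(df1-df2),axis=1)).
- What are these labels for? Consider using the DataFrame's index instead. Perhaps it would suit your purpose better? For example: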
# lbl_series = pd.Series(['a','b','c'], name='Survived') # Try this later instead of lbl_list, to further explore the wonders of Pandas indexes :)
lbl_list = ['a', 'b', 'c']
df1.index = lbl_list
df2.index = lbl_list
# Then the L1-norm is simply this:
np.sum(np.abs(df1 - df2), axis=1).sort_values()
# Whose output is the Series: (with the labels as its index)
a 1.71
c 3.87
b 4.34
dtype: float64
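Coming back to the first note, here is a small sketch that computes both norms on the sample data so you can compare them. The variable names `l1` and `l2` are just illustrative, and the pandas methods `.abs()` and `.sum(axis=1)` are used in place of the NumPy calls above; they should give the same values.

```python
import numpy as np
import pandas as pd

d1 = {'Z_Norm_Age': [1.20, 2.58, 2.54], 'Pclass': [3, 3, 2], 'Conv_Sex': [0, 1, 0]}
d2 = {'Z_Norm_Age': [-0.51, 0.24, 0.67], 'Pclass': [3, 1, 3], 'Conv_Sex': [0, 1, 1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

diff = df1 - df2

# L1 norm: sum of absolute differences per row.
l1 = diff.abs().sum(axis=1)

# L2 norm (Euclidean distance): square, sum per row, then take the root.
l2 = np.sqrt((diff ** 2).sum(axis=1))

print(pd.concat([l1.rename('l1'), l2.rename('l2')], axis=1))
```

Note that the loop in the question takes the square root of each squared column difference before summing, which is the L1 norm, not the Euclidean distance.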