Pandas concat with different indices
I want to concatenate three DataFrames, but they all have different indices. All three indices have the same length. My first df looks like this:
Index Time_start Time_end duration value
0 5 10 5 1.0
1 10 16 6 NaN
...
39 50 53 3 NaN
The second df looks like this:
Index Time_start Time_end duration value
40 5 10 5 2.0
42 10 16 6 NaN
...
79 50 53 3 NaN
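The third one looks exactly the same, but with index = [80..119].
Time_start, Time_end and duration are exactly the same in all three frames; only the values differ.
I want to concatenate the value columns so that the result looks like this: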
Index Time_start Time_end duration value1 value2 value3
1 5 10 5 1.0 2 3
2 10 16 6 NaN NaN NaN
...
39 50 53 3 NaN NaN NaN
So far I have tried this:
pd.concat([df1, df2.value, ms3.value], axis=1, join_axes = [df1.index])
But the indices are not the same, so it does not work. I know I could first call
df2.reset_index(drop=True)
and then do the concat, which works, but I am sure there is a better way.
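For reference, the reset_index route mentioned in the question might look roughly like the sketch below. The three small frames are made-up stand-ins for the real data, and the name df3 for the third frame is an assumption (the question calls it ms3 in one place).

```python
import numpy as np
import pandas as pd

# Illustrative data only: identical time columns, disjoint index ranges.
base = {"Time_start": [5, 10, 50], "Time_end": [10, 16, 53], "duration": [5, 6, 3]}
df1 = pd.DataFrame({**base, "value": [1.0, np.nan, np.nan]}, index=[0, 1, 2])
df2 = pd.DataFrame({**base, "value": [2.0, np.nan, np.nan]}, index=[40, 41, 42])
df3 = pd.DataFrame({**base, "value": [3.0, np.nan, np.nan]}, index=[80, 81, 82])

# Discard each original index so rows line up by position,
# and give every value column its own name.
values = [
    df["value"].reset_index(drop=True).rename(f"value{i}")
    for i, df in enumerate([df1, df2, df3], start=1)
]

# Keep the shared time columns from the first frame only.
result = pd.concat(
    [df1.drop(columns="value").reset_index(drop=True)] + values, axis=1
)
print(result)
```

This aligns rows purely by position, so it is only safe when every frame lists its rows in the same order.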
dfs = [df1, df2]
cols = ['Time_start', 'Time_end', 'duration']
keys = ['value1', 'value2']
pd.concat(
    [df.set_index(cols).value for df in dfs],
    axis=1, keys=keys)

                              value1  value2
Time_start Time_end duration
5          10       5            1.0     2.0
10         16       6            NaN     NaN
50         53       3            NaN     NaN
Use:
dfs = [df1,df2]
k = ['value1','value2']
df = pd.concat([x.set_index(['Time_start','Time_end','duration']) for x in dfs],
               axis=1, keys=k)
df.columns = df.columns.droplevel(-1)
print(df)

                              value1  value2
Time_start Time_end duration
5          10       5            1.0     2.0
10         16       6            NaN     NaN
50         53       3            NaN     NaN
Another solution:
dfs = [df1,df2]
df = pd.concat([x.set_index(['Time_start','Time_end','duration']) for x in dfs],axis=1)
df.columns = [x + str(i+1) for i, x in enumerate(df.columns)]
print(df)

                              value1  value2
Time_start Time_end duration
5          10       5            1.0     2.0
10         16       6            NaN     NaN
50         53       3            NaN     NaN
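The answers above show two frames, but the same idea should carry over to all three. Below is a minimal, self-contained sketch; the sample data is invented for illustration and df3 is an assumed name for the third frame. The final reset_index() call is an optional extra step that moves the time columns back into ordinary columns, closer to the layout asked for in the question.

```python
import numpy as np
import pandas as pd

cols = ["Time_start", "Time_end", "duration"]

# Illustrative data only: identical time columns, disjoint index ranges.
base = {"Time_start": [5, 10, 50], "Time_end": [10, 16, 53], "duration": [5, 6, 3]}
df1 = pd.DataFrame({**base, "value": [1.0, np.nan, np.nan]}, index=[0, 1, 2])
df2 = pd.DataFrame({**base, "value": [2.0, np.nan, np.nan]}, index=[40, 41, 42])
df3 = pd.DataFrame({**base, "value": [3.0, np.nan, np.nan]}, index=[80, 81, 82])

dfs = [df1, df2, df3]
keys = [f"value{i}" for i in range(1, len(dfs) + 1)]

# Index every frame by the shared time columns so concat aligns on them
# instead of on the original, non-overlapping row labels.
wide = pd.concat([df.set_index(cols)["value"] for df in dfs], axis=1, keys=keys)

# Turn the three index levels back into regular columns.
wide = wide.reset_index()
print(wide)
```

Because alignment happens on (Time_start, Time_end, duration), this assumes each combination of those three columns appears at most once per frame.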