How to concat multiple dataframes to one and output it to a csv file in pandas?
I have a csv file that looks like this:
,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
0,2018-02-10 11:52:59.342269+00:00,CFE,10.0.100.1,BWE,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,CDW,10.0.100.1,GRE,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-02-09 11:52:59.342269+00:00,,,SSD,,,10.45,,,,,,,,,
3,2018-02-08 09:52:59.342269+00:00,,,BWE,,,12.45,,,,,,,,,
4,2018-02-07 04:52:59.342269+00:00,,,RRW,,,9.45,,,,,,,,,
5,2018-02-06 05:52:59.342269+00:00,,,GRE,,,5.45,,,,,,,,,
6,2018-02-05 07:52:59.342269+00:00,,,SSD,,,13.45,,,,,,,,,
7,2018-02-04 10:52:59.342269+00:00,,,SSD,,,8.15,,,,,,,,,
8,2018-02-03 10:52:59.342269+00:00,,,GRE,,,4.15,,,,,,,,,
9,2018-02-02 06:52:59.342269+00:00,,,RRW,,,13.15,,,,,,,,,
10,2018-02-10 22:35:33.438948+00:00,QQW,10.12.11.1,VCD,4.0,23.0,5.0,0.0,,,,,,,,
11,2018-02-10 22:35:37.905242+00:00,CSW,10.12.11.1,VCD,4.0,23.0,6.08,0.0,,,,,,,,
.......................................................................................
.......................................................................................
I load the csv file like this:
import numpy as np
import pandas as pd

df = pd.read_csv("metrics_copy.csv", parse_dates=["date"])
df['device'] = df['device'].astype(str)
unique_devices = np.unique(df[['device']].values)
unique_provider = np.unique(df[['provider']].values)
I want a csv file that contains only specific columns for specific combinations.
for i in unique_devices:
    for j in ["cpu", "mem"]:
        df2 = df[(df['device'] == i)]
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
        print(df2[j])
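As you can see, for each unique combination of device and metric I get a time series. I am able to get a bunch of values of df2[j] for a given combination. As the loop continues, I would like to output these values to a csv file for all combinations. I know of a function called pd.concat, which can be used like this: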
df_final = pd.concat([df, df2, df3.....])
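But for that I would need to generate dataframes for all possible combinations and then finally concatenate them into one dataframe. So I want the final csv file for cpu to look like this: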
date cpu
... ...
... ...
and another csv file for mem that looks like this:
date mem
... ...
... ...
But I don't know how to achieve this. Any help?
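Use df.to_csv() in append mode, adapted from: How to add pandas data to an existing csv file?
for i in unique_devices:
    for j in ["cpu", "mem"]:
        # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
        df2 = df[(df['device'] == i)].copy()
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
        # append this device's rows to cpu.csv / mem.csv without a header
        df2[['date', j]].to_csv('{}.csv'.format(j), mode='a', index=False, header=False)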
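Since files written this way have no header row, the column names have to be supplied when reading them back. A minimal sketch, assuming a file produced as above; the sample rows and the temporary path are made up for illustration and are not part of the original data:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for one device's rows.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-10 11:52:59", "2018-02-10 11:53:04"]),
    "cpu": [6.0, 6.0],
})

# Write into a fresh temporary directory so nothing existing is touched.
path = os.path.join(tempfile.mkdtemp(), "cpu.csv")

# Append twice, as the loop would for two devices; no header is written.
sample.to_csv(path, mode="a", index=False, header=False)
sample.to_csv(path, mode="a", index=False, header=False)

# Supply the column names explicitly when reading the file back.
restored = pd.read_csv(path, header=None, names=["date", "cpu"], parse_dates=["date"])
print(restored)
```

Note that append mode also means that re-running the script adds the same rows again to any cpu.csv or mem.csv left over from a previous run.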
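Alternatively, you can use an if statement to check whether the file exists, so that the header is written when the file is first created and skipped afterwards:
import os

for i in unique_devices:
    for j in ["cpu", "mem"]:
        df2 = df[(df['device'] == i)].copy()
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
        if not os.path.isfile('{}.csv'.format(j)):
            # first write: create the file with a header row
            df2[['date', j]].to_csv('{}.csv'.format(j), mode='a', index=False)
        else:
            # later writes: append rows without repeating the header
            df2[['date', j]].to_csv('{}.csv'.format(j), mode='a', index=False, header=False)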
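If you would rather go the pd.concat route mentioned in the question, one possible sketch is to collect the per-device pieces in a list, concatenate them once, and write each metric file in a single call. The sample data, the results dictionary and the temporary output directory below are assumptions for illustration. Unlike the loops above, groupby skips rows whose device is missing rather than treating them as a 'nan' device:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample with a few of the columns from the question's file.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-02-10 11:52:59", "2018-02-10 11:53:04",
        "2018-02-09 11:52:59", "2018-02-10 22:35:33",
    ]),
    "device": ["10.0.100.1", "10.0.100.1", None, "10.12.11.1"],
    "cpu": [6.0, 6.0, None, 4.0],
    "mem": [23.0, 23.0, None, 23.0],
})

# Temporary output directory, used here only to keep the sketch side-effect free.
out_dir = tempfile.mkdtemp()

results = {}
for metric in ["cpu", "mem"]:
    # One piece per device; groupby drops rows with a missing device.
    pieces = [group[["date", metric]] for _, group in df.groupby("device")]
    results[metric] = pd.concat(pieces, ignore_index=True)
    # The default mode="w" writes the header once and overwrites old output.
    results[metric].to_csv(os.path.join(out_dir, "{}.csv".format(metric)), index=False)

print(results["cpu"])
```

Writing each file once sidesteps both the header check and the duplicated rows that append mode produces on a re-run.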