Mean and standard deviation of a data frame column
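I have 50 CSV files, and for each one I need the mean and standard deviation of one specific column. I then need to build an array with 50 rows and 2 columns, so that each row holds the mean and standard deviation for one CSV file.
I am stuck at the stage of getting the mean and std for the first CSV's df.
This is what I have so far: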
import numpy as np
import pandas as pd
import glob

i = 0
# raw string so the backslashes in the Windows path are not read as escapes
path = r"C:\Users\sharon\Desktop\mathematical finance\sadna"
all_files = glob.glob(path + "/*.csv")
arr = np.zeros((50, 2))
for filename in all_files:
    df = pd.read_csv(filename, encoding="utf-8")
    df = df.loc[2:470, 'Unnamed: 3']
    Mean = df.mean()       # DOES NOT WORK
    Std = df.std(axis=1)   # What?...
    arr[i, :] = (Mean, Std)
Edit:
Solved with this code:
import numpy as np
import pandas as pd
import glob
path = r"C:\Users\sharon\Desktop\mathematical finance\sadna"
all_files = glob.glob(path + "/*.csv")
df_list = [
    pd.read_csv(f, encoding="utf-8", header=None, usecols=[3], nrows=470)
      .assign(filename=f)
    for f in all_files
]
final_df = pd.concat(df_list)
final_df[3]= final_df[3].apply(pd.to_numeric, errors='coerce')
agg_df = final_df.groupby(['filename']).agg(['mean', 'std'])
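The `pd.to_numeric(..., errors='coerce')` step is what makes the statistics computable: with `header=None`, any text rows in the column force pandas to read the whole column as text, and coercion turns the non-numeric cells into NaN, which `mean` and `std` skip. A minimal sketch of that behaviour, assuming a made-up five-row file (`csv_text` is hypothetical and stands in for one real CSV):

```python
import io

import pandas as pd

# Hypothetical stand-in for one CSV file: column 3 mixes text and numbers.
csv_text = "a,b,c,label\nx,y,z,units\n1,2,3,10.0\n4,5,6,14.0\n7,8,9,n/a\n"

# header=None keeps the text rows as data, so column 3 is not numeric yet.
raw = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[3])

# Coerce: cells that cannot be parsed as numbers become NaN.
values = pd.to_numeric(raw[3], errors="coerce")

# mean/std ignore NaN by default; std is the sample standard deviation (ddof=1).
print(values.mean(), values.std())
```

Calling `pd.to_numeric` on the whole Series, as here, should give the same result as the element-wise `.apply(pd.to_numeric, errors='coerce')` in the edit above, just in one vectorised call.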
Consider using a list comprehension across all CSV files and concatenating the results with concat. Be sure to pass the needed arguments to read_csv. Then aggregate for the needed statistics. Finally, convert the data frame values to a NumPy array with to_numpy:
Build a single data frame
# raw string so the backslashes in the Windows path are not read as escapes
path = r"C:\Users\sharon\Desktop\mathematical finance\sadna"
all_files = glob.glob(path + "/*.csv")

# SPECIFY NO HEADERS, SPECIFIC COLUMN AND NUMBER OF ROWS
df_list = [(pd.read_csv(f, encoding="utf-8",
                        header=None,
                        usecols=[4],
                        nrows=469)
              .assign(filename=f)
           ) for f in all_files]

# COMPILE LARGE DATA FRAME
final_df = pd.concat(df_list, ignore_index=True)

# AGGREGATE BY filename
agg_df = final_df.groupby(['filename']).agg(['mean', 'std'])

# CONVERT TO NUMPY ARRAY
arr = agg_df.to_numpy()
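For comparison, a per-file loop closer to the original attempt can also fill the array directly, without concatenating first. This is only a sketch under assumptions: `column_stats` is a hypothetical helper, the column index and row count are taken from the question's edit, and three small generated files in a temporary folder stand in for the real 50 CSVs.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def column_stats(filename, col=3, nrows=470):
    """Return (mean, std) of one column of a headerless CSV file."""
    s = pd.read_csv(filename, header=None, usecols=[col], nrows=nrows)[col]
    s = pd.to_numeric(s, errors="coerce")  # non-numeric cells become NaN
    return s.mean(), s.std()  # a Series has one axis only, so no axis=1


# Demo data: three tiny CSV files in a temporary folder (made up for the sketch).
tmp_dir = tempfile.mkdtemp()
for k in range(3):
    demo = pd.DataFrame({0: "a", 1: "b", 2: "c", 3: [k, k + 2.0, k + 4.0]})
    demo.to_csv(os.path.join(tmp_dir, f"file{k}.csv"), header=False, index=False)

# Sort so that row i of the array corresponds to a predictable file.
all_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# One row per file: column 0 holds the mean, column 1 the standard deviation.
arr = np.array([column_stats(f) for f in all_files])
print(arr.shape)
```

Building the array from a list of tuples avoids both the hard-coded `np.zeros((50,2))` and the manual counter `i`, so the result has one row per file that was actually found.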