Python - Storing the average from each file in a loop, and then finding the global average outside the loop?
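I have a function below that currently loops through files starting with "K" and "Z" and plots the "Temp" data: blue for the "K" data and red for the "Z" data. This works great for my purposes.
Where I'm stuck:
- I now want to take the average of "Temp" between sample 100 and sample 350 for each file in the loop.
- Then I want to store each file's average in a new DataFrame, with one column for the "K" averages and one column for the "Z" averages.
- Finally, outside the loop, I want to take the average of the "K" column and the average of the "Z" column, and plot them on the graph.
In my code below, I have added comments where I'm stuck.
As a side question: if anyone knows a good way to automatically detect the "flat" region of each dataset (slope ≈ 0) and then automatically select the averaging interval, that would be really cool! Right now I'm surely losing some data points by using a fixed interval.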
import pandas as pd
import matplotlib.pyplot as plt
from glob import glob
filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")
def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp']
    df = df.astype(str)
    # Strip "+" signs and spaces (e.g. "+ 21.5"), then convert to float
    df["Temp"] = df["Temp"].str.replace(r'[+ ]', '', regex=True).astype(float)
    # Now take the average of df["Temp"] from sample 100 until sample 350.
    # Append this average to K_Z_Averages, which has one column for the averages
    # of the K files and one for the averages of the Z files.
    fig_ax.plot(df["Temp"], color=color)
fig, ax = plt.subplots()
for f in filenamesK:
    plot_data(f, ax, 'blue')
for f in filenamesZ:
    plot_data(f, ax, 'red')
# After the loop is finished, take the average of each column in K_Z_Averages,
# i.e. the average of the K-file averages and of the Z-file averages.
plt.show()
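For the side question about detecting the flat region automatically, one possible approach (a sketch under stated assumptions, not a tested recipe for this data) is to smooth the signal with a rolling mean, estimate the local slope with numpy.gradient, and keep the longest run of samples whose absolute slope stays below a threshold. The helper name find_flat_region, the rolling-window width and the slope tolerance are all hypothetical and would need tuning per dataset.

```python
import numpy as np
import pandas as pd


def find_flat_region(temp, window=25, slope_tol=0.01):
    """Hypothetical helper: (start, stop) of the longest run where |slope| < slope_tol.

    `window` (rolling-mean width) and `slope_tol` (max change per sample) are
    assumptions to tune. Positions are 0-based and `stop` is exclusive,
    so the result can be passed straight to iloc.
    """
    # Smooth first so that noise does not break up the flat run
    smooth = pd.Series(temp).rolling(window, center=True, min_periods=1).mean()
    # Local slope: change in (smoothed) temperature per sample
    slope = np.gradient(smooth.to_numpy())
    flat = np.abs(slope) < slope_tol

    best_start, best_len = 0, 0
    run_start = None
    for i, is_flat in enumerate(flat):
        if is_flat and run_start is None:
            run_start = i  # a flat run begins
        elif not is_flat and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None  # the flat run ends
    # Close a run that lasts until the final sample
    if run_start is not None and len(flat) - run_start > best_len:
        best_start, best_len = run_start, len(flat) - run_start
    return best_start, best_start + best_len


# Synthetic data: ramp up, plateau at 50, ramp down
temp = np.concatenate([np.linspace(20, 50, 100), np.full(300, 50.0), np.linspace(50, 20, 100)])
start, stop = find_flat_region(temp)
print(start, stop, temp[start:stop].mean())
```

Inside plot_data, the fixed slice could then be replaced by something like df.iloc[start:stop]['Temp'].mean(). Note that the smoothing trims a little from each end of the plateau, so the detected interval is slightly conservative.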
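Part 2:
If my .csv files have a second temperature, "Temp2", that I also want to extract, could you help with adding it to the dict? For example, with one column each for K_Temp, K_Temp2, Z_Temp and Z_Temp2 in the dict?
I modified the code in a way I think works, but I imagine there is a more efficient way to do it: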
import pandas as pd
import matplotlib.pyplot as plt
from glob import glob
from os.path import basename
filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")
# Create dict of lists for storing the averages
K_Z_Averages = {'K': [], 'Z': []}
def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp', 'Temp2']
    df = df.astype(str)
    # Strip "+" signs and spaces, then convert to float
    df["Temp"] = df["Temp"].str.replace(r'[+ ]', '', regex=True).astype(float)
    df["Temp2"] = df["Temp2"].str.replace(r'[+ ]', '', regex=True).astype(float)
    # Now take the averages of Temp and Temp2 from sample 100 until sample 350
    # (iloc's stop is exclusive; assumes sample 1 is on row 0).
    avg_Temp1 = df.iloc[100-1:350]['Temp'].mean()
    avg_Temp2 = df.iloc[100-1:350]['Temp2'].mean()
    # Append both averages to the 'K' or 'Z' list, chosen by the file name's first letter.
    K_Z_Averages[basename(filename)[0]].append(avg_Temp1)
    K_Z_Averages[basename(filename)[0]].append(avg_Temp2)
    fig_ax.plot(df["Temp"], color=color)
fig, ax = plt.subplots()
for f in filenamesK:
    plot_data(f, ax, 'blue')
for f in filenamesZ:
    plot_data(f, ax, 'red')
# Take the overall average
df_avg = pd.DataFrame(K_Z_Averages).mean()
# Add a dashed horizontal line at each mean (Temp is on the y-axis)
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue', 'red'], alpha=.5)
plt.show()
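You can create a dictionary to store each file's average, and then append to it inside the function:
# Before the `plot_data` definition
K_Z_Averages = {'K': [], 'Z': []}
# Inside the function (samples 100 to 350; iloc's stop is exclusive, assuming sample 1 is on row 0)
avg = df.iloc[100-1:350]['Temp'].mean()
K_Z_Averages[filename.split('/')[-1][0]].append(avg)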
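where filename.split('/')[-1][0] strips the directory part of the path and takes the first letter of the file name (similar to os.path.basename(filename)[0]). On Windows, glob can return paths that mix / and \, so os.path.basename(filename)[0] is the more robust choice.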
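To see why basename is safer, the sketch below uses the standard-library ntpath module (the Windows implementation of os.path, importable on any OS) on a path shaped the way glob typically returns it on Windows, where the matched name is joined to the directory with a backslash. The path itself is a made-up example.

```python
import ntpath

# Windows-style path as glob may produce it: "/" from the pattern, "\" added by the join
path = "C:/Users\\K_run1.csv"

# Splitting on "/" only leaves "Users\K_run1.csv", so [0] is the wrong letter
print(path.split('/')[-1][0])   # → U

# ntpath.basename understands both separators
print(ntpath.basename(path)[0])  # → K
```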
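Then, take the overall average of the per-file averages. Wrapping each list in a Series allows different numbers of K and Z files (pd.DataFrame raises a ValueError for a dict of plain lists with unequal lengths):
pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()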
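A quick way to check the unequal-length behaviour is with invented averages for three K files and two Z files (the numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical per-file averages: three K files but only two Z files
K_Z_Averages = {'K': [20.0, 22.0, 24.0], 'Z': [30.0, 32.0]}

try:
    pd.DataFrame(K_Z_Averages)  # a dict of plain lists must have equal lengths
except ValueError as err:
    print("ValueError:", err)

# Series are aligned on their index and padded with NaN;
# mean() skips NaN by default, so each column averages only its own files
overall = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()
print(overall)  # K 22.0, Z 31.0
```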
The full code would look like this:
import pandas as pd
import matplotlib.pyplot as plt
from glob import glob
from os.path import basename
filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")
# Create dict of lists for storing the averages
K_Z_Averages = {'K': [], 'Z': []}
def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp']
    df = df.astype(str)
    # Strip "+" signs and spaces, then convert to float
    df["Temp"] = df["Temp"].str.replace(r'[+ ]', '', regex=True).astype(float)
    # Take the average of df["Temp"] from sample 100 until sample 350
    # (iloc's stop is exclusive; assumes sample 1 is on row 0)
    avg = df.iloc[100-1:350]['Temp'].mean()
    # Append this average to the 'K' or 'Z' list, chosen by the file name's first letter
    K_Z_Averages[basename(filename)[0]].append(avg)
    fig_ax.plot(df["Temp"], color=color)
fig, ax = plt.subplots()
for f in filenamesK:
    plot_data(f, ax, 'blue')
for f in filenamesZ:
    plot_data(f, ax, 'red')
# Take the overall average (Series allow different numbers of K and Z files)
df_avg = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()
# Add a dashed horizontal line at each mean (Temp is on the y-axis)
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue', 'red'], alpha=.5)
plt.show()
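Since the real CSV files aren't available here, the sketch below is one hedged way to test the averaging logic on its own: it swaps the files for small in-memory CSV strings (io.StringIO), omits the 24 header rows (so no skiprows) and the plotting. The file names, the "+ 20.0"-style values and the helper names window_mean and make_csv are assumptions for illustration. It also selects rows by the 'sample' column instead of by position, which sidesteps the off-by-one question.

```python
import io
from os.path import basename

import pandas as pd


def window_mean(csv_source, start=100, stop=350):
    """Hypothetical helper: mean of Temp for samples start..stop (inclusive)."""
    df = pd.read_csv(csv_source, names=['sample', 'Temp'], dtype=str)
    # Remove "+" signs and spaces such as "+ 21.5" before converting
    df['Temp'] = df['Temp'].str.replace(r'[+ ]', '', regex=True).astype(float)
    df['sample'] = df['sample'].astype(int)
    # Select by sample number, not by row position
    return df.loc[df['sample'].between(start, stop), 'Temp'].mean()


def make_csv(value, n=400):
    # Fake file: samples 1..n with a constant temperature written as "+ value"
    return io.StringIO("\n".join(f"{i},+ {value}" for i in range(1, n + 1)))


# Fake "files": the first letter of the file name decides K vs Z
files = {"C:/Users/K1.csv": make_csv(20.0),
         "C:/Users/K2.csv": make_csv(22.0),
         "C:/Users/Z1.csv": make_csv(30.0)}

K_Z_Averages = {'K': [], 'Z': []}
for name, source in files.items():
    K_Z_Averages[basename(name)[0]].append(window_mean(source))

overall = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()
print(overall)  # K 21.0, Z 30.0
```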
After the question edit (Part 2), the code would look like this:
import pandas as pd
from glob import glob
from os.path import basename
import matplotlib.pyplot as plt
filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")
# Create dict of lists for storing the averages
K_Z_Averages = {'K_Temp': [], 'K_Temp2': [], 'Z_Temp': [], 'Z_Temp2': []}
def plot_data(filename, fig_ax, color, color2):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp', 'Temp2']
    df = df.astype(str)
    # Strip "+" signs and spaces, then convert both temperature columns to float
    for col in ["Temp", "Temp2"]:
        df[col] = df[col].str.replace(r'[+ ]', '', regex=True).astype(float)
    # Take the averages of Temp and Temp2 from sample 100 until sample 350
    # (iloc's stop is exclusive; assumes sample 1 is on row 0)
    avg_Temp1 = df.iloc[100-1:350]['Temp'].mean()
    avg_Temp2 = df.iloc[100-1:350]['Temp2'].mean()
    # Append each average to its own list, e.g. 'K_Temp' and 'K_Temp2' for a K file
    prefix = basename(filename)[0]
    K_Z_Averages[prefix + "_Temp"].append(avg_Temp1)
    K_Z_Averages[prefix + "_Temp2"].append(avg_Temp2)
    fig_ax.plot(df["Temp"], color=color)
    fig_ax.plot(df["Temp2"], color=color2)
fig, ax = plt.subplots()
# Call plot_data once per file; it plots both columns, each in its own color
for f in filenamesK:
    plot_data(f, ax, 'blue', 'darkblue')
for f in filenamesZ:
    plot_data(f, ax, 'red', 'darkred')
# Take the overall average (Series allow different numbers of K and Z files)
df_avg = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()
# Add a dashed horizontal line at each mean, in the same order as the dict keys
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue', 'darkblue', 'red', 'darkred'], alpha=.5)
plt.show()
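If more temperature columns appear later, one hedged alternative to hard-coding K_Temp, K_Temp2, ... is to loop over the column names and build the dict keys from them; collections.defaultdict(list) then creates each list on first use. The sketch uses small made-up, already-cleaned DataFrames in place of the CSV files, and TEMP_COLUMNS and collect_averages are hypothetical names.

```python
from collections import defaultdict

import pandas as pd

TEMP_COLUMNS = ['Temp', 'Temp2']  # assumption: extend if more columns appear


def collect_averages(df, prefix, averages, start=100, stop=350):
    """Hypothetical helper: append one window mean per temperature column.

    Keys look like 'K_Temp', 'K_Temp2', ...; rows are chosen by the 'sample'
    column (inclusive on both ends) instead of by position.
    """
    window = df[df['sample'].between(start, stop)]
    for col in TEMP_COLUMNS:
        averages[f"{prefix}_{col}"].append(window[col].mean())


averages = defaultdict(list)  # each key's list is created on first append

# Made-up data standing in for one K file and one Z file
samples = list(range(1, 401))
k_df = pd.DataFrame({'sample': samples, 'Temp': 20.0, 'Temp2': 21.0})
z_df = pd.DataFrame({'sample': samples, 'Temp': 30.0, 'Temp2': 31.0})
collect_averages(k_df, 'K', averages)
collect_averages(z_df, 'Z', averages)

overall = pd.DataFrame({k: pd.Series(v) for k, v in averages.items()}).mean()
print(overall)
```

Because dicts keep insertion order, processing the K files before the Z files yields the key order K_Temp, K_Temp2, Z_Temp, Z_Temp2, which matches the color list passed to hlines.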
I'm not sure I understand the second part of the 'K_Z_Average' question, but here it is:
# Now take the average of df["Temp"] from sample 100 until sample 350
# (positional rows 100 to 349; iloc's stop is exclusive).
average_temperature = df.iloc[100:350]['Temp'].mean()
# Store this average in a new 'K_Z_Average' column; the scalar is repeated on every row of this file's df.
df['K_Z_Average'] = average_temperature
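Because iloc's stop is exclusive, which rows a positional slice covers for "sample 100 until sample 350" depends on whether the first row is sample 0 or sample 1. A small check on a made-up 'sample' column numbered from 1 shows the difference; selecting by the sample values with Series.between (inclusive on both ends by default) avoids the question entirely.

```python
import pandas as pd

# Made-up data: 400 rows with samples numbered 1..400
df = pd.DataFrame({'sample': list(range(1, 401)), 'Temp': 0.0})

by_position = df.iloc[100:350]                 # rows 100..349 -> samples 101..350
shifted = df.iloc[100 - 1:350]                 # rows 99..349  -> samples 100..350
by_label = df[df['sample'].between(100, 350)]  # samples 100..350, by value

for name, part in [("iloc[100:350]", by_position),
                   ("iloc[99:350]", shifted),
                   ("between(100, 350)", by_label)]:
    print(name, part['sample'].min(), part['sample'].max(), len(part))
```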