Python - Storing the average from files in a loop, and then finding the global average outside the loop?

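The function below currently loops through the files starting with "K" and "Z" and plots the "Temp" data: blue for the "K" data, red for the "Z" data. This works perfectly for what I need.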

Where I'm stuck:

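  1. For each file in the loop, I now want to take the average of "Temp" between sample 100 and sample 350.
  2. Then I want to store each file's average in a new DataFrame, with one column for the "K" averages and one column for the "Z" averages.
  3. Finally, outside the loop, I want to take the average of the "K" column and the average of the "Z" column, and plot them on the graph.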

In my code below, I've added comments where I'm stuck.

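As a side question: if anyone knows a good way to automatically detect the "flat" region (slope ≈ 0) of each dataset and then automatically select the averaging interval, that would be really cool! Right now I'm surely losing some data points by using a fixed interval.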
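Regarding this side question, one possible approach (a sketch under assumptions, not something taken from the answers below) is to smooth each series with a centered rolling mean, take its numerical slope with numpy.gradient, and keep the longest contiguous run where the absolute slope stays below a tolerance. The helper name `longest_flat_interval` and the `window` and `tol` values are hypothetical and would need tuning to the real data:

```python
import numpy as np
import pandas as pd

def longest_flat_interval(temp, window=25, tol=0.01):
    """Hypothetical helper: (start, stop) positions of the longest run where the
    smoothed slope satisfies |slope| < tol. stop is exclusive, like iloc."""
    # Smooth first so that sample-to-sample noise does not break up flat runs
    smoothed = pd.Series(temp, dtype=float).rolling(window, center=True, min_periods=1).mean()
    # Numerical slope: temperature change per sample
    slope = np.gradient(smoothed.to_numpy())
    flat = np.abs(slope) < tol

    best_start, best_len = 0, 0
    run_start = None
    # Append a False sentinel so that a run reaching the end is also closed
    for i, is_flat in enumerate(np.append(flat, False)):
        if is_flat and run_start is None:
            run_start = i
        elif not is_flat and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_start + best_len


# Synthetic example (made-up data): ramp up, plateau at 50, ramp down
temp = np.concatenate([np.linspace(20, 50, 100),
                       np.full(300, 50.0),
                       np.linspace(50, 20, 100)])
start, stop = longest_flat_interval(temp)
avg = pd.Series(temp).iloc[start:stop].mean()
print(start, stop, avg)
```

The returned pair could then stand in for the fixed 100 to 350 interval, e.g. `df["Temp"].iloc[start:stop].mean()`. Because of the smoothing window, the detected run starts and ends a few samples inside the true plateau, which errs on the side of excluding transition points.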

import pandas as pd
import matplotlib.pyplot as plt
from glob import glob

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp']
    df = df.astype(str)

    # Strip the "+ " prefix and any remaining spaces, then convert to float
    df["Temp"] = (df["Temp"].str.replace(r'\+ ', '', regex=True)
                            .str.replace(' ', '', regex=False)
                            .astype(float))
    
    # Now take the average of df["Temp"] from sample 100 until sample 350.
    
    # Append this average to K_Z_Averages, containing a column for the average
    # from each K file and a column for the average from each Z file.
    
    fig_ax.plot(df["Temp"], color=color)

fig, ax = plt.subplots()

for f in filenamesK:
    plot_data(f, ax, 'blue')

for f in filenamesZ:
    plot_data(f, ax, 'red')

# After the loop is finished, take the average of each column in K_Z_Averages,
# i.e. the overall average of the K files and of the Z files.

plt.show()

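Part 2: If my .csv files have a second temperature column, "Temp2", that I also want to extract, could you help me add it to the dict? For example, with one column each in the dict for K_Temp, K_Temp2, Z_Temp and Z_Temp2.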

I modified the code in a way I thought would work, but I imagine there is a more efficient way to do this:

import pandas as pd
import matplotlib.pyplot as plt
from glob import glob

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

# Create dict of lists for storing the averages
K_Z_Averages = {'K':[], 'Z':[]}

def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp', 'Temp2']
    df = df.astype(str)

    df["Temp"] = df["Temp"].str.replace(r'\+ ', '', regex=True).str.replace(' ', '', regex=False).astype(float)
    df["Temp2"] = df["Temp2"].str.replace(r'\+ ', '', regex=True).str.replace(' ', '', regex=False).astype(float)
    
    # Now take the average of df["Temp"] and df["Temp2"] from sample 100 until sample 350.
    avg_Temp1 = df.iloc[100-1:350+1]['Temp'].mean()
    avg_Temp2 = df.iloc[100-1:350+1]['Temp2'].mean()
    
    # Append these averages to K_Z_Averages, containing a column for the averages
    # from each K file and a column for the averages from each Z file.
    K_Z_Averages[filename.split('/')[-1][0]].append(avg_Temp1)
    K_Z_Averages[filename.split('/')[-1][0]].append(avg_Temp2)
    
    fig_ax.plot(df["Temp"], color=color)

fig, ax = plt.subplots()

for f in filenamesK:
    plot_data(f, ax, 'blue')

for f in filenamesZ:
    plot_data(f, ax, 'red')

# Take the overall average 
df_avg = pd.DataFrame(K_Z_Averages).mean() 

# Add a horizontal dashed line at each mean temperature
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue','red'], alpha=.5)

plt.show()

You can create a dictionary to store the average of each file, and append each new average to it:

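# Before the `plot_data` definition
K_Z_Averages = {'K':[], 'Z':[]}

# Inside the function: samples 100 to 350 (1-based rows, inclusive) are iloc[99:350],
# since iloc is 0-based and excludes the end position
avg = df.iloc[99:350]['Temp'].mean()
K_Z_Averages[filename.split('/')[-1][0]].append(avg)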

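where filename.split('/')[-1][0] strips the directory path and takes the first letter of the file name, 'K' or 'Z' (similar to os.path.basename(filename)[0]). os.path.basename is the more robust choice, because on Windows glob can return paths that mix / and \ separators.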
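A quick way to see the difference, as a sketch: the example path below is an assumption about what glob may return on Windows, where it joins the directory part of the pattern and the matched name with a backslash. The standard-library ntpath module is the Windows implementation of os.path, so the comparison runs on any OS:

```python
import ntpath

# Assumed example of a path returned by glob("C:/Users/K*.csv") on Windows
path = "C:/Users\\K1.csv"

first_by_split = path.split('/')[-1][0]       # 'U': split('/') leaves "Users\K1.csv"
first_by_basename = ntpath.basename(path)[0]  # 'K': ntpath accepts both / and \

print(first_by_split, first_by_basename)
```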

Then take the overall average of the per-file averages:

pd.DataFrame(K_Z_Averages).mean()
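Note that pd.DataFrame(K_Z_Averages) needs all lists to have the same length, i.e. the same number of K and Z files; otherwise it raises a ValueError. A small sketch with made-up averages that wraps each list in a pd.Series, so the shorter column is padded with NaN, which mean() skips by default:

```python
import pandas as pd

# Made-up per-file averages: three K files but only two Z files
K_Z_Averages = {'K': [20.1, 20.4, 19.9], 'Z': [25.0, 25.6]}

# Each Series keeps its own length; pandas aligns them on the index and fills NaN
df_avg = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()
print(df_avg)
```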

The full code would look like this:

import pandas as pd
import matplotlib.pyplot as plt
from glob import glob
from os.path import basename

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

# Create dict of lists for storing the averages
K_Z_Averages = {'K': [], 'Z': []}

def plot_data(filename, fig_ax, color):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp']
    df = df.astype(str)

    # Strip the "+ " prefix and any remaining spaces, then convert to float
    df["Temp"] = (df["Temp"].str.replace(r'\+ ', '', regex=True)
                            .str.replace(' ', '', regex=False)
                            .astype(float))
    
    # Average of df["Temp"] from sample 100 to sample 350 (1-based rows, inclusive);
    # iloc is 0-based and end-exclusive, hence [99:350]
    avg = df.iloc[99:350]['Temp'].mean()
    
    # Append this average to the list for the file's prefix ('K' or 'Z')
    K_Z_Averages[basename(filename)[0]].append(avg)
    
    fig_ax.plot(df["Temp"], color=color)

fig, ax = plt.subplots()

for f in filenamesK:
    plot_data(f, ax, 'blue')

for f in filenamesZ:
    plot_data(f, ax, 'red')

# Take the overall average; wrapping each list in a Series also works
# when there are different numbers of K and Z files
df_avg = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()

# Temperatures are on the y-axis, so draw a horizontal dashed line at each mean
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue', 'red'], alpha=.5)

plt.show()
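To check the string cleaning and the sample slice without the real CSV files, here is a small sketch that builds a synthetic file in memory with io.StringIO. The 24 header lines, the "+ value" format and the helper name file_average are assumptions made for illustration:

```python
import io
import pandas as pd

def file_average(csv_text, start=100, stop=350):
    """Hypothetical helper: mean of 'Temp' over 1-based samples start..stop, inclusive."""
    df = pd.read_csv(io.StringIO(csv_text), sep=',', skiprows=24)
    df.columns = ['sample', 'Temp']
    df = df.astype(str)
    # Remove the "+ " sign prefix and stray spaces before converting to float
    df['Temp'] = (df['Temp'].str.replace(r'\+ ', '', regex=True)
                            .str.replace(' ', '', regex=False)
                            .astype(float))
    # iloc is 0-based and end-exclusive, so positions start-1 .. stop-1 are used
    return df['Temp'].iloc[start - 1:stop].mean()

# Fake file: 24 header lines, a column header row, then samples 1..500 with Temp == sample
header = '\n'.join(f'# header line {i}' for i in range(24))
rows = '\n'.join(f'{i},+ {i}.0' for i in range(1, 501))
csv_text = header + '\nsample,Temp\n' + rows

print(file_average(csv_text))
```

Since Temp equals the sample number in this fake file, the result should be the midpoint of 100 and 350, which makes an off-by-one in the slice easy to spot.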

After the question edit (Part 2), the code would look like this:

import pandas as pd
from glob import glob
from os.path import basename
import matplotlib.pyplot as plt

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

# Create dict of lists for storing the averages, one list per prefix/column pair
K_Z_Averages = {'K_Temp': [], 'K_Temp2': [], 'Z_Temp': [], 'Z_Temp2': []}

def plot_data(filename, fig_ax, colors):
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns = ['sample', 'Temp', 'Temp2']
    df = df.astype(str)

    prefix = basename(filename)[0]  # 'K' or 'Z'
    for col, color in zip(['Temp', 'Temp2'], colors):
        # Strip the "+ " prefix and any remaining spaces, then convert to float
        df[col] = (df[col].str.replace(r'\+ ', '', regex=True)
                          .str.replace(' ', '', regex=False)
                          .astype(float))

        # Average from sample 100 to sample 350 (1-based rows, inclusive)
        K_Z_Averages[prefix + '_' + col].append(df[col].iloc[99:350].mean())

        fig_ax.plot(df[col], color=color)

fig, ax = plt.subplots()

# One call per file: Temp in the first color, Temp2 in the second
for f in filenamesK:
    plot_data(f, ax, ['blue', 'darkblue'])

for f in filenamesZ:
    plot_data(f, ax, ['red', 'darkred'])

# Take the overall average (the Series wrapping tolerates unequal file counts)
df_avg = pd.DataFrame({k: pd.Series(v) for k, v in K_Z_Averages.items()}).mean()

# Add a horizontal dashed line at each mean temperature
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue', 'darkblue', 'red', 'darkred'], alpha=.5)

plt.show()
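If more temperature columns might be added later, one alternative (a different technique from the dict of lists, sketched here with made-up numbers) is to collect one record per file and column in a plain list and let pandas groupby compute the overall means, so no dict keys have to be hard-coded:

```python
import pandas as pd

# Made-up per-file results: one record per (file, column).
# In the real loop each record would be appended inside plot_data().
records = [
    {'prefix': 'K', 'column': 'Temp',  'avg': 20.0},
    {'prefix': 'K', 'column': 'Temp2', 'avg': 21.0},
    {'prefix': 'K', 'column': 'Temp',  'avg': 22.0},
    {'prefix': 'K', 'column': 'Temp2', 'avg': 23.0},
    {'prefix': 'Z', 'column': 'Temp',  'avg': 30.0},
    {'prefix': 'Z', 'column': 'Temp2', 'avg': 31.0},
]

results = pd.DataFrame(records)

# One overall mean per (prefix, column) pair
overall = results.groupby(['prefix', 'column'])['avg'].mean()

# Pivot to a small table: rows K/Z, columns Temp/Temp2
table = overall.unstack()
print(table)
```

Inside plot_data, the append would be something like `records.append({'prefix': basename(filename)[0], 'column': col, 'avg': df[col].iloc[99:350].mean()})`.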

I'm not sure I understand the second part of the question about 'K_Z_Average', but here it is:

    # Now take the average of df["Temp"] from sample 100 until sample 350.
    average_temperature = df.iloc[100:350]['Temp'].mean()
   
    # Store this average in a new 'K_Z_Average' column of the current file's df
    # (every row of that column gets the same value)
    df['K_Z_Average'] = average_temperature