Find max halflife values relative to their temperature value in the same array

Basically, I load an Excel file into a pandas DataFrame here:

dv = pd.read_excel('data.xlsx')

Then I clean it and rename it "cleaned". This isn't important for this reproducible example; I mention it only for clarity:

if (selected_x.title()=="Viscosity" or selected_y.title()=="Viscosity"):
    cleaned = cleaned[cleaned.Study != "Yanqing Wang 2017"]
    cleaned = cleaned[cleaned.Study != "Thakore 2020"]

From there, I split the cleaned DataFrame into separate studies (this project is a literature review). I'll include two examples below:

yan = cleaned[cleaned.Study == "Yanqing Wang 2017"]
tha = cleaned[cleaned.Study == "Thakore 2020"]

Finally, I load each individual study into a trace and display them on a graph. selected_y and selected_x are both strings, such as "Temperature (C) " and "Halflife (Min)":

trace1 = go.Scatter(y=tha[selected_y], x=tha[selected_x])
trace2 = go.Scatter(y=yan[selected_y], x=yan[selected_x])

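What I need to do, after splitting the array into separate studies, is find the maximum half-life at each temperature (0, 50, 100, 150, 200, 250, 300) and compile them into separate lists, then find the maximum of each of those lists, take the entire row, and append them all to the same list. I tried to do this with something like: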

yan50 = yanq[yanq['Temperature (C) '] == 50]
yan100 = yanq[yanq['Temperature (C) '] == 100]
yan150 = yanq[yanq['Temperature (C) '] == 150]
yan200 = yanq[yanq['Temperature (C) '] == 200]
yan250 = yanq[yanq['Temperature (C) '] == 250]
yan300 = yanq[yanq['Temperature (C) '] == 300]

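to split the study into lists by temperature. I'm currently stuck at the point where I have to find the maximum value in the half-life column of each list and add the entire corresponding row to a new list. This is what I'm trying: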

yan = pd.DataFrame(columns=["Study","Gas","Surfactant","Surfactant Concentration","Additive","Additive Concentration","LiquidPhase","Quality","Pressure (Psi)","Temperature (C) ","Shear Rate (/Sec)","Halflife (Min)","Viscosity","Color"])

if (len(yan50) > 0):
    yan50.loc[yan50['Halflife (Min)'].idxmax()]
    yan50 = yan50.dropna()
    yan.append(yan50)

if (len(yan100) > 0):
    yan100.loc[yan100['Halflife (Min)'].idxmax()]
    yan100 = yan100.dropna()
    yan.append(yan100)

if (len(yan150) > 0):
    yan150.loc[yan150['Halflife (Min)'].idxmax()]
    yan150 = yan150.dropna()
    yan.append(yan150)

if (len(yan200) > 0):
    yan200.loc[yan200['Halflife (Min)'].idxmax()]
    yan200 = yan200.dropna()
    yan.append(yan200)

if (len(yan250) > 0):
    yan250.loc[yan250['Halflife (Min)'].idxmax()]
    yan250 = yan250.dropna()
    yan.append(yan250)

if (len(yan300) > 0):
    yan300.loc[yan300['Halflife (Min)'].idxmax()]
    yan300 = yan300.dropna()
    yan.append(yan300)

yan50.iloc[yan50['Halflife (Min)'].idxmax()]

The error I get is that the individual temperature lists are empty.

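I'm also getting a bunch of NaN values in the separate temperature lists I compiled, and I'm not sure whether I split the lists correctly. I'm not very strong with pandas. Any recommendations would be appreciated!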

Link to CSV of data

------------EDIT------------

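What I have is all of the studies plotted at the same temperature points (50, 100, etc.). I want to find the maximum half-life at each temperature so that only the highest point is displayed. I'm doing this to help with data visualization. Future plans beyond this question include connecting the maximum points with lines and comparing the half-life trends of the separate studies.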

IIUC, what you need is

df2 = df.groupby(['Study','Temperature (C) '])['Halflife (Min)'].max().reset_index(name='Max_halflife')

This results in

          Study     Temperature (C)     Max_halflife
0   Thakore 2020                 50     120.00
1   Thakore 2020                100     2.40
2   Thakore 2020                150     0.20
3   Yanqing Wang 2017            50     123.00
4   Yanqing Wang 2017           100     3.20
5   Yanqing Wang 2017           150     0.31
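The question also asks for the entire row, not just the maximum value. A minimal sketch of one way to get that, using idxmax per group and then looking the rows up with .loc. The rows below (including the "Gas" values) are made up for illustration and are not taken from the real spreadsheet; only the column names come from the question.

```python
import pandas as pd

# Illustrative data only; these rows are made up, not taken from the real spreadsheet.
df = pd.DataFrame({
    "Study": ["Thakore 2020"] * 3 + ["Yanqing Wang 2017"] * 3,
    "Gas": ["CO2", "CO2", "N2", "N2", "CO2", "N2"],
    "Temperature (C) ": [50, 50, 100, 50, 50, 100],
    "Halflife (Min)": [120.0, 80.0, 2.4, 123.0, 60.0, 3.2],
})

# Skip rows with no half-life so idxmax never sees an all-NaN group.
valid = df.dropna(subset=["Halflife (Min)"])

# idxmax returns the index label of the largest half-life in each group.
idx = valid.groupby(["Study", "Temperature (C) "])["Halflife (Min)"].idxmax()

# .loc with those labels pulls the complete rows, all columns included.
max_rows = df.loc[idx].reset_index(drop=True)
print(max_rows)
```

The result has one row per (Study, Temperature) pair and keeps every original column, so it can be passed to the same plotting code in place of the original DataFrame.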

Then the code below should give you the plot you want.

import matplotlib.pyplot as plt
import seaborn as sns

# Maximum half-life for each study at each temperature
df2 = df.groupby(['Study','Temperature (C) '])['Halflife (Min)'].max().reset_index(name='Max_halflife')

fig = plt.figure(figsize=(8, 5))
sns.scatterplot(x='Temperature (C) ', y='Max_halflife', data=df2, hue='Study')
plt.show()
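
As for the original attempt, a few things would keep it from working regardless of the data: the result of .loc[...idxmax()] is never assigned, append returns a new DataFrame instead of modifying yan in place (and DataFrame.append was removed in pandas 2.0), and dropna() with no arguments drops every row that has a NaN in any column. That last point may be why the temperature lists come out empty, but that is a guess without seeing the file. A small sketch with made-up rows, where the "Additive" column is left empty on purpose:

```python
import pandas as pd

# Made-up rows for one study at one temperature; "Additive" is missing on purpose.
yan50 = pd.DataFrame({
    "Study": ["Yanqing Wang 2017", "Yanqing Wang 2017"],
    "Additive": [None, None],
    "Temperature (C) ": [50, 50],
    "Halflife (Min)": [123.0, 60.0],
})

# dropna() with no arguments drops a row if ANY column is missing, so nothing survives.
print(len(yan50.dropna()))

# Restricting the check to the half-life column keeps both rows.
kept = yan50.dropna(subset=["Halflife (Min)"])

# Double brackets keep the result a one-row DataFrame; assign it, or it is discarded.
best = kept.loc[[kept["Halflife (Min)"].idxmax()]]

# Collect the pieces in a list and concatenate once (concat returns a new frame).
pieces = [best]
yan_max = pd.concat(pieces, ignore_index=True)
print(yan_max)
```

With the groupby approach none of the per-temperature variables are needed, so this is mainly useful for understanding what went wrong.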