Generate a heatmap from a DataFrame with Python and seaborn
I am new to Python and also new to seaborn.
I have a pandas DataFrame named df that looks like this:
TIMESTAMP ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F2 ACT_TIME_AERATEUR_1_F3 ACT_TIME_AERATEUR_1_F4 ACT_TIME_AERATEUR_1_F5 ACT_TIME_AERATEUR_1_F6
2015-08-01 23:00:00 80 0 0 0 10 0
2015-08-01 23:20:00 60 0 20 0 10 10
2015-08-01 23:40:00 80 10 0 0 10 10
2015-08-01 00:00:00 60 10 20 40 10 10
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38840 entries, 0 to 38839
Data columns (total 7 columns):
TIMESTAMP 38840 non-null datetime64[ns]
ACT_TIME_AERATEUR_1_F1 38696 non-null float64
ACT_TIME_AERATEUR_1_F3 38697 non-null float64
ACT_TIME_AERATEUR_1_F5 38695 non-null float64
ACT_TIME_AERATEUR_1_F6 38695 non-null float64
ACT_TIME_AERATEUR_1_F7 38693 non-null float64
ACT_TIME_AERATEUR_1_F8 38696 non-null float64
dtypes: datetime64[ns](1), float64(6)
memory usage: 2.1 MB
I tried to make a heatmap with this code:
data = sns.load_dataset("df")
# Draw a heatmap with the numeric values in each cell
sns.heatmap(data, annot=True, fmt="d", linewidths=.5)
But it does not work.
Can you help me find the error?
Thanks
Edit
First, I load the DataFrame from a CSV file:
df1 = pd.read_csv('C:/Users/Demonstrator/Downloads/Listeequipement.csv',delimiter=';', parse_dates=[0], infer_datetime_format = True)
Then I select only the rows with timestamps between '2015-08-01 23:10:00' and '2015-08-02 00:00:00':
import seaborn as sns
df1['TIMESTAMP']= pd.to_datetime(df1_no_missing['TIMESTAMP'], '%d-%m-%y %H:%M:%S')
df1['date'] = df_no_missing['TIMESTAMP'].dt.date
df1['time'] = df_no_missing['TIMESTAMP'].dt.time
date_debut = pd.to_datetime('2015-08-01 23:10:00')
date_fin = pd.to_datetime('2015-08-02 00:00:00')
df1 = df1[(df1['TIMESTAMP'] >= date_debut) & (df1['TIMESTAMP'] < date_fin)]
Then I construct the heatmap:
sns.heatmap(df1.iloc[:,2:],annot=True, fmt="d", linewidths=.5)
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-363-a054889ebec3> in <module>()
7 df1 = df1[(df1['TIMESTAMP'] >= date_debut) & (df1['TIMESTAMP'] < date_fin)]
8
----> 9 sns.heatmap(df1.iloc[:,2:],annot=True, fmt="d", linewidths=.5)
C:\Users\Demonstrator\Anaconda3\lib\site-packages\seaborn\matrix.py in
heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws,
linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, ax,
xticklabels, yticklabels, mask, **kwargs)
483 plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
484 annot_kws, cbar, cbar_kws, xticklabels,
--> 485 yticklabels, mask)
486
487 # Add the pcolormesh kwargs here
C:\Users\Demonstrator\Anaconda3\lib\site-packages\seaborn\matrix.py in
__init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
165 # Determine good default values for the colormapping
166 self._determine_cmap_params(plot_data, vmin, vmax,
--> 167 cmap, center, robust)
168
169 # Sort out the annotations
C:\Users\Demonstrator\Anaconda3\lib\site-packages\seaborn\matrix.py in
_determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
202 cmap, center, robust):
203 """Use some heuristics to set good defaults for colorbar and range."""
--> 204 calc_data = plot_data.data[~np.isnan(plot_data.data)]
205 if vmin is None:
206 vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types
according to the casting rule ''safe''
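Drop the timestamp variables (i.e. the first two columns) before passing the data to sns.heatmap. There is also no need to call load_dataset; just use: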
sns.heatmap(df.iloc[:,2:],annot=True, fmt="d", linewidths=.5)
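Positional slicing such as iloc[:, 2:] depends on column order, and the date and time columns created in the question are appended at the end of the frame, so they may still be passed to the heatmap. As a hedged alternative, the sketch below selects columns by dtype instead. The sample values are made up to mirror the question, and fmt=".0f" is an assumption chosen because the columns are float64 (fmt="d" expects integers).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Small made-up frame mirroring the sample rows in the question.
df = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2015-08-01 23:00:00", "2015-08-01 23:20:00", "2015-08-01 23:40:00",
    ]),
    "ACT_TIME_AERATEUR_1_F1": [80.0, 60.0, 80.0],
    "ACT_TIME_AERATEUR_1_F2": [0.0, 0.0, 10.0],
    "ACT_TIME_AERATEUR_1_F3": [0.0, 20.0, 0.0],
})
# Helper columns like the ones created in the question (object dtype).
df["date"] = df["TIMESTAMP"].dt.date
df["time"] = df["TIMESTAMP"].dt.time

# Keep only numeric columns; this drops TIMESTAMP, date and time,
# wherever they sit in the frame.
numeric = df.select_dtypes(include="number")

# Float values are annotated with a float format rather than "d".
ax = sns.heatmap(numeric, annot=True, fmt=".0f", linewidths=.5)
```

Selecting by dtype keeps the call independent of where the non-numeric columns happen to be, which is likely what the isnan TypeError in the question comes down to.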
Edit
OK, here is your DataFrame; I only changed the column names to save time
df
Out[9]:
v1 v2 v3 v4 v5 v6 v7 v8
0 2015-08-01 23:00:00 80 0 0 0 10 0
1 2015-08-01 23:20:00 60 0 20 0 10 10
2 2015-08-01 23:40:00 80 10 0 0 10 10
3 2015-08-01 00:00:00 60 10 20 40 10 10
seaborn cannot handle the timestamp variables in a heatmap, so we drop the first two columns and pass the rest of the DataFrame to seaborn
import seaborn as sns
sns.heatmap(df.iloc[:,2:],annot=True, fmt="d", linewidths=.5)
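This produces the annotated heatmap (image not preserved in this copy).
If this does not give you the result, please edit your question to include the rest of your code, since the problem must then lie elsewhere.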
It is because you have not set the timestamp as the index.
The row names come from the index. Do this:
df1.set_index("TIMESTAMP", inplace=True)
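A minimal sketch of how the index suggestion might look end to end. The rows are made up, the names `indexed` and the string formatting of the index are illustrative assumptions, and fmt=".0f" is chosen only because the sample columns are floats.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows shaped like the sample in the question.
df1 = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime(["2015-08-01 23:20:00", "2015-08-01 23:40:00"]),
    "ACT_TIME_AERATEUR_1_F1": [60.0, 80.0],
    "ACT_TIME_AERATEUR_1_F2": [0.0, 10.0],
})

# Move TIMESTAMP out of the data and into the row labels.
indexed = df1.set_index("TIMESTAMP")
# Format the labels as plain strings so the y axis shows readable timestamps.
indexed.index = indexed.index.strftime("%Y-%m-%d %H:%M:%S")

ax = sns.heatmap(indexed, annot=True, fmt=".0f", linewidths=.5)
ax.tick_params(axis="y", labelrotation=0)  # keep the row labels horizontal
```

With the timestamp in the index, only the numeric columns are colour-mapped and seaborn takes the row labels from the index.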
Another way to fix this is:
ax = sns.heatmap(df1.iloc[:, 1:6:], annot=True, linewidths=.5)
ax.set_yticklabels([i.strftime("%Y-%m-%d %H:%M:%S") for i in df1.TIMESTAMP], rotation=0)