Plot only selected points of a time series

Question

I have a univariate time series structured like this:

data = [15, 5, 7, 9, 10, 23, 4, 6]

and a list of scores, one for each value in that list, structured the same way:

score = [0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1]

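I also have a threshold t = 0.5

From these I built a dataframe with two columns: the first holds the values, and the second holds True if the value is an anomaly (its score is > t) and False if it is not (score < t). The structure is: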

values | anomalies
  15   |   False
  5    |   True
  7    |   False
  9    |   True
  10   |   False
  23   |   True
  4    |   False
  6    |   False

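What I want is to plot the values with anomalies == True in one color and the values with anomalies == False in another. I tried plotting the normal series and then overlaying the anomalous values on top of it, as you can see in this code: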

fig = plt.figure(figsize=(25,5)) 
ax1=plt.subplot(121)
sns.lineplot(data=df['values'], ax=ax1) # plot normal time series plot
sns.lineplot(data=df['values'][(df['anomalies'] == True )], color='red', ax=ax1)

However, in the resulting plot the red points, which should be separate, are joined together by a line.

How can I fix this?

Answer 1

You can pass the markevery parameter to the plot function, as described in "Highlighting arbitrary points in a matplotlib plot?". You can then set markerfacecolor to whatever you like.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data = [15, 5, 7, 9, 10, 23, 4, 6]
score = [0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1]
df = pd.DataFrame(data,columns=['values'])
df['score'] = score
plt.figure(figsize=(8,6))
# Green line through every point; markers are drawn only where score > 0.5
plt.plot(df.index, df['values'], '-go', markevery=np.where(df.score > 0.5, True, False), markerfacecolor='b')
plt.xlabel('Index')
plt.ylabel('Values')
plt.title('Anomalies Plot')

It looks like this: [plot]

You can get a similar result with seaborn by replacing

plt.plot(df.index, df['values'], '-go', markevery=np.where(df.score > 0.5, True, False), markerfacecolor='b')

with

sns.scatterplot(x=df.index,y=df['values'], hue=df.score>0.5)
sns.lineplot(x=df.index,y=df['values'])
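
The same idea can also be written with plain matplotlib. This is only a minimal sketch, assuming the data and threshold from the question; the names is_anomaly and x, the colors and the labels are illustrative choices, not part of the original answer. The line is drawn once over the whole series, and the two classes are overlaid as unconnected markers:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([15, 5, 7, 9, 10, 23, 4, 6])
score = np.array([0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1])
t = 0.5

# Boolean mask (illustrative name): True where the score exceeds the threshold
is_anomaly = score > t
x = np.arange(len(data))

fig, ax = plt.subplots(figsize=(8, 6))
# Draw the whole series once as a line, so anomalies are never joined to each other
ax.plot(x, data, color='green', zorder=1)
# Overlay unconnected markers, one color per class
ax.scatter(x[~is_anomaly], data[~is_anomaly], color='green', label='normal', zorder=2)
ax.scatter(x[is_anomaly], data[is_anomaly], color='red', label='anomaly', zorder=2)
ax.legend()
```

Because scatter never connects its points, the anomalies stay separate regardless of how many there are. Call plt.show() afterwards if you are not working in a notebook.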

Answer 2

You can first create a dataframe:

df = pd.DataFrame({'data': data, 'score': score, 'anomalies': False})

Then:

df.loc[df['score'] > t, 'anomalies'] = True

That covers the first part of your question.
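
For reference, a self-contained sketch of this step, assuming the data and threshold from the question. The column names follow this answer; the variable anomalous_values is purely illustrative. Comparing the score column with t already yields a boolean column, which can then be used directly as a mask:

```python
import pandas as pd

data = [15, 5, 7, 9, 10, 23, 4, 6]
score = [0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1]
t = 0.5

df = pd.DataFrame({'data': data, 'score': score})
# The comparison returns a boolean Series, one entry per row
df['anomalies'] = df['score'] > t

# A boolean column can be used directly as a row mask (illustrative variable name)
anomalous_values = df.loc[df['anomalies'], 'data'].tolist()
```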

Answer 3

Use LineCollection:

# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Data
data = [15, 5, 7, 9, 10, 23, 4, 6]
score = [0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1]
t = 0.5

# Create dataframe
df = pd.DataFrame({'values': data, 'score': score})
df['anomalies'] = df['score'] > t

# Build colored segments
x = zip(range(len(df)), range(1, len(df)))
y = zip(df['values'], df['values'][1:])
lines = [[(x0, x1), (y0, y1)] for (x0, y0), (x1, y1) in zip(x, y)]
linecolors = df['anomalies'].replace({True: 'red', False: 'blue'})
segments = LineCollection(lines, colors=linecolors)

# Plot chart
fig, ax = plt.subplots()
ax.add_collection(segments)

# Limits are not set automatically when using LineCollection
ax.set_xlim(0, len(df))
ax.set_ylim(0, df['values'].max()+1)

Output:

>>> df
   values  score  anomalies
0      15    0.3      False
1       5    0.6       True
2       7    0.1      False
3       9    0.8       True
4      10    0.4      False
5      23    0.7       True
6       4    0.3      False
7       6    0.1      False
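
As an alternative to hard-coding the axis limits, the axes can be asked to recompute them from the collection with ax.autoscale_view(). The following is a minimal sketch, assuming the same data and threshold; the names points, segments and colors are illustrative, and coloring each segment by its starting point is an assumption that mirrors the code above:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

data = np.array([15, 5, 7, 9, 10, 23, 4, 6])
score = np.array([0.3, 0.6, 0.1, 0.8, 0.4, 0.7, 0.3, 0.1])
t = 0.5

# Points as an (n, 2) array of (x, y); consecutive pairs form the segments
points = np.column_stack([np.arange(len(data)), data])
segments = np.stack([points[:-1], points[1:]], axis=1)  # shape (n - 1, 2, 2)

# One color per segment, taken from the segment's starting point (an assumption)
colors = ['red' if s > t else 'blue' for s in score[:-1]]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=colors))
# Recompute the view limits from the collection instead of setting them by hand
ax.autoscale_view()
```

Note that this colors line segments rather than individual points, so it suits cases where the stretch of the series following an anomalous point should be highlighted.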
