How to plot data from two different DataFrames with multiple duplicates?

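I am trying to create a chart that plots historical Apple stock data together with earthquake occurrences. I have two DataFrames, one with historical Apple stock data and one with historical earthquake data. I want to show each earthquake as a marker or shape positioned relative to the Apple stock price on that date.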

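Questions

  1. How can I plot the earthquake events as markers or shapes relative to the Apple chart?
  2. How can I handle markers or shapes for multiple earthquakes and keep them from overlapping or covering each other?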

Apple data

                        Date  AAPL.Open  AAPL.High  AAPL.Low  AAPL.Close  AAPL.Volume  AAPL.Adjusted       dn     mavg       up   direction
0  2015-02-17 00:00:00+00:00     127.49     128.88    126.92      127.83     63152400        122.905  106.741  117.928  129.114  Increasing
1  2015-02-18 00:00:00+00:00     127.63     128.78    127.45      128.72     44891700        123.761  107.842   118.94  130.038  Increasing
2  2015-02-19 00:00:00+00:00     128.48     129.03    128.33      128.45     37362400        123.501  108.894  119.889  130.884  Decreasing
3  2015-02-20 00:00:00+00:00     128.62      129.5    128.05       129.5     48948400        124.511  109.785  120.764  131.742  Increasing
4  2015-02-23 00:00:00+00:00     130.02        133    129.66         133     70974100        127.876  110.373   121.72  133.068  Increasing

Earthquake data

                            Date  Latitude  Longitude  Magnitude
22539  2015-02-17 00:00:00+00:00   40.1095    141.891        5.5
22540  2015-02-17 00:00:00+00:00   39.5696    143.583        5.5
22541  2015-02-18 00:00:00+00:00    8.3227   -103.159        5.5
22542  2015-02-18 00:00:00+00:00     8.285   -103.054        5.5
22543  2015-02-18 00:00:00+00:00  -10.7598    164.122        6.1

My current code

import pandas as pd
import plotly.graph_objects as go

if __name__ == '__main__':
    # Create DataFrames of historical Apple stock prices and earthquakes
    df_apple_stock = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
    df_earthquakes = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
    
    # Convert the Date column to UTC datetime
    df_apple_stock['Date'] = pd.to_datetime(df_apple_stock['Date'], utc=True)
    df_earthquakes['Date'] = pd.to_datetime(df_earthquakes['Date'], utc=True)
    
    # Trim earthquake data to be only of 2015-2016
    start_day = pd.to_datetime('02/17/2015', utc=True)
    end_day = pd.to_datetime('12/31/2016', utc=True)    
    
    df_earthquakes = df_earthquakes[df_earthquakes['Date'].between(start_day, end_day)]  
    
    fig = go.Figure(data=[go.Scatter(x=df_apple_stock['Date'],
                                     y=df_apple_stock['AAPL.Close'],
                                     customdata=df_apple_stock,
                                     mode='lines',  # lines+markers
                                     # marker=dict(
                                     #     size=5,
                                     #     line=dict(width=2, color='DarkSlateGrey')
                                     # ),
                                     # hoveron='points',
                                     hovertemplate=
                                     '<b>%{x}</b><br>' +
                                     'open: %{customdata[1]:$.2f} <br>' +
                                     'close: %{y:$.2f} <br>' +
                                     'high: %{customdata[2]:$.2f} <br>' +
                                     'low: %{customdata[3]:$.2f} <br>' +
                                     'volume: %{customdata[5]:,}'
                                     # '<extra>test</extra>'
                                     )])
    
    fig.show()

Example of the desired result

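What I have tried

  1. I tried iterating over each earthquake row and adding an annotation; however, this has problems:
    • I could not figure out how to position the earthquake annotation relative to the Apple stock price
    • If several earthquakes occur on the same day, only one of them is shown
    • Iterating over every row of a larger dataset could take a long time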
for _, row in df_earthquakes.iterrows():
    fig.add_annotation(font=dict(color='red', size=15),
                       x=str(row.Date),
                       y=125,  # how do I reference 'y' from apple stock price?
                       showarrow=False,
                       text="Event",
                       align="left",
                       hovertext=("Date: " + str(row.Date) + "<br>" +
                                  "Magnitude: " + str(row.Magnitude) + "<br>" +
                                  "Latitude: " + str(row.Latitude) + "<br>" +
                                  "Longitude: " + str(row.Longitude)),
                       xanchor='left')
  2. Plotting two traces in a scatter plot and using %{xother}
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_apple_stock['Date'],
    y=df_apple_stock['AAPL.Close'],
    fill='tozeroy',
    hovertemplate="%{y}%{_xother}"
))

fig.add_trace(go.Scatter(
    x=df_earthquakes['Date'],
    y=df_earthquakes['Magnitude'],
    fill='tonexty',
    hovertemplate="%{y}%{_xother}",
))

fig.update_layout(hovermode="x unified")
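  3. I tried looking up how to add data from multiple data periods and came across Hover Templates with Mixtures of Period data, but I could not get it to work the way I wanted
  4. I tried reading the documentation on markers, annotations, and shared axes on subplots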
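
  • You can plot on a secondary y-axis (two y-axes)
  • The earthquake figure is built with Plotly Express; its trace and color-axis layout are then transferred to the combined figure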
from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


df_apple_stock = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df_earthquakes = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv"
)

# Convert the Date column to UTC datetime
df_apple_stock["Date"] = pd.to_datetime(df_apple_stock["Date"], utc=True)
df_earthquakes["Date"] = pd.to_datetime(df_earthquakes["Date"], utc=True)

# Trim earthquake data to be only of 2015-2016
start_day = pd.to_datetime("02/17/2015", utc=True)
end_day = pd.to_datetime("12/31/2016", utc=True)

df_earthquakes = df_earthquakes[df_earthquakes["Date"].between(start_day, end_day)]

fig = go.Figure(
    data=[
        go.Scatter(
            x=df_apple_stock["Date"],
            y=df_apple_stock["AAPL.Close"],
            customdata=df_apple_stock,
            mode="lines",  # lines+markers
            name="AAPL.Close",
            # marker=dict(
            #     size=5,
            #     line=dict(width=2, color='DarkSlateGrey')
            # ),
            # hoveron='points',
            hovertemplate="<b>%{x}</b><br>"
            + "open: %{customdata[1]:$.2f} <br>"
            + "close: %{y:$.2f} <br>"
            + "high: %{customdata[2]:$.2f} <br>"
            + "low: %{customdata[3]:$.2f} <br>"
            + "volume: %{customdata[5]:,}"
            # '<extra>test</extra>'
        )
    ]
)

fige = px.scatter(
    df_earthquakes,
    x="Date",
    y="Magnitude",
    color="Magnitude",
    color_continuous_scale="reds",
)

fig2 = make_subplots(specs=[[{"secondary_y": True}]])

fig2.add_trace(fig.data[0])
fig2.add_trace(fige.data[0], secondary_y=True)

fig2.update_layout(coloraxis=fige.layout.coloraxis).update_layout(coloraxis={"colorbar":{"y":.4}})

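An alternative for the earthquakes

  • Inspired by @vestland's answer
  • The earthquake data can first be aggregated with pandas; its frequency is not daily, so it is aggregated to one row per day
  • Days with three or fewer earthquakes are filtered out (the code keeps only days where Count is greater than 3)
  • Color and marker size carry additional information: color encodes the daily maximum magnitude and size the number of earthquakes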
fige = px.scatter(
    df_earthquakes.groupby(df_earthquakes["Date"].dt.date)
    .agg(Magnitude=("Magnitude", "max"), Count=("Date", "count"))
    .reset_index()
    .loc[lambda d: d["Count"].gt(3)],
    x="Date",
    y="Magnitude",
    color="Magnitude",
    size="Count",
    color_continuous_scale="rdylgn_r",
)
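
To see what the aggregation step produces, here is a minimal sketch on the five earthquake rows shown in the question. The threshold of 2 is an assumption picked so that the toy data is not filtered away entirely; the code above uses 3. The names daily and busy_days are hypothetical.

```python
import pandas as pd

# The five sample rows from the question's earthquake table
quakes = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-02-17", "2015-02-17", "2015-02-18", "2015-02-18", "2015-02-18"],
        utc=True,
    ),
    "Magnitude": [5.5, 5.5, 5.5, 5.5, 6.1],
})

# One row per calendar day: strongest quake and number of quakes
daily = (
    quakes.groupby(quakes["Date"].dt.date)
    .agg(Magnitude=("Magnitude", "max"), Count=("Date", "count"))
    .reset_index()
)

# Keep only days with more than 2 quakes (assumed threshold for this toy data)
busy_days = daily.loc[daily["Count"].gt(2)]
print(daily)
print(busy_days)
```

Note that gt is a strict comparison, so a day with exactly as many earthquakes as the threshold is dropped.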

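I have put together a suggestion that should address your concerns. I am using a built-in dataset and a random selection of dates, some of them repeated. If you would like me to work with a sample of your actual dataset, please include a sample in the question.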

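First suggestion:

1. The main trace is added to the figure with fig.add_traces(go.Scatter).

2. Dates with earthquakes are split into two datasets: one with the first earthquake on each date and one with the repeat earthquakes on dates that occur more than once.

3. The repeated dates are collected in multiple = quakes[quakes.date.duplicated()], and each of those records is assigned its own trace. This lets you set a different symbol and different hover data for each one as needed.

4. Values that belong to a repeated date are offset from one another on the y-axis so that the corresponding markers and annotations do not overlap or cover each other.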
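
Steps 2 to 4 can be sketched with pandas alone. The dates, magnitudes, and base_price below are made up for illustration; only duplicated(), groupby().cumcount(), and the 1 + dupes/10 offset come from the approach described here.

```python
import pandas as pd

# Hypothetical earthquake records: one date occurs once, one twice, one three times
quakes = pd.DataFrame({
    "date": ["2019-06-03", "2019-06-10", "2019-06-10",
             "2019-06-17", "2019-06-17", "2019-06-17"],
    "magnitude": [5.2, 6.0, 5.4, 6.3, 5.1, 5.8],
})

# Running count of quakes within each date: 0 for the first, 1 for the second, ...
quakes["dupes"] = quakes.groupby("date").cumcount()

# duplicated() flags every occurrence of a date except the first one
is_repeat = quakes["date"].duplicated()
single = quakes[~is_repeat]    # first quake on each date
multiple = quakes[is_repeat]   # every further quake on an already seen date

# Lift each repeat by 10% of the price per repeat so the markers do not overlap
base_price = 100.0  # hypothetical stock price on those dates
multiple = multiple.assign(y=base_price * (1 + multiple["dupes"] / 10))
print(multiple)
```

Because duplicated() keeps the first occurrence by default, a date with three earthquakes contributes one row to single and two rows to multiple.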

If this is close to what you are after, we can work through the details when you find the time.

Plot:

Code 1

# imports
import pandas as pd
import plotly.express as px
import random
import numpy as np
import plotly.graph_objects as go
from itertools import cycle

# seed both generators: the dates use `random`, the magnitudes use `np.random`
random.seed(123)
np.random.seed(123)

# data
df = px.data.stocks()
df = df.drop(['GOOG', 'AMZN', 'NFLX', 'FB'], axis = 1).tail(150)

# simulated earthquake data
quakes = pd.DataFrame()

dates = [random.choice(df.date.values) for obs in range(0, 6)]
dates.extend([df.date.iloc[2], df.date.iloc[2], df.date.iloc[6], df.date.iloc[6], df.date.iloc[6]])

# synthetic data for earthquakes
quakes['date'] = dates
quakes['magnitude'] = [np.random.uniform(5,7) for obs in quakes.date]
quakes = pd.concat([quakes, quakes.groupby('date').cumcount().to_frame('dupes')], axis = 1)

# repeat occurrences of a date (every quake after the first one on that date)
multiple = quakes[quakes.date.duplicated()].sort_values('date').reset_index()

# first occurrence of each date, drawn as one trace (keeps the number of traces at a minimum)
single = quakes[~quakes.date.duplicated()].sort_values('date').reset_index()
single = pd.merge(df, single, on = 'date', how = 'right')

fig = go.Figure(go.Scatter(x = df['date'], y = df['AAPL'], name = 'Apple'))
fig.add_traces(go.Scatter(x=single['date'], y =single['AAPL'],
                          mode = 'markers',
                          name = 'days with quakes',
                          showlegend = True,
                          marker = dict(symbol = 'square', size = single['magnitude']**2)))

symbols = cycle(['circle', 'hexagon', 'diamond', 'star'])
annotations = []
for i, r in multiple.iterrows():
    fig.add_traces(go.Scatter(x=[r['date']], y = df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10),
                              mode = 'markers',
                              name = r['date'],
                              marker = dict(symbol = next(symbols), size = r['magnitude']**2)))
    annotations.append([r['date'], df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10), r['magnitude']])

# annotate single events
# note: the label is the first three characters of the y value (the AAPL price), not the magnitude
for i, q in enumerate(fig.data[1].x):
    fig.add_annotation(x=q, y = fig.data[1].y[i],
                       text = str(fig.data[1].y[i])[:3], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)

# annotate duplicates
for a in annotations:
    fig.add_annotation(x=a[0], y = a[1].item(),
                       text = str(a[2])[:4], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
fig.show()