How to plot data from two different DataFrames with multiple duplicates?

Question

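I am trying to create a chart that plots historical Apple stock data together with earthquake occurrences. I have two DataFrames: one with historical Apple stock data and one with historical earthquake data. I want to show each earthquake as a marker or shape positioned relative to the Apple stock price on that date.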

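Questions

How can I plot the earthquake events as markers or shapes relative to the Apple chart?
How can I handle the markers or shapes for multiple earthquakes on the same date so that they do not overlap or cover each other?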

Apple data

	Date	AAPL.Open	AAPL.High	AAPL.Low	AAPL.Close	AAPL.Volume	AAPL.Adjusted	dn	mavg	up	direction
0	2015-02-17 00:00:00+00:00	127.49	128.88	126.92	127.83	63152400	122.905	106.741	117.928	129.114	Increasing
1	2015-02-18 00:00:00+00:00	127.63	128.78	127.45	128.72	44891700	123.761	107.842	118.94	130.038	Increasing
2	2015-02-19 00:00:00+00:00	128.48	129.03	128.33	128.45	37362400	123.501	108.894	119.889	130.884	Decreasing
3	2015-02-20 00:00:00+00:00	128.62	129.5	128.05	129.5	48948400	124.511	109.785	120.764	131.742	Increasing
4	2015-02-23 00:00:00+00:00	130.02	133	129.66	133	70974100	127.876	110.373	121.72	133.068	Increasing

Earthquake data

	Date	Latitude	Longitude	Magnitude
22539	2015-02-17 00:00:00+00:00	40.1095	141.891	5.5
22540	2015-02-17 00:00:00+00:00	39.5696	143.583	5.5
22541	2015-02-18 00:00:00+00:00	8.3227	-103.159	5.5
22542	2015-02-18 00:00:00+00:00	8.285	-103.054	5.5
22543	2015-02-18 00:00:00+00:00	-10.7598	164.122	6.1

My current code

import pandas as pd
import plotly.graph_objects as go

if __name__ == '__main__':
    # Create dataframe of historical apple stock and earth quakes
    df_apple_stock = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
    df_earthquakes = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')
    
    # Convert the Date column to UTC datetime
    df_apple_stock['Date'] = pd.to_datetime(df_apple_stock['Date'], utc=True)
    df_earthquakes['Date'] = pd.to_datetime(df_earthquakes['Date'], utc=True)
    
    # Trim earthquake data to be only of 2015-2016
    start_day = pd.to_datetime('02/17/2015', utc=True)
    end_day = pd.to_datetime('12/31/2016', utc=True)    
    
    df_earthquakes = df_earthquakes[df_earthquakes['Date'].between(start_day, end_day)]  
    
    fig = go.Figure(data=[go.Scatter(x=df_apple_stock['Date'],
                                     y=df_apple_stock['AAPL.Close'],
                                     customdata=df_apple_stock,
                                     mode='lines',  # lines+markers
                                     # marker=dict(
                                     #     size=5,
                                     #     line=dict(width=2, color='DarkSlateGrey')
                                     # ),
                                     # hoveron='points',
                                     hovertemplate=
                                     '<b>%{x}</b><br>' +
                                     'open: %{customdata[1]:$.2f} <br>' +
                                     'close: %{y:$.2f} <br>' +
                                     'high: %{customdata[2]:$.2f} <br>' +
                                     'low: %{customdata[3]:$.2f} <br>' +
                                     'volume: %{customdata[5]:,}'
                                     # '<extra>test</extra>'
                                     )])
    
    fig.show()

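Example of the desired result

What I have tried

I tried iterating over each earthquake row and adding an annotation, but this has problems:
- I cannot figure out how to position the earthquake annotation relative to the Apple stock price
- If several earthquakes occur on the same day, only one of them is shown
- Iterating over every row of a larger dataset could take a long time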

for _, row in df_earthquakes.iterrows():
    fig.add_annotation(font=dict(color='red', size=15),
                       x=str(row.Date),
                       y=125,  # how do I reference 'y' from apple stock price?
                       showarrow=False,
                       text="Event",
                       align="left",
                       hovertext=("Date: " + str(row.Date) + "<br>" +
                                  "Magnitude: " + str(row.Magnitude) + "<br>" +
                                  "Latitude: " + str(row.Latitude) + "<br>" +
                                  "Longitude: " + str(row.Longitude)),
                     xanchor='left')

Plotting two traces in a scatter plot and using %{xother}

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_apple_stock['Date'],
    y=df_apple_stock['AAPL.Close'],
    fill='tozeroy',
    hovertemplate="%{y}%{_xother}"
))

fig.add_trace(go.Scatter(
    x=df_earthquakes['Date'],
    y=df_earthquakes['Magnitude'],
    fill='tonexty',
    hovertemplate="%{y}%{_xother}",
))

fig.update_layout(hovermode="x unified")

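I tried looking up how to add data with different periods and came across Hover Templates with Mixtures of Period data, but I could not get it to work the way I wanted.
I also tried reading the documentation on markers, annotations, and shared axes on subplots.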

Answer 1

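You can plot the earthquakes on a secondary y-axis, so the figure has two y-axes.
I drew the earthquake figure with Plotly Express, then transferred its trace and layout to a combined figure.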

from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


df_apple_stock = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df_earthquakes = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv"
)

# Convert the Date column to UTC datetime
df_apple_stock["Date"] = pd.to_datetime(df_apple_stock["Date"], utc=True)
df_earthquakes["Date"] = pd.to_datetime(df_earthquakes["Date"], utc=True)

# Trim earthquake data to be only of 2015-2016
start_day = pd.to_datetime("02/17/2015", utc=True)
end_day = pd.to_datetime("12/31/2016", utc=True)

df_earthquakes = df_earthquakes[df_earthquakes["Date"].between(start_day, end_day)]

fig = go.Figure(
    data=[
        go.Scatter(
            x=df_apple_stock["Date"],
            y=df_apple_stock["AAPL.Close"],
            customdata=df_apple_stock,
            mode="lines",  # lines+markers
            name="AAPL.Close",
            # marker=dict(
            #     size=5,
            #     line=dict(width=2, color='DarkSlateGrey')
            # ),
            # hoveron='points',
            hovertemplate="<b>%{x}</b><br>"
            + "open: %{customdata[1]:$.2f} <br>"
            + "close: %{y:$.2f} <br>"
            + "high: %{customdata[2]:$.2f} <br>"
            + "low: %{customdata[3]:$.2f} <br>"
            + "volume: %{customdata[5]:,}"
            # '<extra>test</extra>'
        )
    ]
)

fige = px.scatter(
    df_earthquakes,
    x="Date",
    y="Magnitude",
    color="Magnitude",
    color_continuous_scale="reds",
)

fig2 = make_subplots(specs=[[{"secondary_y": True}]])

fig2.add_trace(fig.data[0])
fig2.add_trace(fige.data[0], secondary_y=True)

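# reuse the Plotly Express color axis and move its colorbar down so it does not collide with the legend
fig2.update_layout(coloraxis=fige.layout.coloraxis).update_layout(coloraxis={"colorbar":{"y":.4}})
fig2.show()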

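Alternative for the earthquakes

Inspired by @vestland's answer.
The earthquake data can first be summarized with pandas; its frequency is not daily, so it is aggregated to one row per day.
Days with three or fewer earthquakes are also filtered out.
Color and size then carry more information: color shows the maximum magnitude of the day and size shows the number of earthquakes.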

fige = px.scatter(
    df_earthquakes.groupby(df_earthquakes["Date"].dt.date).agg(
    Magnitude=("Magnitude", "max"), Count=("Date", "count")
).reset_index().loc[lambda d: d["Count"].gt(3)],
    x="Date",
    y="Magnitude",
    color="Magnitude",
    size="Count",
    color_continuous_scale="rdylgn_r",
)
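
To make the aggregation above easier to follow, here is a minimal sketch of the same groupby step on a toy DataFrame. The values and the threshold of two are made up for illustration and are not taken from the real dataset; only the column names follow the question.

```python
import pandas as pd

# Toy earthquake records (illustrative values, not the real dataset)
quakes = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-02-17", "2015-02-17", "2015-02-18", "2015-02-18", "2015-02-18"],
            utc=True,
        ),
        "Magnitude": [5.5, 5.8, 5.5, 5.5, 6.1],
    }
)

# One row per calendar day: strongest quake and number of quakes that day
daily = (
    quakes.groupby(quakes["Date"].dt.date)
    .agg(Magnitude=("Magnitude", "max"), Count=("Date", "count"))
    .reset_index()
)

# Keep only the busy days (threshold of 2 chosen to suit the toy data)
busy_days = daily.loc[lambda d: d["Count"].gt(2)]
print(daily)
print(busy_days)
```

The resulting frame has one marker per day, so the `Count` column can drive the marker size and `Magnitude` the color, as in the `px.scatter` call above.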

Answer 2

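I have put together a suggestion that should address your concerns. I am using a built-in dataset and a random selection of dates, some of them duplicated. If you would like me to work with a sample of your actual dataset, please include one in the question.

First suggestion:

1. The main trace is added to the figure as a go.Scatter trace.

2. The dates with earthquakes are arranged in two different datasets: one with the dates of single events and one with the duplicated dates.

3. The duplicated dates are collected in multiple = quakes[quakes.date.duplicated()], and each record is assigned its own trace. This lets you set different symbols and hover data as needed.

4. The values that belong to duplicated dates are offset from each other on the y-axis so that the corresponding annotations do not overlap or cover each other.

If this is close to the result you want, we can discuss the details when you find the time.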
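
The splitting and offsetting steps above can be sketched on a toy example. The prices, dates, and magnitudes below are hypothetical; the 10% lift per repeated event mirrors the factor `(1 + dupes/10)` used in the full code.

```python
import pandas as pd

# Hypothetical closing prices and quake events (illustrative values only)
prices = pd.DataFrame(
    {"date": ["2019-01-07", "2019-01-14", "2019-01-21"], "AAPL": [100.0, 110.0, 120.0]}
)
quakes = pd.DataFrame(
    {
        "date": ["2019-01-07", "2019-01-14", "2019-01-14", "2019-01-14"],
        "magnitude": [5.5, 6.0, 6.2, 5.1],
    }
)

# Number the events within each date: 0 for the first, 1 for the second, ...
quakes["dupes"] = quakes.groupby("date").cumcount()

# duplicated() flags every occurrence after the first one, so the first
# event of a busy date stays in `single` and the rest go to `multiple`
is_repeat = quakes["date"].duplicated()
single = quakes[~is_repeat]
multiple = quakes[is_repeat]

# Look up the price for each repeated event and lift it by 10% per repeat
multiple = multiple.merge(prices, on="date", how="left")
multiple["y"] = multiple["AAPL"] * (1 + multiple["dupes"] / 10)
print(single)
print(multiple)
```

Note that with the default `keep='first'`, `single` contains the first event of every date, including dates that also appear in `multiple`.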

Plot:

Code 1

# imports
import pandas as pd
import plotly.express as px
import random
import numpy as np
import plotly.graph_objects as go
from plotly.validators.scatter.marker import SymbolValidator
from itertools import cycle

# seed both generators: the dates use `random`, the magnitudes use `np.random`
random.seed(123)
np.random.seed(123)

# data
df = px.data.stocks()
df = df.drop(['GOOG', 'AMZN', 'NFLX', 'FB'], axis = 1).tail(150)

# simulate earthquake events
quakes = pd.DataFrame()

dates = [random.choice(df.date.values) for obs in range(0, 6)]
dates.extend([df.date.iloc[2], df.date.iloc[2], df.date.iloc[6], df.date.iloc[6], df.date.iloc[6]])

# synthetic data for earthquakes
quakes['date'] = dates
quakes['magnitude'] = [np.random.uniform(5,7) for obs in quakes.date]
quakes = pd.concat([quakes, quakes.groupby('date').cumcount().to_frame('dupes')], axis = 1)

# find dates with multiple quakes
multiple = quakes[quakes.date.duplicated()].sort_values('date').reset_index()#.sorted()

# find dates where only one quake occurs (to keep number of traces at a minimum)
single = quakes[~quakes.date.duplicated()].sort_values('date').reset_index()
single = pd.merge(df, single, on = 'date', how = 'right')

fig = go.Figure(go.Scatter(x = df['date'], y = df['AAPL'], name = 'Apple'))
fig.add_traces(go.Scatter(x=single['date'], y =single['AAPL'],
                          mode = 'markers',
                          name = 'days with quakes',
                          showlegend = True,
                          marker = dict(symbol = 'square', size = single['magnitude']**2)))

symbols = cycle(['circle', 'hexagon', 'diamond', 'star'])
annotations = []
for i, r in multiple.iterrows():
    fig.add_traces(go.Scatter(x=[r['date']], y = df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10),
                              mode = 'markers',
                              name = r['date'],
                              marker = dict(symbol = next(symbols), size = r['magnitude']**2)))
    annotations.append([r['date'], df[df['date']==r['date']]['AAPL']*(1 + r['dupes']/10), r['magnitude']])

# annotate single events
for i, q in enumerate(fig.data[1].x):
    fig.add_annotation(x=q, y = fig.data[1].y[i],
                       text = str(fig.data[1].y[i])[:3], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)

# annotate duplicates
for a in annotations:
    fig.add_annotation(x=a[0], y = a[1].item(),
                       text = str(a[2])[:4], showarrow = False,
                       font = dict(size = 10),
                       yref = 'y',
                       ay=0)
fig.show()
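
If one trace per duplicated event becomes too slow on a larger dataset, a possible alternative is to compute all marker positions in one vectorized step and draw them as a single markers trace. The sketch below is not part of the answer above; it assumes `pd.merge_asof` is acceptable for matching each earthquake to the most recent trading day (the stock data has no rows for weekends), and the 2% vertical step is an arbitrary choice. The closing prices come from the sample rows in the question; the earthquake rows are made up.

```python
import pandas as pd

# Closing prices from the question's sample rows (trading days only)
stock = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2015-02-19", "2015-02-20", "2015-02-23"], utc=True),
        "AAPL.Close": [128.45, 129.50, 133.00],
    }
)

# Hypothetical earthquakes, including two on a Saturday (2015-02-21)
quakes = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-02-20", "2015-02-21", "2015-02-21", "2015-02-23"], utc=True
        ),
        "Magnitude": [5.5, 6.1, 5.7, 5.9],
    }
)

# Attach the latest available close on or before each earthquake date
placed = pd.merge_asof(
    quakes.sort_values("Date"),
    stock.sort_values("Date"),
    on="Date",
    direction="backward",
)

# Stack same-day events: each additional event sits 2% higher (arbitrary step)
placed["rank"] = placed.groupby("Date").cumcount()
placed["y"] = placed["AAPL.Close"] * (1 + 0.02 * placed["rank"])
print(placed)
```

The `Date` and `y` columns of `placed` could then be passed as `x` and `y` to one `go.Scatter(mode='markers')` trace, with `Magnitude` supplied through `customdata` for the hover text.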
