How to plot data from two different DataFrames with multiple duplicates?
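I am trying to create a chart that plots historical Apple stock data together with earthquake occurrences. I have two DataFrames, one with historical Apple stock data and one with historical earthquake data. I want to show each earthquake as a marker or shape positioned relative to the Apple stock price on that date.
Questions
- How can I plot earthquake events as markers or shapes relative to the Apple chart?
- How can I handle the markers or shapes for multiple earthquakes and keep them from overlapping or covering each other?
Apple data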
| | Date | AAPL.Open | AAPL.High | AAPL.Low | AAPL.Close | AAPL.Volume | AAPL.Adjusted | dn | mavg | up | direction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-02-17 00:00:00+00:00 | 127.49 | 128.88 | 126.92 | 127.83 | 63152400 | 122.905 | 106.741 | 117.928 | 129.114 | Increasing |
| 1 | 2015-02-18 00:00:00+00:00 | 127.63 | 128.78 | 127.45 | 128.72 | 44891700 | 123.761 | 107.842 | 118.94 | 130.038 | Increasing |
| 2 | 2015-02-19 00:00:00+00:00 | 128.48 | 129.03 | 128.33 | 128.45 | 37362400 | 123.501 | 108.894 | 119.889 | 130.884 | Decreasing |
| 3 | 2015-02-20 00:00:00+00:00 | 128.62 | 129.5 | 128.05 | 129.5 | 48948400 | 124.511 | 109.785 | 120.764 | 131.742 | Increasing |
| 4 | 2015-02-23 00:00:00+00:00 | 130.02 | 133 | 129.66 | 133 | 70974100 | 127.876 | 110.373 | 121.72 | 133.068 | Increasing |
Earthquake data
| | Date | Latitude | Longitude | Magnitude |
|---|---|---|---|---|
| 22539 | 2015-02-17 00:00:00+00:00 | 40.1095 | 141.891 | 5.5 |
| 22540 | 2015-02-17 00:00:00+00:00 | 39.5696 | 143.583 | 5.5 |
| 22541 | 2015-02-18 00:00:00+00:00 | 8.3227 | -103.159 | 5.5 |
| 22542 | 2015-02-18 00:00:00+00:00 | 8.285 | -103.054 | 5.5 |
| 22543 | 2015-02-18 00:00:00+00:00 | -10.7598 | 164.122 | 6.1 |
My current code
import pandas as pd
import plotly.graph_objects as go

if __name__ == '__main__':
    # Create dataframes of historical Apple stock and earthquakes
    df_apple_stock = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
    df_earthquakes = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv')

    # Convert Date column to UTC datetime
    df_apple_stock['Date'] = pd.to_datetime(df_apple_stock['Date'], utc=True)
    df_earthquakes['Date'] = pd.to_datetime(df_earthquakes['Date'], utc=True)

    # Trim earthquake data to 2015-2016 only
    start_day = pd.to_datetime('02/17/2015', utc=True)
    end_day = pd.to_datetime('12/31/2016', utc=True)
    df_earthquakes = df_earthquakes[df_earthquakes['Date'].between(start_day, end_day)]

    fig = go.Figure(data=[go.Scatter(x=df_apple_stock['Date'],
                                     y=df_apple_stock['AAPL.Close'],
                                     customdata=df_apple_stock,
                                     mode='lines',  # lines+markers
                                     # marker=dict(
                                     #     size=5,
                                     #     line=dict(width=2, color='DarkSlateGrey')
                                     # ),
                                     # hoveron='points',
                                     hovertemplate=
                                     '<b>%{x}</b><br>' +
                                     'open: %{customdata[1]:$.2f} <br>' +
                                     'close: %{y:$.2f} <br>' +
                                     'high: %{customdata[2]:$.2f} <br>' +
                                     'low: %{customdata[3]:$.2f} <br>' +
                                     'volume: %{customdata[5]:,}'
                                     # '<extra>test</extra>'
                                     )])
    fig.show()
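Example of the desired result
What I've tried
- I tried iterating over each earthquake row and adding an annotation; however, this has problems:
  - I couldn't figure out how to position the earthquake annotation relative to the Apple stock price
  - If several earthquakes occur on the same day, only one of them is shown
  - Iterating over every row of a larger dataset could take a long time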
for _, row in df_earthquakes.iterrows():
    fig.add_annotation(font=dict(color='red', size=15),
                       x=str(row.Date),
                       y=125,  # how do I reference 'y' from apple stock price?
                       showarrow=False,
                       text="Event",
                       align="left",
                       hovertext=("Date: " + str(row.Date) + "<br>" +
                                  "Magnitude: " + str(row.Magnitude) + "<br>" +
                                  "Latitude: " + str(row.Latitude) + "<br>" +
                                  "Longitude: " + str(row.Longitude)),
                       xanchor='left')
- Plotting two traces in a scatter plot and using %{xother}
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_apple_stock['Date'],
    y=df_apple_stock['AAPL.Close'],
    fill='tozeroy',
    hovertemplate="%{y}%{_xother}"
))
fig.add_trace(go.Scatter(
    x=df_earthquakes['Date'],
    y=df_earthquakes['Magnitude'],
    fill='tonexty',
    hovertemplate="%{y}%{_xother}",
))
fig.update_layout(hovermode="x unified")
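- I tried looking up how to add data from multiple data periods and came across Hover Templates with Mixtures of Period data, but I couldn't get it to work the way I wanted
- I tried reading the documentation: markers, annotations, shared axis on subplots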
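On the open point in the annotation loop (how to reference 'y' from the Apple stock price), one option is to look up the closing price for each earthquake date before plotting. The sketch below is only an illustration and not part of the original post: it uses pd.merge_asof on a few rows copied from the sample data in the question, and the names stock, quakes and quakes_priced are placeholders.

```python
import pandas as pd

# Sample rows copied from the question's tables.
stock = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-02-17", "2015-02-18", "2015-02-19", "2015-02-20", "2015-02-23"],
        utc=True,
    ),
    "AAPL.Close": [127.83, 128.72, 128.45, 129.50, 133.00],
})
quakes = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-02-17", "2015-02-17", "2015-02-18", "2015-02-18", "2015-02-18"],
        utc=True,
    ),
    "Magnitude": [5.5, 5.5, 5.5, 5.5, 6.1],
})

# merge_asof requires both frames to be sorted by the key. With
# direction="backward", each quake picks up the most recent close at or
# before its date, so quakes on non-trading days still get a price.
quakes_priced = pd.merge_asof(
    quakes.sort_values("Date"),
    stock.sort_values("Date"),
    on="Date",
    direction="backward",
)
print(quakes_priced)
```

The AAPL.Close column of quakes_priced can then be used as the y value of each annotation or marker instead of the hard-coded 125.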
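- You can plot the earthquakes on a secondary y-axis (two y-axes)
- I used Plotly Express to plot the earthquakes, then moved the trace and layout into the figure with everything else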
from plotly.subplots import make_subplots
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

df_apple_stock = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df_earthquakes = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/earthquakes-23k.csv"
)

# Convert Date column to UTC datetime
df_apple_stock["Date"] = pd.to_datetime(df_apple_stock["Date"], utc=True)
df_earthquakes["Date"] = pd.to_datetime(df_earthquakes["Date"], utc=True)

# Trim earthquake data to 2015-2016 only
start_day = pd.to_datetime("02/17/2015", utc=True)
end_day = pd.to_datetime("12/31/2016", utc=True)
df_earthquakes = df_earthquakes[df_earthquakes["Date"].between(start_day, end_day)]

fig = go.Figure(
    data=[
        go.Scatter(
            x=df_apple_stock["Date"],
            y=df_apple_stock["AAPL.Close"],
            customdata=df_apple_stock,
            mode="lines",  # lines+markers
            name="AAPL.Close",
            # marker=dict(
            #     size=5,
            #     line=dict(width=2, color='DarkSlateGrey')
            # ),
            # hoveron='points',
            hovertemplate="<b>%{x}</b><br>"
            + "open: %{customdata[1]:$.2f} <br>"
            + "close: %{y:$.2f} <br>"
            + "high: %{customdata[2]:$.2f} <br>"
            + "low: %{customdata[3]:$.2f} <br>"
            + "volume: %{customdata[5]:,}"
            # '<extra>test</extra>'
        )
    ]
)

# Earthquakes as a Plotly Express scatter, colored by magnitude
fige = px.scatter(
    df_earthquakes,
    x="Date",
    y="Magnitude",
    color="Magnitude",
    color_continuous_scale="reds",
)

# Combine both traces in one figure; earthquakes go on the secondary y-axis
fig2 = make_subplots(specs=[[{"secondary_y": True}]])
fig2.add_trace(fig.data[0])
fig2.add_trace(fige.data[0], secondary_y=True)
fig2.update_layout(coloraxis=fige.layout.coloraxis).update_layout(coloraxis={"colorbar": {"y": .4}})
fig2.show()
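Alternative for earthquakes
- Inspired by @vestland's answer
- The earthquake data can be summarized with pandas first; its frequency is not daily, so it is rolled up to one row per day
- Days with 3 or fewer earthquakes are also filtered out (the code keeps only days where Count is greater than 3)
- A bit more use of color and size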
fige = px.scatter(
    df_earthquakes.groupby(df_earthquakes["Date"].dt.date).agg(
        Magnitude=("Magnitude", "max"), Count=("Date", "count")
    ).reset_index().loc[lambda d: d["Count"].gt(3)],
    x="Date",
    y="Magnitude",
    color="Magnitude",
    size="Count",
    color_continuous_scale="rdylgn_r",
)
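To make the daily roll-up concrete, here is a small worked example that applies the same groupby/agg pattern to the five sample earthquake rows from the question. It is a sketch for illustration only; the threshold of 2 and the names daily and busy are assumptions chosen so that the tiny sample still produces output.

```python
import pandas as pd

# The five sample earthquake rows from the question.
quakes = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-02-17", "2015-02-17", "2015-02-18", "2015-02-18", "2015-02-18"],
        utc=True,
    ),
    "Magnitude": [5.5, 5.5, 5.5, 5.5, 6.1],
})

# One row per calendar day: strongest quake and number of quakes that day.
daily = (
    quakes.groupby(quakes["Date"].dt.date)
    .agg(Magnitude=("Magnitude", "max"), Count=("Date", "count"))
    .reset_index()
)
print(daily)

# .gt(n) is a strict comparison, so this keeps days with MORE than 2 quakes.
busy = daily.loc[daily["Count"].gt(2)]
print(busy)
```

With the sample rows, daily has two rows (2 quakes on 2015-02-17, 3 quakes on 2015-02-18 with a maximum magnitude of 6.1) and busy keeps only 2015-02-18.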
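I've put together a suggestion that should address your concerns. I'm using a built-in dataset and a random selection of dates, some of them repeated. If you'd like me to work with a sample of your actual dataset, please include a sample of it in the question.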
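First suggestion:
1. The main trace is added to the figure as a go.Scatter trace.
2. Dates with earthquakes are split into two datasets: one holding the first event on each date, and one holding the repeated events on dates with more than one earthquake.
3. The repeated dates are collected in multiple = quakes[quakes.date.duplicated()], and each record is assigned its own trace. This lets you set different symbols and hover data as needed.
4. Values that belong to a repeated date are offset from each other on the y-axis so that the corresponding markers and annotations do not overlap or cover each other.
If this is close to the result you want, we can discuss the details when you find the time.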
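The split and the y-axis offset described in the steps above can be sketched on the question's sample rows as follows. This is a minimal illustration under stated assumptions, not the answer's exact code: the names first, repeats, prices and placed are placeholders, and the closing prices are copied from the Apple table in the question.

```python
import pandas as pd

quakes = pd.DataFrame({
    "date": ["2015-02-17", "2015-02-17", "2015-02-18", "2015-02-18", "2015-02-18"],
    "magnitude": [5.5, 5.5, 5.5, 5.5, 6.1],
})
prices = pd.DataFrame({
    "date": ["2015-02-17", "2015-02-18"],
    "close": [127.83, 128.72],
})

# Running count per date: 0 for the first quake on a date, then 1, 2, ...
quakes["dupes"] = quakes.groupby("date").cumcount()

# duplicated() flags every occurrence AFTER the first one on a date.
first = quakes[~quakes["date"].duplicated()]
repeats = quakes[quakes["date"].duplicated()]

# Offset each repeat by 10% of the price per step, as in the answer.
placed = quakes.merge(prices, on="date", how="left")
placed["y"] = placed["close"] * (1 + placed["dupes"] / 10)
print(placed)
```

Note that a 10% step suits the normalized prices in the built-in dataset; with real closing prices around 128, one step is roughly 13 dollars, so a smaller fraction may be preferable.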
Plot:
Code 1
# imports
import random
from itertools import cycle

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# seed both generators so the simulated quakes are reproducible
random.seed(123)
np.random.seed(123)

# data
df = px.data.stocks()
df = df.drop(['GOOG', 'AMZN', 'NFLX', 'FB'], axis=1).tail(150)

# simulate quake dates: six random dates plus two dates that repeat
quakes = pd.DataFrame()
dates = [random.choice(df.date.values) for obs in range(0, 6)]
dates.extend([df.date.iloc[2], df.date.iloc[2], df.date.iloc[6], df.date.iloc[6], df.date.iloc[6]])

# synthetic data for earthquakes
quakes['date'] = dates
quakes['magnitude'] = [np.random.uniform(5, 7) for obs in quakes.date]
# dupes = running count of quakes per date (0 for the first quake on a date)
quakes = pd.concat([quakes, quakes.groupby('date').cumcount().to_frame('dupes')], axis=1)

# repeated quakes on a date (every occurrence after the first)
multiple = quakes[quakes.date.duplicated()].sort_values('date').reset_index()

# first quake on each date (drawn as one trace to keep the number of traces at a minimum)
single = quakes[~quakes.date.duplicated()].sort_values('date').reset_index()
single = pd.merge(df, single, on='date', how='right')

fig = go.Figure(go.Scatter(x=df['date'], y=df['AAPL'], name='Apple'))
fig.add_traces(go.Scatter(x=single['date'], y=single['AAPL'],
                          mode='markers',
                          name='days with quakes',
                          showlegend=True,
                          marker=dict(symbol='square', size=single['magnitude']**2)))

symbols = cycle(['circle', 'hexagon', 'diamond', 'star'])
annotations = []
# one trace per repeated quake, shifted up by 10% of the price per repeat
for i, r in multiple.iterrows():
    fig.add_traces(go.Scatter(x=[r['date']], y=df[df['date'] == r['date']]['AAPL'] * (1 + r['dupes'] / 10),
                              mode='markers',
                              name=r['date'],
                              marker=dict(symbol=next(symbols), size=r['magnitude']**2)))
    annotations.append([r['date'], df[df['date'] == r['date']]['AAPL'] * (1 + r['dupes'] / 10), r['magnitude']])

# annotate single events (the label is the first 3 characters of the y value)
for i, q in enumerate(fig.data[1].x):
    fig.add_annotation(x=q, y=fig.data[1].y[i],
                       text=str(fig.data[1].y[i])[:3], showarrow=False,
                       font=dict(size=10),
                       yref='y',
                       ay=0)

# annotate duplicates (the label is the first 4 characters of the magnitude)
for a in annotations:
    fig.add_annotation(x=a[0], y=a[1].item(),
                       text=str(a[2])[:4], showarrow=False,
                       font=dict(size=10),
                       yref='y',
                       ay=0)

fig.show()