如何使用 plotly-express 合并两个 Dataframes 并将数据绘制成一条线
How to combine two Dataframes and plot data as one line using plotly-express
我正在尝试绘制两个数据集的折线图,每个数据集都包含值 'date'
、'conversions'
和 'cpi'
作为一条线。该图表显示了每天的值计数,因此我在处理同一天正确绘制每个数据集的值时遇到了问题:
示例数据:
date conversions cpi
0 2020-11-02 0.0 0.000000e+00
1 2020-11-03 0.0 0.000000e+00
2 2020-11-04 0.0 0.000000e+00
3 2020-11-05 0.0 0.000000e+00
4 2020-11-06 3.0 6.333000e-01
5 2020-11-07 0.0 0.000000e+00
6 2020-11-08 0.0 0.000000e+00
7 2020-11-09 0.0 0.000000e+00
8 2020-11-10 0.0 0.000000e+00
9 2020-11-11 2.0 1.695000e+00
10 2020-11-12 0.0 0.000000e+00
11 2020-11-13 2.0 2.170000e+00
12 2020-11-14 0.0 0.000000e+00
13 2020-11-15 1.0 2.590000e+00
14 2020-11-16 2.0 2.670000e+00
0 2020-11-02 5.0 2.039435e+06
1 2020-11-03 6.0 2.788452e+06
2 2020-11-04 8.0 1.720630e+06
3 2020-11-05 8.0 2.038703e+06
4 2020-11-06 11.0 1.775534e+06
5 2020-11-07 14.0 1.810215e+06
6 2020-11-08 30.0 1.617934e+06
7 2020-11-09 27.0 1.784663e+06
8 2020-11-10 32.0 1.368291e+06
9 2020-11-11 4.0 5.293594e+06
10 2020-11-12 17.0 1.524248e+06
11 2020-11-13 20.0 2.437085e+06
12 2020-11-14 24.0 2.272977e+06
13 2020-11-15 38.0 1.848160e+06
14 2020-11-16 22.0 2.415721e+06
我的代码是:
asa_installs_time = get_installs_time(start_date, end_date)
ga_installs_time = get_GAinstalls_time(start_date, end_date)
asa_installsTime_df = pd.DataFrame.from_dict(asa_installs_time[1])
ga_installsTime_df = pd.DataFrame.from_dict(ga_installs_time)
all_installsTime_df = pd.concat([ga_installsTime_df, asa_installsTime_df])
installs_time_series_chart = px.line( all_installsTime_df, x= all_installsTime_df['date'], all_installsTime_df['conversions'], title='Installs per Day')
return [all_installsTime_df]
如何解决绘制两个相同日期的问题?
编辑
使用all_installsTime_df = all_installsTime_df.sort_values('date').reset_index(drop=True)
:
- 主要问题是需要
.groupby
'date'
和 .sum()
来自两个数据帧的值。
import pandas as pd
import plotly.express as px
# sample data
data1 = {'date': ['2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09', '2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16'], 'conversions': [0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 2, 0, 1, 2], 'cpi': [0.0, 0.0, 0.0, 0.0, 0.6333, 0.0, 0.0, 0.0, 0.0, 1.695, 0.0, 2.17, 0.0, 2.59, 2.67]}
data2 = {'date': ['2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09', '2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16'], 'conversions': [5.0, 6.0, 8.0, 8.0, 11.0, 14.0, 30.0, 27.0, 32.0, 4.0, 17.0, 20.0, 24.0, 38.0, 22.0], 'cpi': [2039435.0, 2788452.0, 1720630.0, 2038703.0, 1775534.0, 1810215.0, 1617934.0, 1784663.0, 1368291.0, 5293594.0, 1524248.0, 2437085.0, 2272977.0, 1848160.0, 2415721.0]}
# create dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# concat the dataframes
df = pd.concat([df1, df2]).reset_index(drop=True)
# set the date column as a datetime
df.date = pd.to_datetime(df.date)
# groupby date, aggregate sum on all columns and reset
dfg = df.groupby('date').sum().reset_index()
# plot
fig = px.line(dfg, x=dfg['date'], y=dfg['conversions'], title='Installs per Day')
fig.show()
display(dfg)
date conversions cpi
0 2020-11-02 5.0 2.039435e+06
1 2020-11-03 6.0 2.788452e+06
2 2020-11-04 8.0 1.720630e+06
3 2020-11-05 8.0 2.038703e+06
4 2020-11-06 14.0 1.775535e+06
5 2020-11-07 14.0 1.810215e+06
6 2020-11-08 30.0 1.617934e+06
7 2020-11-09 27.0 1.784663e+06
8 2020-11-10 32.0 1.368291e+06
9 2020-11-11 6.0 5.293596e+06
10 2020-11-12 17.0 1.524248e+06
11 2020-11-13 22.0 2.437087e+06
12 2020-11-14 24.0 2.272977e+06
13 2020-11-15 39.0 1.848163e+06
14 2020-11-16 24.0 2.415724e+06
我正在尝试绘制两个数据集的折线图,每个数据集都包含值 'date'
、'conversions'
和 'cpi'
作为一条线。该图表显示了每天的值计数,因此我在处理同一天正确绘制每个数据集的值时遇到了问题:
示例数据:
date conversions cpi
0 2020-11-02 0.0 0.000000e+00
1 2020-11-03 0.0 0.000000e+00
2 2020-11-04 0.0 0.000000e+00
3 2020-11-05 0.0 0.000000e+00
4 2020-11-06 3.0 6.333000e-01
5 2020-11-07 0.0 0.000000e+00
6 2020-11-08 0.0 0.000000e+00
7 2020-11-09 0.0 0.000000e+00
8 2020-11-10 0.0 0.000000e+00
9 2020-11-11 2.0 1.695000e+00
10 2020-11-12 0.0 0.000000e+00
11 2020-11-13 2.0 2.170000e+00
12 2020-11-14 0.0 0.000000e+00
13 2020-11-15 1.0 2.590000e+00
14 2020-11-16 2.0 2.670000e+00
0 2020-11-02 5.0 2.039435e+06
1 2020-11-03 6.0 2.788452e+06
2 2020-11-04 8.0 1.720630e+06
3 2020-11-05 8.0 2.038703e+06
4 2020-11-06 11.0 1.775534e+06
5 2020-11-07 14.0 1.810215e+06
6 2020-11-08 30.0 1.617934e+06
7 2020-11-09 27.0 1.784663e+06
8 2020-11-10 32.0 1.368291e+06
9 2020-11-11 4.0 5.293594e+06
10 2020-11-12 17.0 1.524248e+06
11 2020-11-13 20.0 2.437085e+06
12 2020-11-14 24.0 2.272977e+06
13 2020-11-15 38.0 1.848160e+06
14 2020-11-16 22.0 2.415721e+06
我的代码是:
asa_installs_time = get_installs_time(start_date, end_date)
ga_installs_time = get_GAinstalls_time(start_date, end_date)
asa_installsTime_df = pd.DataFrame.from_dict(asa_installs_time[1])
ga_installsTime_df = pd.DataFrame.from_dict(ga_installs_time)
all_installsTime_df = pd.concat([ga_installsTime_df, asa_installsTime_df])
installs_time_series_chart = px.line( all_installsTime_df, x= all_installsTime_df['date'], all_installsTime_df['conversions'], title='Installs per Day')
return [all_installsTime_df]
如何解决绘制两个相同日期的问题?
编辑
使用all_installsTime_df = all_installsTime_df.sort_values('date').reset_index(drop=True)
:
- 主要问题是需要
.groupby
'date'
和.sum()
来自两个数据帧的值。
import pandas as pd
import plotly.express as px
# sample data
data1 = {'date': ['2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09', '2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16'], 'conversions': [0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 2, 0, 1, 2], 'cpi': [0.0, 0.0, 0.0, 0.0, 0.6333, 0.0, 0.0, 0.0, 0.0, 1.695, 0.0, 2.17, 0.0, 2.59, 2.67]}
data2 = {'date': ['2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08', '2020-11-09', '2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16'], 'conversions': [5.0, 6.0, 8.0, 8.0, 11.0, 14.0, 30.0, 27.0, 32.0, 4.0, 17.0, 20.0, 24.0, 38.0, 22.0], 'cpi': [2039435.0, 2788452.0, 1720630.0, 2038703.0, 1775534.0, 1810215.0, 1617934.0, 1784663.0, 1368291.0, 5293594.0, 1524248.0, 2437085.0, 2272977.0, 1848160.0, 2415721.0]}
# create dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# concat the dataframes
df = pd.concat([df1, df2]).reset_index(drop=True)
# set the date column as a datetime
df.date = pd.to_datetime(df.date)
# groupby date, aggregate sum on all columns and reset
dfg = df.groupby('date').sum().reset_index()
# plot
fig = px.line(dfg, x=dfg['date'], y=dfg['conversions'], title='Installs per Day')
fig.show()
display(dfg)
date conversions cpi
0 2020-11-02 5.0 2.039435e+06
1 2020-11-03 6.0 2.788452e+06
2 2020-11-04 8.0 1.720630e+06
3 2020-11-05 8.0 2.038703e+06
4 2020-11-06 14.0 1.775535e+06
5 2020-11-07 14.0 1.810215e+06
6 2020-11-08 30.0 1.617934e+06
7 2020-11-09 27.0 1.784663e+06
8 2020-11-10 32.0 1.368291e+06
9 2020-11-11 6.0 5.293596e+06
10 2020-11-12 17.0 1.524248e+06
11 2020-11-13 22.0 2.437087e+06
12 2020-11-14 24.0 2.272977e+06
13 2020-11-15 39.0 1.848163e+06
14 2020-11-16 24.0 2.415724e+06