Pandas groupby 仅使用年份和月份
Pandas groupby using only year and month
我有一个使用 Pandas 的 Python 程序,它读取两个数据帧,在以下链接中获得:
Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias:https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#covid19-evolucion
Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar:https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#seguridad-publica-denuncias
我目前想要做的是在具有相同日期的“covid”数据框中进行分组,并对这些数据求和。无论如何,没有任何方法可以解决,它会定期打印一个错误,指示我应该使用“PeriodIndex”的语法。有人有建议或解决方案吗?提前致谢。
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
#csv for complaints
comp = pd.read_csv('Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar.csv')
#cleaning data in both dataframes
#keeping only the relevant columns
covid = covid[['Month','Daily Cases']]
comp = comp[['Month','Affected Legal Good', 'Value']]
#changing the labels from spanish to english
comp['Affected Legal Good'].replace({'Patrimonio': 'Heritage', 'Familia':'Family', 'Libertad y Seguridad Sexual':'Sexual Freedom and Safety', 'Sociedad':'Society', 'Vida e Integridad Corporal':'Life and Bodily Integrity', 'Libertad Personal':'Personal Freedom', 'Otros Bienes Jurídicos Afectados (Del Fuero Común)':'Other Affected Legal Assets (Common Jurisdiction)'}, inplace=True, regex=True)
#changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid
您可以简单地使用 group by statement.Timegrouper 默认情况下将其转换为 datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
covid = covid.groupby(['Month'])['Daily Cases'].sum()
covid = covid.reset_index()
# #changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid
我有一个使用 Pandas 的 Python 程序,它读取两个数据帧,在以下链接中获得:
Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias:https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#covid19-evolucion
Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar:https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#seguridad-publica-denuncias
我目前想要做的是在具有相同日期的“covid”数据框中进行分组,并对这些数据求和。无论如何,没有任何方法可以解决,它会定期打印一个错误,指示我应该使用“PeriodIndex”的语法。有人有建议或解决方案吗?提前致谢。
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
#csv for complaints
comp = pd.read_csv('Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar.csv')
#cleaning data in both dataframes
#keeping only the relevant columns
covid = covid[['Month','Daily Cases']]
comp = comp[['Month','Affected Legal Good', 'Value']]
#changing the labels from spanish to english
comp['Affected Legal Good'].replace({'Patrimonio': 'Heritage', 'Familia':'Family', 'Libertad y Seguridad Sexual':'Sexual Freedom and Safety', 'Sociedad':'Society', 'Vida e Integridad Corporal':'Life and Bodily Integrity', 'Libertad Personal':'Personal Freedom', 'Otros Bienes Jurídicos Afectados (Del Fuero Común)':'Other Affected Legal Assets (Common Jurisdiction)'}, inplace=True, regex=True)
#changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid
您可以简单地使用 group by statement.Timegrouper 默认情况下将其转换为 datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')
covid = covid.groupby(['Month'])['Daily Cases'].sum()
covid = covid.reset_index()
# #changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')
covid