How to add a column into a CSV file
I need to add a column to df_canada that gives the total number of immigrants to Canada for each AreaName:
import pandas as pd
df_canada = pd.read_csv('https://raw.githubusercontent.com/iikotelnikov/datasets/main/canada_immigration.csv', sep=';')
df_canada
First, I added an extra row holding the total number of immigrants to Canada for each year.
# Add a row holding the sum of immigrants in Canada for each year
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as ani
import datetime as dt
%matplotlib inline
df_canada.loc[197] = {'Type': 'Sum of immigrants in Canada by year'}
df_canada.loc[197, 10:] = df_canada[df_canada['Type'] != 'Sum of immigrants in Canada by year'].iloc[:, 10:].sum()
df_canada
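The positional slice `iloc[:, 10:]` only works if exactly the right number of ID columns precede the year columns. A minimal sketch of the same "total per year" row that selects the year columns by name instead, on a small made-up frame. It assumes the year columns are named with digit strings such as '1980'; the frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of df_canada: a few ID columns plus one column per year.
df = pd.DataFrame({
    "Type": ["Immigrants", "Immigrants", "Immigrants"],
    "AreaName": ["Asia", "Asia", "Europe"],
    "1980": [10, 20, 5],
    "1981": [1, 2, 3],
})

# Select the year columns by name (assumed to be digit strings like '1980'),
# so the code does not depend on how many ID columns come first.
year_cols = [c for c in df.columns if str(c).isdigit()]

# Sum each year column over the existing rows, before the total row exists.
totals = df[year_cols].sum()

# Append the totals as one new row; columns missing from it become NaN.
total_row = pd.DataFrame([{"Type": "Sum of immigrants in Canada by year", **totals.to_dict()}])
df = pd.concat([df, total_row], ignore_index=True)
print(df)
```

Computing the totals before appending the row avoids the need to filter the total row back out.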
Second, I need the total number of immigrants to Canada per AreaName.
# Add a row holding the sum of immigrants in Canada by area
df_canada.loc[198] = {'Type': 'Sum of immigrants in Canada by Area'}
df_canada.loc[198, 10:] = df_canada[df_canada['Type'] != 'Sum of immigrants in Canada by year'].iloc[:, 10:].sum()
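But this doesn't work for me, and I don't know what to do next.
Can you show me how to compute the total number of immigrants to Canada by area and create a column from that number?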
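Why the second attempt just repeats the first: `DataFrame.sum()` adds up each column over all rows, so it always yields one number per year, whatever label the new row gets. Getting one number per area requires grouping the rows by AreaName first. A sketch on hypothetical toy data in the same wide layout:

```python
import pandas as pd

# Hypothetical toy data in the same wide layout: one column per year.
df = pd.DataFrame({
    "AreaName": ["Asia", "Asia", "Europe"],
    "1980": [10, 20, 5],
    "1981": [1, 2, 3],
})
year_cols = ["1980", "1981"]

# DataFrame.sum() adds up each column over all rows: one number per year.
per_year = df[year_cols].sum()

# groupby("AreaName") splits the rows by area, then sums within each group.
per_area_and_year = df.groupby("AreaName")[year_cols].sum()

# Summing across the year columns (axis=1) gives one total per area.
per_area = per_area_and_year.sum(axis=1)
print(per_year, per_area_and_year, per_area, sep="\n\n")
```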
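Here's what you need to do:
- Import the data into a DataFrame.
- The values for each year are spread across separate columns instead of being stored in a single column, so use melt to unpivot those year columns into rows.
- Group by area name and year and sum the values.
- Write the output to a file.
The code: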
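To see what the melt step does, here is a sketch on a hypothetical two-area, two-year frame (the names and numbers are made up): every column not listed in `id_vars` becomes rows, with its column name in `year` and its cell in `value`.

```python
import pandas as pd

# Hypothetical two-area, two-year frame in the wide layout of df_canada.
wide = pd.DataFrame({
    "AreaName": ["Asia", "Europe"],
    "1980": [10, 5],
    "1981": [1, 3],
})

# melt keeps the id_vars columns and turns every other column into rows:
# the old column name goes into 'year' and the cell value into 'value'.
long = wide.melt(id_vars=["AreaName"], var_name="year", value_name="value")
print(long)
```

In this long format, a `groupby` on `AreaName` and `year` is a plain row-wise aggregation.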
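import pandas as pd
# Load the data (the file is semicolon-separated)
df_canada = pd.read_csv('https://raw.githubusercontent.com/iikotelnikov/datasets/main/canada_immigration.csv', sep=';')
# Unpivot the year columns into rows: one row per origin and year, with the count in 'value'
agg_df_canada = df_canada.melt(id_vars=['Type', 'Coverage', 'OdName', 'AREA', 'AreaName', 'REG', 'RegName',
                                        'DEV', 'DevName'], var_name='year', value_name='value')
# Sum the counts for each area and year
result_df = agg_df_canada.groupby(['AreaName', 'year'])['value'].sum()
# Write the result to CSV (replace Your_location with a real directory)
result_df.to_csv('Your_location/result_df.csv')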
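The code above produces a separate per-area, per-year table rather than a new column in df_canada. If you want the column itself, one option (a different technique from the melt/groupby above) is `groupby(...).transform('sum')`, which returns a result aligned to the original rows. A sketch on made-up data; the OdName values and the `AreaTotal` column name are hypothetical, and the year columns are assumed to have digit-string names.

```python
import pandas as pd

# Hypothetical miniature of df_canada: several origin rows can share an AreaName.
df = pd.DataFrame({
    "OdName": ["India", "China", "France"],
    "AreaName": ["Asia", "Asia", "Europe"],
    "1980": [10, 20, 5],
    "1981": [1, 2, 3],
})
year_cols = [c for c in df.columns if str(c).isdigit()]

# Total per row across all years.
row_total = df[year_cols].sum(axis=1)

# Sum those row totals within each area and broadcast the result back to every row.
df["AreaTotal"] = row_total.groupby(df["AreaName"]).transform("sum")
print(df)
```

Every row from the same area gets the same `AreaTotal`, so the frame keeps its original shape.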