How do you take one dataframe that covers multiple years, and break it into a separate DF for each year
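I've searched SE but couldn't find an answer to my question. I'm still a novice.
I'm trying to take a purchases CSV file and break it into a separate dataframe for each year.
For example, if I have a full list of dates in MM/DD/YYYY format, I'm trying to split them into one dataframe per year, e.g. Ord2015, Ord2014, etc.
I tried converting the full dates to years, and I also tried slicing to look at only the last four characters of each date, but to no avail.
Here is my current (incomplete) attempt: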
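Both ideas from the question can work once the column is treated either as datetimes or as text. A minimal sketch, assuming the column is named `ord_date` (as in the sample posted later in the thread) and holds `MM/DD/YYYY` strings; the four-row frame is made up for illustration:

```python
import pandas as pd

# Hypothetical sample: dates stored as MM/DD/YYYY strings, as in the question
df = pd.DataFrame({"ord_date": ["10/2/2014", "11/8/2014", "11/22/2014", "2/17/2015"]})

# Approach 1: convert to datetimes with an explicit format, then take the year
years_from_dates = pd.to_datetime(df["ord_date"], format="%m/%d/%Y").dt.year

# Approach 2: slice the last four characters of each string and cast to int
# (this only works if the strings carry no trailing whitespace)
years_from_slice = df["ord_date"].str[-4:].astype(int)

print(years_from_dates.tolist())  # [2014, 2014, 2014, 2015]
print(years_from_slice.tolist())  # [2014, 2014, 2014, 2015]
```

Converting to datetimes is usually the sturdier choice, because the `.dt` accessor then also gives months, days and correct chronological sorting.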
import pandas as pd
import csv
import numpy as np
import datetime as dt
import re
purch1 = pd.read_csv('purchases.csv')
#Remove unneeded fluff
del_colmn = ['pid', 'notes', 'warehouse_id', 'env_notes', 'budget_notes']
purch1 = purch1.drop(columns=del_colmn)
#break down by year only
purch1.sort_values(by=['order_date'])
Ord2015 = ()
Ord2014 = ()
for purch in purch1:
    Order2015.add(purch1['order_date'] == 2015)
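For comparison, what the loop appears to be aiming for is plain boolean indexing on the year, which needs no loop at all. A sketch, assuming the column is named `ord_date` (as in the posted sample; the attempt uses `order_date`) and has already been parsed to datetimes; the rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows with the date column already parsed to datetime64
purch1 = pd.DataFrame({
    "ord_date": pd.to_datetime(
        ["10/2/2014", "11/8/2014", "11/22/2014", "2/17/2015"], format="%m/%d/%Y"
    ),
    "cost_center": ["Ops", "Net", "Cons", "Net"],
})

# sort_values returns a new frame, so the result must be assigned back
purch1 = purch1.sort_values(by="ord_date")

# A boolean mask keeps every row whose order year matches
Ord2014 = purch1[purch1["ord_date"].dt.year == 2014]
Ord2015 = purch1[purch1["ord_date"].dt.year == 2015]

print(len(Ord2014), len(Ord2015))  # 3 1
```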
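As requested by @anon01, here is the output of running the code you gave me. I only used four samples, since that's all I was initially experimenting with. The records have nearly 20k rows, so I only pulled out a few to play with.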
'{"pid":{"0":75,"2":95,"3":117,"1":82},"env_id":{"0":12454,"2":12532,"3":12623,"1":12511},"ord_date":{"0":"10\/2\/2014","2":"11\/22\/2014","3":"2\/17\/2015","1":"11\/8\/2014"},"cost_center":{"0":"Ops","2":"Cons","3":"Net","1":"Net"},"dept":{"0":"Ops","2":"Cons","3":"Ops","1":"Ops"},"signing_mgr":{"0":"M.Dodd","2":"L.Price","3":"M. Dodd","1":"M. Dodd"},"check_num":{"0":null,"2":null,"3":null,"1":82301.0},"rec_date":{"0":"10\/11\/2014","2":"12\/2\/2014","3":"3\/1\/2015","1":"11\/20\/2014"},"model":{"0":null,"2":null,"3":null,"1":null},"notes":{"0":"Shipped to east WH","2":"Rec'd by L.Price","3":"Shipped to Client (1190)","1":"Rec'd by K. Wilson"},"env_notes":{"0":"appr by K.Polt","2":"appr by S. Crane","3":"appr by K.Polt","1":"appr by K.Polt"},"budget_notes":{"0":null,"2":"OOB expense","3":"Client bill","1":null},"cost_year":{"0":2014.0,"2":2015.0,"3":null,"1":2014.0}}'
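The string above has the shape of `DataFrame.to_json()` output for a four-row sample (the shuffled index 0, 2, 3, 1 suggests a random sample). A sketch of loading a trimmed copy back into pandas for experimentation; only two columns are kept for brevity, and the `ord_year` column is a hypothetical addition:

```python
import io

import pandas as pd

# Trimmed copy of the posted sample; "\/" is a valid JSON escape for "/"
sample_json = r'{"ord_date":{"0":"10\/2\/2014","2":"11\/22\/2014","3":"2\/17\/2015","1":"11\/8\/2014"},"cost_center":{"0":"Ops","2":"Cons","3":"Net","1":"Net"}}'

# Wrapping the literal in StringIO avoids the warning pandas 2.1+ emits
# when read_json is given a raw JSON string
sample = pd.read_json(io.StringIO(sample_json))

# ord_date is not one of read_json's default date-like column names, so it
# comes back as strings; convert explicitly before taking the year
sample["ord_year"] = pd.to_datetime(sample["ord_date"], format="%m/%d/%Y").dt.year
print(sample.sort_index())
```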
You can add parse_dates to read_csv to convert the column to datetimes, then create a dictionary of DataFrames, dfs, and select each year by its key:
purch1 = pd.read_csv('purchases.csv', parse_dates=['ord_date'])
dfs = dict(tuple(purch1.groupby(purch1['ord_date'].dt.year)))
Ord2015 = dfs[2015]
Ord2016 = dfs[2016]
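A self-contained version of the same approach, runnable without the original file. The CSV text is a hypothetical stand-in for purchases.csv built from the pid and ord_date values in the posted sample:

```python
import io

import pandas as pd

# Hypothetical stand-in for purchases.csv
csv_text = """pid,ord_date,cost_center
75,10/2/2014,Ops
82,11/8/2014,Net
95,11/22/2014,Cons
117,2/17/2015,Net
"""

# parse_dates turns the MM/DD/YYYY strings into datetime64 values
purch1 = pd.read_csv(io.StringIO(csv_text), parse_dates=["ord_date"])

# Iterating a groupby yields (year, sub-DataFrame) pairs; dict() collects them
dfs = dict(tuple(purch1.groupby(purch1["ord_date"].dt.year)))

print([int(year) for year in sorted(dfs)])  # [2014, 2015]
print(dfs[2015]["pid"].tolist())            # [117]
```

Looking up a year that has no rows (for example 2016 in this small sample) raises KeyError, so dfs.get(year) may be handier when the set of years is not known in advance.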
It is not recommended, but it is possible to create a separate DataFrame variable for each year group:
for i, g in purch1.groupby(purch1['ord_date'].dt.year):
    globals()['Ord' + str(i)] = g
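Both variants rely on the same grouping. The dict keeps every year reachable through one name, which is the usual reason the globals() route is discouraged: generated variable names cannot be looped over without being hard-coded again. A sketch comparing the two on a made-up three-row frame:

```python
import pandas as pd

# Hypothetical rows with the date column already parsed
purch1 = pd.DataFrame({
    "ord_date": pd.to_datetime(["10/2/2014", "11/8/2014", "2/17/2015"], format="%m/%d/%Y"),
    "pid": [75, 82, 117],
})

years = purch1["ord_date"].dt.year

# Recommended: one dictionary keyed by (plain int) year
dfs = {int(year): group for year, group in purch1.groupby(years)}

# Discouraged: one module-level variable per year (Ord2014, Ord2015, ...)
for year, group in purch1.groupby(years):
    globals()["Ord" + str(year)] = group

# With the dict, processing every year is a simple loop
for year, frame in dfs.items():
    print(year, len(frame))  # 2014 2, then 2015 1
```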