日期时间的列操作 Pandas
Column Manipulations with date-Time Pandas
我正在尝试同时对行和列进行一些列操作,包括 Pandas 中的日期和时间序列。传统上没有系列 python 词典很棒。但是 Pandas 这对我来说是一件新鲜事。
输入文件:N 个。
File1.csv, File2.csv, File3.csv, ........... Filen.csv
Ids,Date-time-1 Ids,Date-time-2 Ids,Date-time-1
56,4568 645,5545 25,54165
45,464 458,546
我正在尝试将所有文件的 Date-time
列合并到一个关于 Ids
的大数据文件中
Ids,Date-time-ref,Date-time-1,date-time-2
56,100,4468,NAN
45,150,314,NAN
645,50,NAN,5495
458,200,NAN,346
25,250,53915,NAN
检查 date-time
列 - 如果不匹配,创建一个,然后用 date-time value
的值减去当前 date-time value
的值,然后填充相对于 Ids
的值date-time-ref
对应 Ids
.
用NAN
填充空位,如果下一个文件有那个值则用NAN
替换新值
如果它是直列减法,它非常容易,但与 date-time series
同步并且相对于 Ids
似乎有点混乱。
首先感谢一些建议。提前致谢。
这是一种方法。
import pandas as pd
import numpy as np
from StringIO import StringIO
# your csv file contents
csv_file1 = 'Ids,Date-time-1\n56,4568\n45,464\n'
csv_file2 = 'Ids,Date-time-2\n645,5545\n458,546\n'
# add a duplicated Ids record for testing purpose
csv_file3 = 'Ids,Date-time-1\n25,54165\n645, 4354\n'
csv_file_all = [csv_file1, csv_file2, csv_file3]
# read csv into df using list comprehension
# I use buffer here, replace stringIO with your file path
df_all = [pd.read_csv(StringIO(csv_file)) for csv_file in csv_file_all]
# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')
Out[206]:
Date-time-1 Date-time-2
Ids
56 4568 NaN
45 464 NaN
645 NaN 5545
458 NaN 546
25 54165 NaN
645 4354 NaN
# custom function to handle/merge duplicates on Ids (axis=0)
def apply_func(group):
return group.fillna(method='ffill').iloc[-1]
# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)
Out[207]:
Date-time-1 Date-time-2
Ids
25 54165 NaN
45 464 NaN
56 4568 NaN
458 NaN 546
645 4354 5545
# do the subtraction
master_csv_file = 'Ids,Date-time-ref\n56,100\n45,150\n645,50\n458,200\n25,250\n'
df_master = pd.read_csv(io.StringIO(master_csv_file), index_col=['Ids']).sort_index()
# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)
# use broadcasting
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)
Out[208]:
Date-time-ref Date-time-1 Date-time-2
Ids
25 250 53915 NaN
45 150 314 NaN
56 100 4468 NaN
458 200 NaN 346
645 50 4304 5495
我正在尝试同时对行和列进行一些列操作,包括 Pandas 中的日期和时间序列。传统上没有系列 python 词典很棒。但是 Pandas 这对我来说是一件新鲜事。
输入文件:N 个。
File1.csv, File2.csv, File3.csv, ........... Filen.csv
Ids,Date-time-1 Ids,Date-time-2 Ids,Date-time-1
56,4568 645,5545 25,54165
45,464 458,546
我正在尝试将所有文件的 Date-time
列合并到一个关于 Ids
Ids,Date-time-ref,Date-time-1,date-time-2
56,100,4468,NAN
45,150,314,NAN
645,50,NAN,5495
458,200,NAN,346
25,250,53915,NAN
检查
date-time
列 - 如果不匹配,创建一个,然后用date-time value
的值减去当前date-time value
的值,然后填充相对于Ids
的值date-time-ref
对应Ids
.用
NAN
填充空位,如果下一个文件有那个值则用NAN
替换新值
如果它是直列减法,它非常容易,但与 date-time series
同步并且相对于 Ids
似乎有点混乱。
首先感谢一些建议。提前致谢。
这是一种方法。
import pandas as pd
import numpy as np
from StringIO import StringIO
# your csv file contents
csv_file1 = 'Ids,Date-time-1\n56,4568\n45,464\n'
csv_file2 = 'Ids,Date-time-2\n645,5545\n458,546\n'
# add a duplicated Ids record for testing purpose
csv_file3 = 'Ids,Date-time-1\n25,54165\n645, 4354\n'
csv_file_all = [csv_file1, csv_file2, csv_file3]
# read csv into df using list comprehension
# I use buffer here, replace stringIO with your file path
df_all = [pd.read_csv(StringIO(csv_file)) for csv_file in csv_file_all]
# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')
Out[206]:
Date-time-1 Date-time-2
Ids
56 4568 NaN
45 464 NaN
645 NaN 5545
458 NaN 546
25 54165 NaN
645 4354 NaN
# custom function to handle/merge duplicates on Ids (axis=0)
def apply_func(group):
return group.fillna(method='ffill').iloc[-1]
# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)
Out[207]:
Date-time-1 Date-time-2
Ids
25 54165 NaN
45 464 NaN
56 4568 NaN
458 NaN 546
645 4354 5545
# do the subtraction
master_csv_file = 'Ids,Date-time-ref\n56,100\n45,150\n645,50\n458,200\n25,250\n'
df_master = pd.read_csv(io.StringIO(master_csv_file), index_col=['Ids']).sort_index()
# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)
# use broadcasting
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)
Out[208]:
Date-time-ref Date-time-1 Date-time-2
Ids
25 250 53915 NaN
45 150 314 NaN
56 100 4468 NaN
458 200 NaN 346
645 50 4304 5495