Python list splitting, sorting by date, then joining
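OK, I have been at this for hours; I admit defeat and beg for your mercy.
Goal: I have multiple files (bank statement downloads) that I want to merge, sort, and de-duplicate.
The download format looks like this: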
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
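Apart from having absolutely no idea what I am doing, one of my main problems is that the numeric values contain commas. I have managed to write code that strips out the 'buried' commas and then the quotes, so that I end up with a CSV... line.
So I now have the data in this format: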
['02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', ' \t ', '-155.49\t ', 'Transaction Details\n', '03/03/2015', ' \t ', '82.00\t ', ' \t ', 'Transaction Details\n', '03/03/2015', ' \t ', '243.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n']
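I believe this makes it almost ready to sort on the first element, but I think it is now one long list rather than a list of lists.
I looked into sorting and found lambda... functions, so I started with: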
new_file_data = sorted(new_file_data, key=lambda item: item[0])
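But element [0] is just the " at the beginning of the line (BOL).
I also realised the dates might not sort correctly unless I specify the date format, which led me to this construct: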
sorted(new_file_data, key=lambda d: datetime.strptime(d, '%d/%m/%Y'))
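I roughly understand the 'map' construct, but I don't know how to combine the two so that I reference only element [0], or how to reference it as a date.
So here I am, hoping someone can help me over this hurdle.
I think I need to split the list better so that each row is one element. At one point I did get a sorted result, but all the fields were lumped together: the values (sorted), then the dates, then the words, and so on.
So I would appreciate some advice on my failed list manipulation and on how to construct that sort lambda.
Thanks to anyone who has the time and the know-how to reply to this kind of beginner question.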
You can define your own sorting function.
Combine these two questions and you will get what you want (or something close to it):
Custom Python list sorting
Python date string to date object
In your sorting function, convert the dates from strings to datetime objects and compare them:
import datetime

def cmp_items(a, b):
    # Parse the date in the first field of each row (day/month/year)
    datetime_a = datetime.datetime.strptime(a[0], "%d/%m/%Y").date()
    datetime_b = datetime.datetime.strptime(b[0], "%d/%m/%Y").date()
    if datetime_a > datetime_b:
        return 1
    elif datetime_a == datetime_b:
        return 0
    else:
        return -1
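Then simply use it to sort the list. Note that list.sort() sorts in place and returns None, so do not assign its result; on Python 3 the comparison function also has to be wrapped with functools.cmp_to_key:
import functools
new_file_data.sort(key=functools.cmp_to_key(cmp_items))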
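After that you will still have one small issue: this comparison says nothing about the order of elements with the same date, so they simply stay in the order they already had (Python's sort is stable). You can extend the comparison function to compare more fields if you need a specific order.
By the way, you did not strip out the buried commas; it looks like you dropped the last field entirely.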
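As a rough sketch of the "compare more fields" idea above, a key function can return a tuple, so that rows with equal dates are ordered by their remaining fields. This assumes each row is already a list with the date in position 0; the name sort_key and the choice of tie-breaker are illustrative assumptions, not part of the original answer. The remaining fields are compared as text, which makes the order deterministic but not numeric.

```python
import datetime


def sort_key(row):
    """Order by date first, then by the remaining fields compared as strings."""
    date = datetime.datetime.strptime(row[0], "%d/%m/%Y").date()
    # Tie-breaker (an assumption): the rest of the row, compared lexicographically
    return (date, row[1:])


rows = [
    ["03/03/2015", "243.25", "Transaction description", "2,789.85"],
    ["02/03/2015", "241.25", "Transaction description", "2,620.09"],
    ["03/03/2015", "-334.81", "Transaction description", "2,339.12"],
]
rows.sort(key=sort_key)  # sorts in place; the return value is None
```

With a key function like this there is no need for a cmp-style function at all, and the result no longer depends on the order in which the rows were read.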
If I understand correctly, you want to read the contents of a CSV file and sort them by date.
Given a data.csv with the following contents:
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
I would use the csv module to read the data.
import csv
with open('data.csv') as f:
    data = [row for row in csv.reader(f)]
This gives:
>>> data
[['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14'],
['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07']]
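Note that the commas inside quoted amounts survive as part of a single field, so there is no need to strip them before parsing. If the amounts are needed as numbers later, something along these lines might work. This is a minimal sketch assuming Python 3; parse_amount is a hypothetical helper name, and the sample line is held in memory rather than read from a real file.

```python
import csv
import io
from decimal import Decimal

# One sample line from the question, held in memory instead of a real file
sample = io.StringIO(
    '"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"\n'
)

# csv.reader respects the quotes, so "5,804.30" stays a single field
row = next(csv.reader(sample))


def parse_amount(text):
    """Convert a string such as '5,804.30' to a Decimal (hypothetical helper)."""
    return Decimal(text.replace(",", ""))


balance = parse_amount(row[2])
```

Decimal is used here rather than float because these are currency values; that is a design choice of this sketch, not something the answer prescribes.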
You can then use the datetime module to provide a key for sorting.
import datetime
sorted_data = sorted(data, key=lambda row: datetime.datetime.strptime(row[0], "%d/%m/%Y"))
This gives:
>>> sorted_data
[['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07'],
['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14']]
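The question also asks about merging several downloads and removing duplicates. One possible way to combine that with the approach above is sketched below. It assumes Python 3 and that "duplicate" means a row identical in every field; read_rows and merge_statements are illustrative names, and the two in-memory streams stand in for real files opened with open().

```python
import csv
import datetime
import io


def read_rows(stream):
    """Parse one statement download into a list of tuples, one per CSV row."""
    return [tuple(row) for row in csv.reader(stream) if row]


def merge_statements(streams):
    """Combine rows from several downloads, drop exact duplicates, sort by date."""
    seen = set()
    merged = []
    for stream in streams:
        for row in read_rows(stream):
            if row not in seen:  # keep only the first copy of each identical row
                seen.add(row)
                merged.append(row)
    # sorted() is stable, so rows sharing a date keep their first-seen order
    return sorted(
        merged,
        key=lambda row: datetime.datetime.strptime(row[0], "%d/%m/%Y"),
    )


# Two overlapping "downloads", held in memory here instead of real files
first = io.StringIO(
    '"03/03/2015","82.00","Transaction description","2,546.60"\n'
    '"02/03/2015","241.25","Transaction description","2,620.09"\n'
)
second = io.StringIO(
    '"02/03/2015","241.25","Transaction description","2,620.09"\n'
    '"04/03/2015","-25.05","Transaction description","2,314.07"\n'
)
merged = merge_statements([first, second])
```

Rows are converted to tuples so they can be stored in a set. Because the last column is a running balance, two genuinely different transactions are unlikely to match in every field, but that is an assumption worth checking against your own statements.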