Comparing a combined start and end date to determine ordering
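For a list of Python dictionaries that always contain a start and an end date, how would you sort it based on the "combined" start and end dates?
What would be the simplest (most Pythonic) way to obtain the end result, from top to bottom, according to the following criteria:
- Sort by end_date (descending) first, then by start_date (descending).
- If two objects have the same end_date, the one with the latest start_date comes first, i.e. those items are then sorted by start_date.
- If both start_date and end_date are the same, the order of those items does not matter; it can be ignored or left as is.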
import datetime
blah = [
{"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
{"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
{"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
{"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
{"id": 5, "start_date": datetime.date(2012, 1, 1), "end_date": datetime.date(2015, 1, 1)},
{"id": 6, "start_date": datetime.date(2008, 1, 1), "end_date": datetime.date(2011, 1, 1)},
{"id": 7, "start_date": datetime.date(2006, 1, 1), "end_date": datetime.date(2008, 1, 1)},
{"id": 8, "start_date": datetime.date(2005, 1, 15), "end_date": datetime.date(2010, 1, 15)},
{"id": 9, "start_date": datetime.date(2002, 1, 15), "end_date": datetime.date(2002, 1, 15)},
{"id": 10, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 11, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 12, "start_date": datetime.date(2001, 2, 1), "end_date": datetime.date(2003, 1, 1)},
{"id": 13, "start_date": datetime.date(2001, 1, 15), "end_date": datetime.date(2003, 1, 15)},
{"id": 14, "start_date": datetime.date(1998, 1, 1), "end_date": datetime.date(2001, 1, 1)},
{"id": 15, "start_date": datetime.date(1997, 1, 15), "end_date": datetime.date(1997, 1, 15)}
]
# Do something here...and return `result`.
result = [
{"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
{"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
{"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
{"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
{"id": 5, "start_date": datetime.date(2012, 1, 1), "end_date": datetime.date(2015, 1, 1)},
{"id": 6, "start_date": datetime.date(2008, 1, 1), "end_date": datetime.date(2011, 1, 1)},
{"id": 8, "start_date": datetime.date(2005, 1, 15), "end_date": datetime.date(2010, 1, 15)},
{"id": 7, "start_date": datetime.date(2006, 1, 1), "end_date": datetime.date(2008, 1, 1)},
{"id": 11, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 10, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 9, "start_date": datetime.date(2002, 1, 15), "end_date": datetime.date(2002, 1, 15)},
{"id": 12, "start_date": datetime.date(2001, 2, 1), "end_date": datetime.date(2003, 1, 1)},
{"id": 13, "start_date": datetime.date(2001, 1, 15), "end_date": datetime.date(2003, 1, 15)},
{"id": 14, "start_date": datetime.date(1998, 1, 1), "end_date": datetime.date(2001, 1, 1)},
{"id": 15, "start_date": datetime.date(1997, 1, 15), "end_date": datetime.date(1997, 1, 15)}
]
What would be the simplest (most Pythonic) way to obtain the end result ...
The simplest (most Pythonic) way I can think of is to use pandas.
Demo:
import datetime
import pandas as pd
blah = [
{"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
{"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
{"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
{"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
{"id": 5, "start_date": datetime.date(2012, 1, 1), "end_date": datetime.date(2015, 1, 1)},
{"id": 6, "start_date": datetime.date(2008, 1, 1), "end_date": datetime.date(2011, 1, 1)},
{"id": 7, "start_date": datetime.date(2006, 1, 1), "end_date": datetime.date(2008, 1, 1)},
{"id": 8, "start_date": datetime.date(2005, 1, 15), "end_date": datetime.date(2010, 1, 15)},
{"id": 9, "start_date": datetime.date(2002, 1, 15), "end_date": datetime.date(2002, 1, 15)},
{"id": 10, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 11, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 12, "start_date": datetime.date(2001, 2, 1), "end_date": datetime.date(2003, 1, 1)},
{"id": 13, "start_date": datetime.date(2001, 1, 15), "end_date": datetime.date(2003, 1, 15)},
{"id": 14, "start_date": datetime.date(1998, 1, 1), "end_date": datetime.date(2001, 1, 1)},
{"id": 15, "start_date": datetime.date(1997, 1, 15), "end_date": datetime.date(1997, 1, 15)}
]
df = pd.DataFrame(blah)
result = df.sort_values(['end_date', 'start_date'], ascending=(False, False), na_position='first').to_dict('records')
for e in result:
    print(e)
Output:
{'id': 1, 'start_date': datetime.date(2021, 5, 1), 'end_date': None}
{'id': 2, 'start_date': datetime.date(2013, 2, 1), 'end_date': None}
{'id': 4, 'start_date': datetime.date(2016, 5, 1), 'end_date': datetime.date(2019, 6, 1)}
{'id': 3, 'start_date': datetime.date(2017, 1, 1), 'end_date': datetime.date(2018, 1, 1)}
{'id': 5, 'start_date': datetime.date(2012, 1, 1), 'end_date': datetime.date(2015, 1, 1)}
{'id': 6, 'start_date': datetime.date(2008, 1, 1), 'end_date': datetime.date(2011, 1, 1)}
{'id': 8, 'start_date': datetime.date(2005, 1, 15), 'end_date': datetime.date(2010, 1, 15)}
{'id': 7, 'start_date': datetime.date(2006, 1, 1), 'end_date': datetime.date(2008, 1, 1)}
{'id': 10, 'start_date': datetime.date(2002, 1, 1), 'end_date': datetime.date(2006, 1, 1)}
{'id': 11, 'start_date': datetime.date(2002, 1, 1), 'end_date': datetime.date(2006, 1, 1)}
{'id': 13, 'start_date': datetime.date(2001, 1, 15), 'end_date': datetime.date(2003, 1, 15)}
{'id': 12, 'start_date': datetime.date(2001, 2, 1), 'end_date': datetime.date(2003, 1, 1)}
{'id': 9, 'start_date': datetime.date(2002, 1, 15), 'end_date': datetime.date(2002, 1, 15)}
{'id': 14, 'start_date': datetime.date(1998, 1, 1), 'end_date': datetime.date(2001, 1, 1)}
{'id': 15, 'start_date': datetime.date(1997, 1, 15), 'end_date': datetime.date(1997, 1, 15)}
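To order by end_date I introduced a "fake date" to make the data consistent. The choice is arbitrary, but it must not collide with any real value, and it has to be later than every real end_date so that the items without an end date come first in descending order. The built-in sorted needs items (or keys) that can be compared with each other, and None cannot be compared with a datetime.date, so None has to be replaced.
sorted returns a list; reversed returns an iterator.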
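As a small illustration of why the placeholder is needed, the sketch below tries to sort a mixed list of dates and None, then sorts again after substituting the fake date. The list `dates` and the flag `mixed_sort_failed` are made-up names for this example only.

```python
import datetime

# Hypothetical sample: two real dates and one missing value.
dates = [datetime.date(2019, 6, 1), None, datetime.date(2018, 1, 1)]

# Sorting the raw list fails, because None and datetime.date
# do not support "<" with each other.
try:
    sorted(dates)
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# Replacing None with a date later than any real one makes the
# values comparable and puts the missing ones first when descending.
FAKE_DATE = datetime.date(2999, 9, 9)
newest_first = sorted(
    (d if d is not None else FAKE_DATE for d in dates),
    reverse=True,
)
print(mixed_sort_failed, newest_first)
```

Note that the sorted values now contain the placeholder itself; using the substitution only inside a key function, as in the code that follows, keeps the original None in the records.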
# To make point 3 of the question meaningful, I modified the start date of one of the items ending in 2006; see the comment.
blah = [
{"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
{"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
{"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
{"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
{"id": 5, "start_date": datetime.date(2012, 1, 1), "end_date": datetime.date(2015, 1, 1)},
{"id": 6, "start_date": datetime.date(2008, 1, 1), "end_date": datetime.date(2011, 1, 1)},
{"id": 7, "start_date": datetime.date(2006, 1, 1), "end_date": datetime.date(2008, 1, 1)},
{"id": 8, "start_date": datetime.date(2005, 1, 15), "end_date": datetime.date(2010, 1, 15)},
{"id": 9, "start_date": datetime.date(2002, 1, 15), "end_date": datetime.date(2002, 1, 15)},
{"id": 10, "start_date": datetime.date(2002, 1, 2), "end_date": datetime.date(2006, 1, 1)}, # <---- modified start_date!
{"id": 11, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
{"id": 12, "start_date": datetime.date(2001, 2, 1), "end_date": datetime.date(2003, 1, 1)},
{"id": 13, "start_date": datetime.date(2001, 1, 15), "end_date": datetime.date(2003, 1, 15)},
{"id": 14, "start_date": datetime.date(1998, 1, 1), "end_date": datetime.date(2001, 1, 1)},
{"id": 15, "start_date": datetime.date(1997, 1, 15), "end_date": datetime.date(1997, 1, 15)}
]
Here is the code.
import itertools as it
import datetime
FAKE_DATE = datetime.date(2999, 9, 9) # or any date later than every real end_date
# 1: sort by start_date, ascending, then descending in two ways
print(sorted(blah, key=lambda p: p['start_date']))
print(sorted(blah, reverse=True, key=lambda p: p['start_date'])) # reverse, A (ties keep their original order)
print(list(reversed(sorted(blah, key=lambda p: p['start_date'])))) # reverse, B (ties end up in reversed order)
# 2: sort by end_date, descending; None is replaced by FAKE_DATE so those items come first
order_2 = reversed(sorted(blah, key=lambda p: p['end_date'] if p['end_date'] is not None else FAKE_DATE))
print(list(order_2))
# 3: sort by end_date (descending), group equal end dates, then sort each group by start_date (descending)
grp_by_end_dates = it.groupby(sorted(blah, reverse=True, key=lambda p: p['end_date'] if p['end_date'] is not None else FAKE_DATE), key=lambda p: p['end_date'])
order_3 = it.chain(*(sorted(list(i), reverse=True, key=lambda p: p['start_date']) for _, i in grp_by_end_dates))
print(list(order_3))
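The group-then-sort step can also be written as two stable sorting passes: first by the secondary key (start_date), then by the primary key (end_date). This is only a sketch of that alternative, not the code above; the list `records` is a shortened, hypothetical sample and `end_key` is a helper name introduced here.

```python
import datetime

FAKE_DATE = datetime.date(2999, 9, 9)  # assumed to be later than every real end_date

# Hypothetical, shortened sample in the same shape as blah.
records = [
    {"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
    {"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
    {"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
    {"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
    {"id": 10, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
    {"id": 11, "start_date": datetime.date(2002, 1, 2), "end_date": datetime.date(2006, 1, 1)},
]


def end_key(p):
    """Return the end date, or the placeholder when it is missing."""
    return p["end_date"] if p["end_date"] is not None else FAKE_DATE


# Pass 1: secondary key, newest start_date first.
by_start = sorted(records, key=lambda p: p["start_date"], reverse=True)
# Pass 2: primary key, newest end_date first. The sort is stable, so
# items with equal end_date keep the start_date order from pass 1.
ordered = sorted(by_start, key=end_key, reverse=True)

ordered_ids = [p["id"] for p in ordered]
print(ordered_ids)
```

Because both passes use reverse=True rather than reversed(), items with identical keys keep their relative input order.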
You can simply sort the data with a suitable key to satisfy points 1-3; point 4 is satisfied automatically because sorting in Python is guaranteed to be stable:
result = sorted(
    blah,
    reverse=True,
    key=lambda d: (
        d["end_date"] if d["end_date"] is not None else datetime.date(2999, 12, 31),
        d["start_date"],
    ),
)
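If picking a far-future date feels fragile, one possible variation is to put an "is missing" flag in front of the tuple, so the placeholder value never decides the order. This is a sketch under that assumption; `sort_key` and the shortened `records` list are hypothetical names, not part of the answer above.

```python
import datetime

# Hypothetical, shortened sample in the same shape as blah.
records = [
    {"id": 1, "start_date": datetime.date(2021, 5, 1), "end_date": None},
    {"id": 2, "start_date": datetime.date(2013, 2, 1), "end_date": None},
    {"id": 3, "start_date": datetime.date(2017, 1, 1), "end_date": datetime.date(2018, 1, 1)},
    {"id": 4, "start_date": datetime.date(2016, 5, 1), "end_date": datetime.date(2019, 6, 1)},
    {"id": 10, "start_date": datetime.date(2002, 1, 1), "end_date": datetime.date(2006, 1, 1)},
    {"id": 11, "start_date": datetime.date(2002, 1, 2), "end_date": datetime.date(2006, 1, 1)},
]


def sort_key(d):
    """Build (end date missing?, end date, start date) for one record."""
    end = d["end_date"]
    # True sorts after False, so with reverse=True the records without
    # an end date come first. datetime.date.min only fills the slot so
    # the tuple stays comparable; the flag is compared before it.
    return (end is None, end if end is not None else datetime.date.min, d["start_date"])


result = sorted(records, key=sort_key, reverse=True)
result_ids = [d["id"] for d in result]
print(result_ids)
```

The behaviour is the same as with the sentinel date, but no real end_date can ever collide with the placeholder.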