Parsing a complicated date column

I have a date column containing the dates on which public opinion polls took place.

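These polls occasionally run over several days (usually, but not always, consecutive). They sometimes start in one month and end in the next, and the year is sometimes entered as YY and other times as YYYY.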

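Where there is a date range, it is usually separated by an en dash (–) but sometimes by a hyphen (-), and there are sometimes spaces between the dates in the range.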
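One way to tame these separator variants is to normalise them before any parsing. The sketch below is a hypothetical helper (`normalise_separators()` is not from the question). It assumes that the "?" in entries such as "30 Nov ? 3 Dec 2017" is a mis-encoded dash, and that "/" and "," both mark a break between non-consecutive sub-ranges.

```r
library(stringr)

# Hypothetical helper: map the separator variants onto a small canonical set.
# Assumes "?" is a mis-encoded dash and that "/" and "," both separate
# non-consecutive sub-ranges.
normalise_separators <- function(x) {
  x <- str_replace_all(x, "\\s+", " ")                      # newlines / runs of spaces -> one space
  x <- str_replace_all(x, "\\s*[\u2013\u2014?-]\\s*", "-")  # en dash, em dash, "?", hyphen -> "-"
  x <- str_replace_all(x, "\\s*[/,]\\s*", ", ")             # "/" and "," -> ", "
  str_trim(x)
}

normalise_separators(c("30 Nov ? 3 Dec 2017", "16\u201317/23\u201324 Feb 2013"))
```

Single-day entries such as "12-Aug-18" pass through unchanged, so the helper can be applied to the whole vector.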

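I need to clean this up into a consistent date format with start_date and end_date columns. If a poll took place on a single day, end_date can be NA or filled with the start date (if you have a solution for either, I can always work from there to do the opposite if needed). If there are non-consecutive date ranges, the earliest and latest dates should be kept, and the dates in the middle where polling stopped and restarted can be discarded.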
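A minimal sketch of this "earliest and latest date" rule, assuming stringr, English three-letter month abbreviations, and that two-digit years mean 20YY. The function name `parse_poll_range()` is hypothetical. It takes the first number and first month name as the start, the last number directly before the last month name as the end, and the trailing number as the year, so a single-day poll gets an end_date equal to its start_date. It also assumes that a start month later than the end month means the poll began in the previous year.

```r
library(stringr)

# Sketch of the "earliest and latest date" rule. Assumptions: English
# three-letter month abbreviations, two-digit years mean 20YY, and a start
# month later than the end month means the poll began the previous year.
parse_poll_range <- function(dates) {
  x <- str_squish(dates)                                # drop embedded newlines / extra spaces
  year <- as.integer(str_extract(x, "\\d+$"))           # trailing YY or YYYY
  year <- ifelse(year < 100, year + 2000L, year)

  start_month <- match(str_extract(x, "[A-Z][a-z]{2}"), month.abb)
  end_month   <- match(str_match(x, "([A-Z][a-z]{2})[^A-Za-z]*$")[, 2], month.abb)

  start_day <- as.integer(str_extract(x, "\\d{1,2}"))   # first number in the string
  # last number directly before the final month name
  end_day <- as.integer(str_match(x, "(\\d{1,2})\\D*[A-Z][a-z]{2}[^A-Za-z]*$")[, 2])

  start_year <- ifelse(start_month > end_month, year - 1L, year)

  # ISOdate() returns NA for impossible dates instead of raising an error
  data.frame(
    dates      = dates,
    start_date = as.Date(ISOdate(start_year, start_month, start_day)),
    end_date   = as.Date(ISOdate(year, end_month, end_day))
  )
}

parse_poll_range(c("29 Nov \u2013 2 Dec 2018", "12-Aug-18", "30 Jun\u20131/7\u20138 Jul 2012"))
```

Because it only looks at the first and last day/month tokens, the same rule covers the "/"- and ","-separated non-consecutive ranges without special-casing them.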

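Because the formatting is so inconsistent, I've provided the full data: any solution needs to work on all the dates in the set (or work on some of them without breaking the others, so that the problem can be solved iteratively).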

dates <- c("12-15 Feb 2019", "6–11 Feb 2019", "7–10 Feb 2019", "23–30 Jan 2019", 
"24–27 Jan 2019", "9–13 Jan 2019", "13-16 Dec 2018", "13–15 Dec 2018", 
"6–9 Dec 2018", "29 Nov – 2 Dec 2018", "23–25 Nov 2018", "15-18 Nov 2018", 
"15–17 Nov 2018", "8–11 Nov 2018", "1–4 Nov 2018", "25–28 Oct 2018", 
"19–21 Oct 2018", "10–13 Oct 2018", "10–13 Oct 2018", "5–7 Oct 2018", 
"22–24 Sep 2018", "20–23 Sep 2018", "12–15 Sep 2018", "8–10 Sep 2018", 
"6–9 Sep 2018", "25–26 Aug 2018", "24–26 Aug 2018", "24–25 Aug 2018", 
"15-18 Aug 2018", "12-Aug-18", "06-Aug-18", "29-Jul-18", "17-Jul-18", 
"16-Jul-18", "03-Jul-18", "02-Jul-18", "21–24 Jun 2018", "14–17 Jun 2018", 
"14–17 Jun 2018", "02-Jun-18", "31 May – 3 Jun 2018", "24–27 May 2018", 
"17–20 May 2018", "10–13 May 2018", "10–13 May 2018", "10–12 May 2018", 
"3–6 May 2018", "30-Apr-18", "19–22 Apr 2018", "22-Apr-18", "5–8 Apr 2018", 
"5–8 Apr 2018", "3–5 Apr 2018", "24 Mar – 1 Apr 2018", "28-Mar-18", 
"22–25 Mar 2018", "22–25 Mar 2018", "17–25 Mar 2018", "8–11 Mar 2018", 
"3–11 Mar 2018", "1–4 Mar 2018", "22–25 Feb 2018", "24-Feb-18", 
"15–18 Feb 2018", "8–11 Feb 2018", "1–3 Feb 2018", "26–28 Jan 2018", 
"25-Jan-18", "11–15 Jan 2018", "19-Dec-17", "14–17 Dec 2017", 
"12-Dec-17", "7–10 Dec 2017", "05-Dec-17", "30 Nov ? 3 Dec 2017", 
"29-Nov-17", "28-Nov-17", "23–27 Nov 2017", "21-Nov-17", "14-Nov-17", 
"14-Nov-17", "13-Nov-17", "30-Oct-17", "26–29 Oct 2017", "24-Oct-17", 
"12–15 Oct 2017", "04-Oct-17", "01-Oct-17", "26-Sep-17", "21–24 Sep 2017", 
"19-Sep-17", "14–18 Sep 2017", "12-Sep-17", "6–9 Sep 2017", "05-Sep-17", 
"31 Aug – 4 Sep 2017", "28 Aug – 2 Sep 2017", "29-Aug-17", "23-Aug-17", 
"22-Aug-17", "17–21 Aug 2017", "17–20 Aug 2017", "15-Aug-17", 
"08-Aug-17", "3–6 Aug 2017", "01-Aug-17", "25-Jul-17", "20–24 Jul 2017", 
"20–23 Jul 2017", "19-Jul-17", "18-Jul-17", "6–11 Jul 2017", 
"6–9 Jul 2017", "29-Jun-17", "22–27 Jun 2017", "15–18 Jun 2017", 
"14-Jun-17", "26–29 May 2017", "23-May-17", "12–15 May 2017", 
"11-May-17", "10–11 May 2017", "26–30 Apr 2017", "20–23 Apr 2017", 
"13–16 Apr 2017", "6–9 Apr 2017", "1–4 Apr 2017", "30 Mar – 2 Apr 2017", 
"24–27 Mar 2017", "22–25 Mar 2017", "17–20 Mar 2017", "16–19 Mar 2017", 
"10–13 Mar 2017", "3–6 Mar 2017", "23–26 Feb 2017", "16–19 Feb 2017", 
"9–12 Feb 2017", "2–5 Feb 2017", "20–23 Jan 2017", "13–16 Jan 2017", 
"12-Jan-17", "9–12 Dec 2016", "1–4 Dec 2016", "25–28 Nov 2016", 
"24–26 Nov 2016", "17–20 Nov 2016", "11–14 Nov 2016", "3–6 Nov 2016", 
"20–23 Oct 2016", "14–17 Oct 2016", "7–10 Oct 2016", "6–9 Oct 2016", 
"22–25 Sep 2016", "9–12 Sep 2016", "8–11 Sep 2016", "26–29 Aug 2016", 
"25–28 Aug 2016", "19–22 Aug 2016", "12–15 Aug 2016", "5–8 Aug 2016", 
"27 Jul – 1 Aug 2016", "20–24 Jul 2016", "13–17 Jul 2016", "6–10 Jul 2016", 
"30 Jun – 3 Jul 2016", "28 Jun – 1 Jul 2016", "30-Jun-16", "27–30 Jun 2016", 
"28–29 Jun 2016", "26–29 Jun 2016", "28 Jun – 1 Jul 2016", "30-Jun-16", 
"27–30 Jun 2016", "28–29 Jun 2016", "26–29 Jun 2016", "23–26 Jun 2016", 
"23–26 Jun 2016", "23-Jun-16", "20–22 Jun 2016", "16–19 Jun 2016", 
"16–19 Jun 2016", "16-Jun-16", "14–16 Jun 2016", "9–12 Jun 2016", 
"09-Jun-16", "2–5 Jun 2016", "2–5 Jun 2016", "02-Jun-16", "31 May – 2 Jun 2016", 
"26–29 May 2016", "21–22,\n                      28–29 May 2016", 
"26-May-16", "19–22 May 2016", "19–22 May 2016", "19-May-16", 
"17–19 May 2016", "14–15 May 2016", "12–15 May 2016", "6–8 May 2016", 
"5–8 May 2016", "5–8 May 2016", "5–7 May 2016", "4–6 May 2016", 
"05-May-16", "27 Apr – 1 May 2016", "23–24, 30 Apr – 1 May 2016", 
"20–24 Apr 2016", "14–17 Apr 2016", "13–17 Apr 2016", "9–10,\n                      16–17 Apr 2016", 
"14–16 Apr 2016", "14-Apr-16", "6–10 Apr 2016", "31 Mar – 3 Apr 2016", 
"26–27 Mar, 2–3 Apr 2016", "21-Mar-16", "17–20 Mar 2016", "16–20 Mar 2016", 
"12–13,\n                      19–20 Mar 2016", "10–12 Mar 2016", 
"3–6 Mar 2016", "2–6 Mar 2016", "27–28 Feb, 5–6 Mar 2016", "24–28 Feb 2016", 
"18–21 Feb 2016", "17–21 Feb 2016", "13–14, 20–21 Feb 2016", 
"11–13 Feb 2016", "11-Feb-16", "3–7 Feb 2016", "30–31 Jan,\n                      6–7 Feb 2016", 
"28–31 Jan 2016", "16–17, 23–24 Jan 2016", "21-Jan-16", "15–18 Jan 2016", 
"2–3, 9–10 Jan 2016", "15-Dec-15", "5–6, 12–13 Dec 2015", "08-Dec-15", 
"4–6 Dec 2015", "01-Dec-15", "21–22, 28–29 Nov 2015", "26-Nov-15", 
"24-Nov-15", "19–22 Nov 2015", "7–8, 14–15 Nov 2015", "12–14 Nov 2015", 
"10-Nov-15", "6–8 Nov 2015", "03-Nov-15", "24–25 Oct,\n                      1 Nov 2015", 
"27-Oct-15", "23–25 Oct 2015", "22-Oct-15", "20-Oct-15", "10–11, 17–18 Oct 2015", 
"15–17 Oct 2015", "13-Oct-15", "9–11 Oct 2015", "26–27 Sep, 1–5 Oct 2015", 
"1–4 Oct 2015", "24–28 Sep 2015", "17–21 Sep 2015", "19–20 Sep 2015", 
"17–20 Sep 2015", "15–16 Sep 2015", "15-Sep-15", "12–13 Sep 2015", 
"5–6 Sep 2015", "4–6 Sep 2015", "26–30 Aug 2015", "27-Aug-15", 
"22–23 Aug 2015", "20–23 Aug 2015", "13–15 Aug 2015", "11–14 Aug 2015", 
"8–9 Aug 2015", "8–9 Aug 2015", "4–7 Aug 2015", "06-Aug-15", 
"28–31 Jul 2015", "30-Jul-15", "25–26 Jul 2015", "16–19 Jul 2015", 
"14–17 Jul 2015", "11–12 Jul 2015", "4–5 Jul 2015", "2–4 Jul 2015", 
"27–28 Jun 2015", "16-Jun-15", "16-Jun-15", "13–14 Jun 2015", 
"11–13 Jun 2015", "11–13 Jun 2015", "02-Jun-15", "02-Jun-15", 
"23–24, 30–31 May 2015", "26-May-15", "18-May-15", "17-May-15", 
"17-May-15", "13-May-15", "7–10 May 2015", "04-May-15", "04-May-15", 
"28-Apr-15", "21-Apr-15", "11–12,\n                      18–19 Apr 2015", 
"14-Apr-15", "10–12 Apr 2015", "9–11 Apr 2015", "28–29 Mar, 3–6 Apr 2015", 
"29-Mar-15", "20–22 Mar 2015", "14–15, 21–22 Mar 2015", "17-Mar-15", 
"10-Mar-15", "7–8 Mar 2015", "28 Feb–1, 7–8 Mar 2015", "26–28 Feb 2015", 
"20–22 Feb 2015", "20–22 Feb 2015", "6–8 Feb 2015", "31 Jan–1, 7–8 Feb 2015", 
"05-Feb-15", "4–5 Feb 2015", "28–30 Jan 2015", "27-Jan-15", "r27 Jan 2015", 
"20-Jan-15", "13-Jan-15", "12-Jan-15", "23–27 Dec 2014", "16-Dec-14", 
"12–15 Dec 2014", "6–7, 13–14 Dec 2014", "4–6 Dec 2014", "2–4 Dec 2014", 
"02-Dec-14", "29–30 Nov 2014", "22–23, 29–30 Nov 2014", "25-Nov-14", 
"21-Nov-14", "18-Nov-14", "17-Nov-14", "17-Nov-14", "11-Nov-14", 
"04-Nov-14", "04-Nov-14", "25–26 Oct,\n                      1–2 Nov 2014", 
"30 Oct–1 Nov 2014", "28-Oct-14", "23-Oct-14", "21-Oct-14", "21-Oct-14", 
"20-Oct-14", "14-Oct-14", "07-Oct-14", "4–5 Oct 2014", "4–5 Oct 2014", 
"23-Sep-14", "13–14,\n                      20–21 Sep 2014", 
"18-Sep-14", "30–31 Aug, 6–7 Sep 2014", "5–7 Sep 2014", "22–24 Aug 2014", 
"16–17, 23–24 Aug 2014", "19-Aug-14", "9–10 Aug 2014", "8–10 Aug 2014", 
"25–27 Jul 2014", "11–13 Jul 2014", "01-Jul-14", "30-Jun-14", 
"27–29 Jun 2014", "13–15 Jun 2014", "30 May–1 Jun 2014", "27-May-14", 
"20-May-14", "17–18 May 2014", "16–18 May 2014", "15–17 May 2014", 
"04-May-14", "2–4 May 2014", "30-Apr-14", "22-Apr-14", "15-Apr-14", 
"13-Apr-14", "08-Apr-14", "07-Apr-14", "4–6 Apr 2014", "25-Mar-14", 
"25-Mar-14", "21–23 Mar 2014", "18-Mar-14", "13–15 Mar 2014", 
"7–9 Mar 2014", "05-Mar-14", "23-Feb-14", "21–23 Feb 2014", "15-Feb-14", 
"7–9 Feb 2014", "28-Jan-14", "23-Jan-14", "17–20 Jan 2014", "13-Jan-14", 
"16-Dec-13", "15-Dec-13", "6–8 Dec 2013", "28 Nov–2 Dec 2013", 
"30 Nov–1 Dec 2013", "22–24 Nov 2013", "21–23 Nov 2013", "8–10 Nov 2013", 
"25–27 Oct 2013", "19–20 Oct 2013", "21–22 Sep 2013", "19–22 Sep 2013", 
"12–15 Sep 2013", "4–6 Sep 2013", "05-Sep-13", "3–5 Sep 2013", 
"4–6 Sep 2013", "05-Sep-13", "4–5 Sep 2013", "3–5 Sep 2013", 
"04-Sep-13", "2–4 Sep 2013", "1–4 Sep 2013", "03-Sep-13", "30 Aug–1 Sep 2013", 
"30 Aug–1 Sep 2013", "29 Aug–1 Sep 2013", "28–29 Aug 2013", "28–29 Aug 2013", 
"26-Aug-13", "21–25 Aug 2013", "23–25 Aug 2013", "23–25 Aug 2013", 
"18–22 Aug 2013", "16–18 Aug 2013", "16–18 Aug 2013", "16–18 Aug 2013", 
"14–18 Aug 2013", "14–15 Aug 2013", "12–13 Aug 2013", "9–12 Aug 2013", 
"9–11 Aug 2013", "9–11 Aug 2013", "10-Aug-13", "7–9 Aug 2013", 
"6–8 Aug 2013", "04-Aug-13", "2–4 Aug 2013", "2–4 Aug 2013", 
"1–4 Aug 2013", "26–28 Jul 2013", "25–28 Jul 2013", "23–25 Jul 2013", 
"18–22 Jul 2013", "19–21 Jul 2013", "19–21 Jul 2013", "18-Jul-13", 
"12–14 Jul 2013", "11–14 Jul 2013", "11–13 Jul 2013", "5–8 Jul 2013", 
"5–7 Jul 2013", "5–7 Jul 2013", "4–7 Jul 2013", "28–30 Jun 2013", 
"28–30 Jun 2013", "27–30 Jun 2013", "27–28 Jun 2013", "27-Jun-13", 
"21–23 Jun 2013", "21–23 Jun 2013", "20–23 Jun 2013", "14–16 Jun 2013", 
"13–16 Jun 2013", "13–15 Jun 2013", "11–13 Jun 2013", "7–10 Jun 2013", 
"6–10 Jun 2013", "31 May–2 Jun 2013", "31 May–2 Jun 2013", "30 May–2 Jun 2013", 
"24–26 May 2013", "23–26 May 2013", "17–19 May 2013", "17–19 May 2013", 
"16–19 May 2013", "16–18 May 2013", "15–16 May 2013", "10–12 May 2013", 
"9–12 May 2013", "3–5 May 2013", "3–5 May 2013", "2–5 May 2013", 
"02-May-13", "26–28 Apr 2013", "25–28 Apr 2013", "18–22 Apr 2013", 
"18–22 Apr 2013", "19–21 Apr 2013", "11–14 Apr 2013", "11–14 Apr 2013", 
"11–13 Apr 2013", "9–11 Apr 2013", "02-May-13", "5–7 Apr 2013", 
"4–7 Apr 2013", "4–7 Apr 2013", "29 Mar–1 Apr 2013", "28 Mar–1 Apr 2013", 
"22–24 Mar 2013", "21–24 Mar 2013", "22–23 Mar 2013", "21–24 Mar 2013", 
"22–25 Mar 2013", "14–17 Mar 2013", "14–17 Mar 2013", "14–16 Mar 2013", 
"7–10 Mar 2013", "7–10 Mar 2013", "8–10 Mar 2013", "5–7 Mar 2013", 
"28 Feb–3 Mar 2013", "28 Feb–3 Mar 2013", "21–24 Feb 2013", "16–17/23–24 Feb 2013", 
"22–24 Feb 2013", "14–17 Feb 2013", "14–16 Feb 2013", "7–10 Feb 2013", 
"9–10 Feb 2013", "1–4 Feb 2013", "2–3 Feb 2013", "1–3 Feb 2013", 
"1–3 Feb 2013", "23–28 Jan 2013", "19–20/26–27 Jan 2013", "16–20 Jan 2013", 
"9–13 Jan 2013", "11–13 Jan 2013", "5–6/12–13 Jan 2013", "12–16 Dec 2012", 
"8–9/15–16 Dec 2012", "13–15 Dec 2012", "5–9 Dec 2012", "7–9 Dec 2012", 
"28 Nov–2 Dec 2012", "24–25 Nov/1–2 Dec 2012", "29–30 Nov 2012", 
"27–29 Nov 2012", "23–25 Nov 2012", "21–25 Nov 2012", "14–18 Nov 2012", 
"10–11/17–18 Nov 2012", "15–17 Nov 2012", "9–11 Nov 2012", "7–11 Nov 2012", 
"2–6 Nov 2012", "2–4 Nov 2012", "27–28 Oct/3–4 Nov 2012", "26–28 Oct 2012", 
"25–28 Oct 2012", "13–14/20–21 Oct 2012", "17–21 Oct 2012", "18–20 Oct 2012", 
"10–14 Oct 2012", "5–7 Oct 2012", "3–7 Oct 2012", "29–30 Sep/6–7 Oct 2012", 
"26–30 Sep 2012", "22–23 Sep 2012", "19–23 Sep 2012", "17–20 Sep 2012", 
"14–16 Sep 2012", "12–16 Sep 2012", "8–9/15–16 Sep 2012", "13–15 Sep 2012", 
"29 Aug–2 Sep 2012", "31 Aug–2 Sep 2012", "1–2 Sep 2012", "22–26 Aug 2012", 
"23–25 Aug 2012", "15–19 Aug 2012", "17–19 Aug 2012", "11–12/18–19 Aug 2012", 
"8–12 Aug 2012", "3–5 Aug 2012", "1–5 Aug 2012", "28–29 Jul/4–5 Aug 2012", 
"25–29 Jul 2012", "26–28 Jul 2012", "20–22 Jul 2012", "18–22 Jul 2012", 
"14–15/21–22 Jul 2012", "11–15 Jul 2012", "6–8 Jul 2012", "4–8 Jul 2012", 
"30 Jun–1/7–8 Jul 2012", "27 Jun–1 Jul 2012", "22–24 Jun 2012", 
"20–24 Jun 2012", "16–17/23–24 Jun 2012", "13–17 Jun 2012", "15–17 Jun 2012", 
"6–11 Jun 2012", "9–10 Jun 2012", "7–10 Jun 2012", "2–3 Jun 2012", 
"31 May–2 Jun 2012", "30 May–3 Jun 2012", "26–27 May 2012", "23–27 May 2012", 
"25–27 May 2012", "16–20 May 2012", "19–20 May 2012", "12–13 May 2012", 
"11–13 May 2012", "9–13 May 2012", "9–10 May 2012", "9–10 May 2012", 
"5–6 May 2012", "2–6 May 2012", "27–29 Apr 2012", "27–29 Apr 2012", 
"25–29 Apr 2012", "21–22 Apr 2012", "18–22 Apr 2012", "17–19 Apr 2012", 
"13–15 Apr 2012", "11–15 Apr 2012", "7–8/14–15 Apr 2012", "4–9 Apr 2012", 
"31 Mar–1 Apr 2012", "28 Mar–1 Apr 2012", "29–31 Mar 2012", "21–25 Mar 2012", 
"24–25 Mar 2012", "23–25 Mar 2012", "14–18 Mar 2012", "10–11/17–18 Mar 2012", 
"9–11 Mar 2012", "7–11 Mar 2012", "3–4 Mar 2012", "29 Feb–4 Mar 2012", 
"25–26 Feb 2012", "23–26 Feb 2012", "22–26 Feb 2012", "23–24 Feb 2012", 
"22–23 Feb 2012", "15–19 Feb 2012", "11–12/18–19 Feb 2012", "10–12 Feb 2012", 
"8–10 Feb 2012", "7–8 Feb 2012", "4–5 Feb 2012", "1–5 Feb 2012", 
"2–4 Feb 2012", "28–29 Jan 2012", "27–29 Jan 2012", "25–29 Jan 2012", 
"27–28 Jan 2012", "18–22 Jan 2012", "14–15/21–22 Jan 2012", "17–18 Jan 2012", 
"11–15 Jan 2012", "7–8 Jan 2012", "14–18 Dec 2011", "10–11/17–18 Dec 2011", 
"7–11 Dec 2011", "8–10 Dec 2011", "2–4 Dec 2011", "30 Nov–4 Dec 2011", 
"26–27 Nov/3–4 Dec 2011", "23–27 Nov 2011", "19–20 Nov 2011", 
"18–20 Nov 2011", "16–20 Nov 2011", "9–13 Nov 2011", "5–6/12–13 Nov 2011", 
"10–12 Nov 2011", "3–6 Nov 2011", "2–6 Nov 2011", "2–3 Nov 2011", 
"26–30 Oct 2011", "29–30 Oct 2011", "25–26 Oct 2011", "22–23 Oct 2011", 
"21–23 Oct 2011", "19–23 Oct 2011", "15–16Oct 2011", "14–16 Oct 2011", 
"12–16 Oct 2011", "13–15 Oct 2011", "8–9 Oct 2011", "7–9 Oct 2011", 
"4–9 Oct 2011", "27 Sep–2 Oct 2011", "24–25 Sep/1–2 Oct 2011", 
"20–25 Sep 2011", "16–18 Sep 2011", "13–18 Sep 2011", "10–11/17–18 Sep 2011", 
"7–11 Sep 2011", "8–10 Sep 2011", "2–4 Sep 2011", "31 Aug–4 Sep 2011", 
"27–28 Aug/3–4 Sep 2011", "24–28 Aug 2011", "19–21 Aug 2011", 
"17–21 Aug 2011", "13–14/20–21 Aug 2011", "10–14 Aug 2011", "11–13 Aug 2011", 
"9–10 Aug 2011", "5–7 Aug 2011", "3–7 Aug 2011", "30–31 Jul/6–7 Aug 2011", 
"c. 3 Aug 2011", "27–31 Jul 2011", "22–24 Jul 2011", "20–24 Jul 2011", 
"16–17/23–24 Jul 2011", "13–17 Jul 2011", "14–16 Jul 2011", "13–14 Jul 2011", 
"9–10 Jul 2011", "8–10 Jul 2011", "6–10 Jul 2011", "29 Jun–3 Jul 2011", 
"25–26 Jun/1–2 Jul 2011", "24–26 Jun 2011", "22–26 Jun 2011", 
"11–12/18–19 Jun 2011", "15–19 Jun 2011", "14–16 Jun 2011", "8–13 Jun 2011", 
"10–12 Jun 2011", "4–5 Jun 2011", "1–5 Jun 2011", "31 May–2 Jun 2011", 
"25–29 May 2011", "27–29 May 2011", "21–22/28–29 May 2011", "18–22 May 2011", 
"14–15 May 2011", "13–15 May 2011", "11–15 May 2011", "12–14 May 2011", 
"7–8 May 2011", "4–8 May 2011", "3–4 May 2011", "29 Apr–1 May 2011", 
"28 Apr–1 May 2011", "23–24/30 Apr–1 May 2011", "20–26 Apr 2011", 
"13–17 Apr 2011", "9–10/16–17 Apr 2011", "14–16 Apr 2011", "6–10 Apr 2011", 
"2–3 Apr 2011", "1–3 Apr 2011", "30 Mar–3 Apr 2011", "26–27 Mar 2011", 
"23–27 Mar 2011", "22–24 Mar 2011", "19–20 Mar 2011", "18–20 Mar 2011", 
"16–20 Mar 2011", "16–17 Mar 2011", "12–13 Mar 2011", "9–13 Mar 2011", 
"10–12 Mar 2011", "8–10 Mar 2011", "5–6 Mar 2011", "4–6 Mar 2011", 
"2–6 Mar 2011", "26–27 Feb 2011", "22–27 Feb 2011", "21–23 Feb 2011", 
"18–20 Feb 2011", "15–20 Feb 2011", "12–13/19–20 Feb 2011", "8–13 Feb 2011", 
"10–12 Feb 2011", "4–6 Feb 2011", "1–6 Feb 2011", "29–30 Jan/5–6 Feb 2011", 
"1–3 Feb 2011", "25–30 Jan 2011", "18–23 Jan 2011", "15–16/22–23 Jan 2011", 
"11–16 Jan 2011", "8–9 Jan 2011", "14–19 Dec 2010", "11–12 Dec 2010", 
"8–12 Dec 2010", "7–12 Dec 2010", "4–5 Dec 2010", "3–5 Dec 2010", 
"30 Nov–5 Dec 2010", "23–28 Nov 2010", "20–21/27–28 Nov 2010", 
"19–21 Nov 2010", "16–21 Nov 2010", "18–20 Nov 2010", "9–14 Nov 2010", 
"6–7/13–14 Nov 2010", "5–7 Nov 2010", "2–7 Nov 2010", "26–31 Oct 2010", 
"23–24/30–31 Oct 2010", "22–24 Oct 2010", "19–24 Oct 2010", "21–23 Oct 2010", 
"12–17 Oct 2010", "9–10/16–17 Oct 2010", "8–10 Oct 2010", "5–10 Oct 2010", 
"2–3 Oct 2010", "30 Sep–1 Oct 2010", "21–26 Sep 2010", "18–19 Sep 2010", 
"14–19 Sep 2010", "15–16 Sep 2010", "10–12 Sep 2010", "7–12 Sep 2010", 
"31 Aug–5 Sep 2010", "28–29 Aug/4–5 Sep 2010", "24–29 Aug 2010", 
"25–26 Aug 2010", "c. 21 Aug 2010", "17–19 Aug 2010", "13–19 Aug 2010"
)

Some particularly odd dates in the dataset to watch out for: "30 Nov ? 3 Dec 2017",
"21–22,\n 28–29 May 2016", "24–25 Oct,\n 1 Nov 2015" and "29–30 Sep/6–7 Oct 2012".
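To find these and any similar entries programmatically, one could flag strings that contain "/", ",", "?" or an embedded newline, or that do not start with a digit (which also catches "c. 3 Aug 2011" and "r27 Jan 2015"). The helper name `is_odd_date()` is hypothetical.

```r
library(stringr)

# Hypothetical helper: TRUE for entries that need special handling, i.e.
# ones containing "/", ",", "?" or a newline, or not starting with a digit.
is_odd_date <- function(x) str_detect(x, "[/,?\\n]|^\\D")

examples <- c("30 Nov ? 3 Dec 2017", "21\u201322,\n 28\u201329 May 2016",
              "24\u201325 Oct,\n 1 Nov 2015", "29\u201330 Sep/6\u20137 Oct 2012",
              "c. 3 Aug 2011", "12-15 Feb 2019", "12-Aug-18")
is_odd_date(examples)
```

On the question's data, `dates[is_odd_date(dates)]` would list the entries worth inspecting by hand.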

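I ran into the same problem while working with documents from all over the world. The best answer is to enforce the date format upstream, when the input form is created.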
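If you control the input, one hedged way to enforce a format upstream is to reject anything that is not a valid YYYY-MM-DD (ISO 8601 calendar) date at entry time. ISO 8601 is an assumed choice of format here, and `is_iso_date()` is a hypothetical name.

```r
# Hypothetical entry-time check: accept only YYYY-MM-DD strings that are real dates.
is_iso_date <- function(x) {
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) & !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_iso_date(c("2019-02-12", "2019-13-01", "12-Aug-18", "12-15 Feb 2019"))
```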

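This answer won't solve your exact date-range problem, but you can adapt the solution I propose here to handle it. I used so-called regex patterns. Here are the patterns I used in a similar solution in Python: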
# 2019/02/20 or 2019-02-20
(?:^|[\s\/\.:])+(\d{4})[\/\-\.\s](\d{2})[\/\-\.\s](\d{2})(?:$|[\s\/\.\-])+
# 02/20/2019 or 20/02/2019
(?:^|[\/\s\.:])+(\d{2})[\/\-\.\s](\d{2})[\/\-\.\s](\d{4})(?:$|[\/\s\.\-])+
# 20 Feb 2019 or 20-Feb-2019
(?:^|[\s\.:])+(\d{2})[\/\-\.\s]?([a-zA-Z]{2,3})[\/\-\.\s]?(\d{4})(?:$|[\s\.\-])+
# 2019 Feb 20
(?:^|[\s\.:])+(\d{4})[\/\-\.\s]?([a-zA-Z]{2,3})[\/\-\.\s]?(\d{2})(?:$|[\s\.\-])+
# February 20th, 2019
(?:^|[\s\.:])+([a-zA-Z]{3,15})\s(\d{1,2})\s?[a-zA-Z]{2},\s?(\d{2,4})(?:$|[\s\.\-])+
# Feb 20 2019 or February 20 2019
(?:^|[\s\.:])*([a-zA-Z]{3,15})[ _\-\/\\.]?(\d{1,2})[ _\-\/\\.](\d{2,4})(?:$|[\s\.\-])+
#20-FEB-2019
(?:^|[\s\.\-:])+(\d{1,2})[ _\-\/\\.]([a-zA-Z]{3,15})[ _\-\/\\.](\d{2,4})(?:$|[\s\.\-])+
#2019.Feb.20
(?:^|[\s\.\-:])+(\d{4})[ _\-\/\\.]([a-zA-Z]{3,15})[ _\-\/\\.](\d{1,2})(?:$|[\s\.\-])+
# 20 Feb. 2019
(?:^|[\s\.\-:])+(\d{2})[\/\-\.\s]([a-zA-Z]{3,15})[\/\-\.\s]{1,2}(\d{4})(?:$|[\s\.\-])+

Each of these regex patterns contains three groups, which you can extract and check manually in case you are parsing ambiguous dates (e.g. 02/01/2019).
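In R, the same three-group idea works with stringr's `str_match()`, which returns the full match in column 1 and the capture groups in the following columns. The pattern below is a simplified, anchored adaptation of the "20-FEB-2019" pattern (without the surrounding delimiter groups), not one of the Python patterns verbatim.

```r
library(stringr)

# Simplified, anchored adaptation of the "20-FEB-2019" pattern:
# group 1 = day, group 2 = month name, group 3 = year.
pattern <- "^(\\d{1,2})[ _./-]([A-Za-z]{3,15})[ _./-](\\d{2,4})$"

x <- c("12-Aug-18", "06-Aug-18", "12-15 Feb 2019")
groups <- str_match(x, pattern)
groups[, 2:4]   # one row per input; the range "12-15 Feb 2019" does not match, so its row is NA
```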

Before implementing a solution, I suggest following these steps:

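  • List all the dates to parse, one per line
  • Paste them all into a regex tester (e.g. an online regex tester)
  • Try (and, if necessary, modify) each of the patterns above
  • Confirm that you correctly capture groups 1 to 3
  • Check off the correctly captured dates until none are left in the list and no conflicts occur
  • Implement a date-parsing function in R
  • Quietly wish for an international date standard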
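The "check off until nothing is left and nothing conflicts" steps can also be automated in R. A rough sketch, where `check_patterns()` and the three example patterns are hypothetical and cover only a few of the formats in the question:

```r
library(stringr)

# Hypothetical helper: test every pattern against every date, then report
# dates no pattern captures yet and dates captured by more than one pattern.
check_patterns <- function(x, patterns) {
  hits <- vapply(patterns, function(p) str_detect(x, p), logical(length(x)))
  hits <- matrix(hits, nrow = length(x), dimnames = list(NULL, names(patterns)))
  list(
    hits      = hits,
    unmatched = x[rowSums(hits) == 0],   # still to be handled
    conflicts = x[rowSums(hits) > 1]     # ambiguous: more than one pattern matches
  )
}

# Three illustrative patterns covering only a few of the formats in the question
patterns <- c(
  dd_mon_yy   = "^\\d{1,2}-[A-Z][a-z]{2}-\\d{2}$",
  day_range   = "^\\d{1,2}[\u2013-]\\d{1,2} [A-Z][a-z]{2} \\d{4}$",
  cross_month = "^\\d{1,2} [A-Z][a-z]{2} ?[\u2013-] ?\\d{1,2} [A-Z][a-z]{2} \\d{4}$"
)

res <- check_patterns(c("12-Aug-18", "6\u201311 Feb 2019", "29 Nov \u2013 2 Dec 2018",
                        "c. 3 Aug 2011"), patterns)
res$unmatched
res$conflicts
```

Each pass, one would add or tighten a pattern until `unmatched` and `conflicts` are both empty.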

This is an interesting problem! I think it can be solved with regex, though.

How about this:

library(tidyverse)
tibble(dates = dates) %>%
  mutate(end_year = str_extract(dates, "[0-9]*$"),
         end_year = ifelse(str_length(end_year) == 2, paste0("20", end_year), end_year),
         month_one = str_extract(dates, "[A-Z][a-z][a-z]"),
         month_two = str_sub(str_extract(dates, "[A-Z][a-z][a-z].*[A-Z][a-z][a-z]"), start = -3),
         month_two = if_else(is.na(month_two), month_one, month_two),
         day_one = str_extract(dates, "[0-9]+"),
         # drop the trailing year as well as day_one, so that "12-Aug-18"
         # does not pick up the year "18" as its second day
         dates_without_day_one = str_remove(str_remove(dates, "[0-9]+$"), "^[0-9]+"),
         day_two = str_extract(dates_without_day_one, "[0-9]+"),
         # non-consecutive ranges written with "/", e.g. "16–17/23–24 Feb 2013";
         # comma-separated ones such as "21–22, 28–29 May 2016" are not detected
         day_three_four = str_extract(dates, "/.+[-–] *[0-9]+"),
         day_three = str_extract(day_three_four, "/ *[0-9]+"),
         day_three = str_squish(gsub("/", "", day_three)),
         day_four = str_extract(day_three_four, "[-–] *[0-9]+"),
         day_four = str_squish(gsub("[-–]", "", day_four))
  ) %>%
  # dates that are only a single day:
  mutate(day_two = if_else(is.na(day_two), day_one, day_two)) %>%
  # non-consecutive ranges: keep the earliest day (day_one) and the latest (day_four)
  mutate(day_two = if_else(is.na(day_four), day_two, day_four)) %>%
  select(-day_three_four, -dates_without_day_one) %>%
  # "%b" expects English month abbreviations, i.e. an English locale
  mutate(start_date = as.Date(paste(end_year, month_one, day_one, sep = "-"), format = "%Y-%b-%d"),
         end_date = as.Date(paste(end_year, month_two, day_two, sep = "-"), format = "%Y-%b-%d")) %>%
  select(dates, start_date, end_date, everything())

which gives:

# A tibble: 838 x 10
   dates               start_date end_date   end_year month_one month_two day_one day_two day_three day_four
   <chr>               <date>     <date>     <chr>    <chr>     <chr>     <chr>   <chr>   <chr>     <chr>   
 1 12-15 Feb 2019      2019-02-12 2019-02-15 2019     Feb       Feb       12      15      NA        NA      
 2 6–11 Feb 2019       2019-02-06 2019-02-11 2019     Feb       Feb       6       11      NA        NA      
 3 7–10 Feb 2019       2019-02-07 2019-02-10 2019     Feb       Feb       7       10      NA        NA      
 4 23–30 Jan 2019      2019-01-23 2019-01-30 2019     Jan       Jan       23      30      NA        NA      
 5 24–27 Jan 2019      2019-01-24 2019-01-27 2019     Jan       Jan       24      27      NA        NA      
 6 9–13 Jan 2019       2019-01-09 2019-01-13 2019     Jan       Jan       9       13      NA        NA      
 7 13-16 Dec 2018      2018-12-13 2018-12-16 2018     Dec       Dec       13      16      NA        NA      
 8 13–15 Dec 2018      2018-12-13 2018-12-15 2018     Dec       Dec       13      15      NA        NA      
 9 6–9 Dec 2018        2018-12-06 2018-12-09 2018     Dec       Dec       6       9       NA        NA      
10 29 Nov – 2 Dec 2018 2018-11-29 2018-12-02 2018     Nov       Dec       29      2       NA        NA
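Since only the first ten rows are printed, a quick way to look for rows that did not parse cleanly is to filter on missing dates, ranges that run backwards, or implausibly long spans. The helper `find_bad_rows()` and its 31-day threshold are assumptions, shown here on a small toy data frame rather than on real output.

```r
# Hypothetical sanity check for a data frame with start_date and end_date columns.
find_bad_rows <- function(df, max_span = 31) {
  span <- as.numeric(df$end_date - df$start_date)   # length of each poll in days
  bad <- is.na(span) | span < 0 | span > max_span   # unparsed, backwards, or too long
  df[bad, , drop = FALSE]
}

toy <- data.frame(
  dates      = c("ok", "backwards", "too long", "missing start"),
  start_date = as.Date(c("2019-02-12", "2018-08-12", "2016-05-21", NA)),
  end_date   = as.Date(c("2019-02-15", "2018-07-18", "2016-09-29", "2016-05-29"))
)
find_bad_rows(toy)
```

The same call works on the tibble produced above, since it only relies on the start_date and end_date columns.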