Parsing dates from strings

I have a list of strings like this in Python:

['AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5',
 'AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5']

I only want to parse the date and time from these strings (for example, 2016-08-05 15:10:00).

So far I have used a for loop like the one below, but it is very time-consuming. Is there a better way to do this?

import glob
import pandas as pd

for files in glob.glob("AM_B0_*.flac.h5"):
    if files[11]=='_':
        year=files[12:16]
        month=files[17:19]
        day= files[20:22]
        hour=files[23:25]
        minute=files[25:27]
        second=files[27:29]
        tindex=pd.date_range(start= '%d-%02d-%02d %02d:%02d:%02d' %(int(year),int(month), int(day), int(hour), int(minute), int(second)), periods=60, freq='10S') 

    else:
        year=files[11:15]
        month=files[16:18]
        day= files[19:21]
        hour=files[22:24]
        minute=files[24:26]
        second=files[26:28]
        tindex=pd.date_range(start= '%d-%02d-%02d %02d:%02d:%02d' %(int(year), int(month), int(day), int(hour), int(minute), int(second)), periods=60, freq='10S')

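Instead of hard-coding files[11], find the index of the last (or second-to-last) _ and apply your slicing relative to that index, so you don't have to write the same code twice. Alternatively, parse the string with a regular expression.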
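The regular-expression route could look roughly like the sketch below. It assumes every file name contains exactly one timestamp of the form yyyy-mm-ddTHHMMSS, as in the sample list; the names TIMESTAMP_RE and parse_timestamp are made up for illustration.

```python
import re
from datetime import datetime

# Assumed pattern: a timestamp such as 2016-08-05T151000 somewhere in the name.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{6}")

def parse_timestamp(filename):
    """Return the datetime embedded in filename, or None if nothing matches."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    # strptime reads the compact HHMMSS part, so no manual slicing is needed.
    return datetime.strptime(match.group(0), "%Y-%m-%dT%H%M%S")

names = ["AM_B0_D0.0_2016-04-01T010000.flac.h5",
         "AM_B0_D10.3_2017-03-17T110000.flac.h5"]
for name in names:
    print(parse_timestamp(name))
```

Because the pattern is searched for anywhere in the string, the width of the D0.0 / D10.3 field no longer matters, which removes the need for the if/else on files[11].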

Try this (it is based on the second-to-last '-', so no if-else cases are needed):

import pandas as pd

filesall = ['AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5',
 'AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5']

def find_second_last(text, pattern):
    return text.rfind(pattern, 0, text.rfind(pattern))

for files in filesall:
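    # The yyyy part starts 4 characters before the second-to-last '-'
    start = find_second_last(files, '-') - 4
    # 17 characters cover 'yyyy-mm-ddTHHMMSS'; swap the 'T' for a space
    timepart = files[start:start+17].replace("T", " ")
    # Insert the two ':' separators to get 'yyyy-mm-dd HH:MM:SS'
    timepart = timepart[:13] + ':' + timepart[13:15] + ':' + timepart[15:]
    # print(timepart)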
    tindex=pd.date_range(start= timepart, periods=60, freq='10S')
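
A variation on the loop above, offered only as a sketch: split on the last '_' and let datetime.strptime handle the compact time, so the ':' separators do not have to be inserted by hand. It assumes the timestamp always follows the last underscore and ends at the first '.', as in the sample names; timestamp_from_name is a hypothetical helper name.

```python
from datetime import datetime

def timestamp_from_name(filename):
    # Text after the last '_' is e.g. '2016-08-05T151000.flac.h5';
    # everything before its first '.' is the timestamp itself.
    stamp = filename.rsplit("_", 1)[1].split(".", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H%M%S")

print(timestamp_from_name("AM_B0_D4.4_2016-08-05T151000.flac.h5"))
```

The returned datetime can be passed directly as the start argument of pd.date_range in place of the formatted string.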