Using Python to open a list of CSVs and convert unix time columns to datetime: it errors

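I have a long list of CSVs. I don't want to append them together; I want to keep each CSV's name unchanged and only change 2 columns, which currently hold 13-digit unix dates, to a natural datetime, i.e. YYYY/MM/DD HH:MM:SS.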

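I'm happy to use Pandas, which seems to be the easier route, but I'm struggling to get this working. I was hoping something like the code below might work. Thanks for your help!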

Here is an example of the unix time: 1640227953000, which converts to Thursday, 23 December 2021 02:52:33.
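
As a quick check of that conversion, here is a minimal sketch using only the standard library. The helper name ms_to_text is made up for illustration, and it assumes the timestamps should be read as UTC. Calling datetime.fromtimestamp without a tz argument uses the machine's local time zone instead, so the result can differ by the local offset.

```python
from datetime import datetime, timezone


def ms_to_text(ms: int) -> str:
    """Format a millisecond Unix timestamp as UTC 'YYYY/MM/DD HH:MM:SS'.

    Hypothetical helper, not part of the original script.
    """
    # fromtimestamp() expects seconds, so a 13-digit value is divided by 1000.
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y/%m/%d %H:%M:%S")


print(ms_to_text(1640227953000))  # 2021/12/23 02:52:33
```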

import pandas as pd
import datetime
from pathlib import Path # available in python 3.4 + 


dir = r'csv/' # raw string for windows.
csv_files = [f for f in Path(dir).glob('*.csv')] # finds all csvs in your folder.

print(csv_files)



for csv in csv_files: # iterate over the list of CSV paths
    df = pd.read_csv(csv) # read the CSV
    print(df.columns.tolist()) # used for troubleshooting
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
    df['values_committerTimestamp']=df['values_committerTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
    #df['values_authorTimestamp'] = pd.to_datetime(df['values_authorTimestamp'], format='%Y-%m-%d %H:%M:%S')
    # print(df)
    print(f'{csv.name} saved.')
    df.to_csv(f'csv/{csv.name}')

#values_committerTimestamp
    


This works and saves to CSV, but it only gets through some of the files before throwing an error. Any ideas?

  File "Scripts/Audit/change-csv.py", line 16, in <module>
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/series.py", line 4433, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/apply.py", line 1082, in apply
    return self.apply_standard()
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/apply.py", line 1137, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "Scripts/Audit/change-csv.py", line 16, in <lambda>
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
ValueError: invalid literal for int() with base 10: '2021-11-04 17:19:24'

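It seems the column holds a mix of datetime formats. Try the errors='coerce' parameter so that values that are not numeric become missing values, then replace those missing values from another Series with Series.fillna: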

df = pd.DataFrame({'values_authorTimestamp':[1640227953000,'2021-11-04 17:19:24']})

d1 = pd.to_datetime(df['values_authorTimestamp'], unit='ms', errors='coerce')
d2 = pd.to_datetime(df['values_authorTimestamp'], errors='coerce')

df['values_authorTimestamp'] = d1.fillna(d2).dt.strftime('%Y/%m/%d %H:%M:%S')
print(df)
  values_authorTimestamp
0    2021/12/23 02:52:33
1    2021/11/04 17:19:24
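
One caveat: after read_csv, a mixed column arrives as strings, so the 13-digit values are text too. As far as I know, newer pandas releases may handle strings passed together with unit= differently from older ones, so a more defensive sketch converts to numbers first with pd.to_numeric and only then applies unit='ms'. The function name mixed_to_text is hypothetical, and the sketch assumes every purely numeric value is epoch milliseconds.

```python
import pandas as pd


def mixed_to_text(col: pd.Series, fmt: str = "%Y/%m/%d %H:%M:%S") -> pd.Series:
    """Convert a column mixing epoch milliseconds and date strings to text.

    Hypothetical helper; assumes numeric values are milliseconds since 1970 (UTC).
    """
    # Numeric values (including digit-only strings read from a CSV) become
    # floats; anything else, such as '2021-11-04 17:19:24', becomes NaN.
    as_number = pd.to_numeric(col, errors="coerce")
    from_epoch = pd.to_datetime(as_number, unit="ms", errors="coerce")
    # Parse only the non-numeric entries as ordinary date strings.
    from_text = pd.to_datetime(col.where(as_number.isna()), errors="coerce")
    # Prefer the epoch result and fall back to the parsed string.
    return from_epoch.fillna(from_text).dt.strftime(fmt)


sample = pd.Series(["1640227953000", "2021-11-04 17:19:24", "not a date"])
converted = mixed_to_text(sample).tolist()
print(converted)
```

Values that are neither numeric nor parseable as dates come back as missing, so they are easy to find afterwards with isna().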

So your solution changes to:

for csv in csv_files:  # iterate over the list of CSV paths
    df = pd.read_csv(csv)  # read the CSV

    # values_authorTimestamp: try epoch milliseconds first, then date strings
    d1 = pd.to_datetime(df['values_authorTimestamp'], unit='ms', errors='coerce')
    d2 = pd.to_datetime(df['values_authorTimestamp'], errors='coerce')
    df['values_authorTimestamp'] = d1.fillna(d2).dt.strftime('%Y/%m/%d %H:%M:%S')

    # same conversion for values_committerTimestamp
    d11 = pd.to_datetime(df['values_committerTimestamp'], unit='ms', errors='coerce')
    d21 = pd.to_datetime(df['values_committerTimestamp'], errors='coerce')
    df['values_committerTimestamp'] = d11.fillna(d21).dt.strftime('%Y/%m/%d %H:%M:%S')

    # print(df)
    # index=False keeps pandas from writing an extra unnamed index column
    df.to_csv(f'csv/{csv.name}', index=False)
    print(f'{csv.name} saved.')
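
Because the files are overwritten in place, a second run of the script meets values that are already date strings, which may be how the mixed formats appeared in the first place. Below is a minimal sketch of a loop that is safe to rerun. The names convert_folder, TIMESTAMP_COLUMNS and OUTPUT_FORMAT are made up for illustration, the sample data is invented, and the demo writes to a temporary folder rather than your csv/ directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed column names, taken from the question.
TIMESTAMP_COLUMNS = ["values_authorTimestamp", "values_committerTimestamp"]
OUTPUT_FORMAT = "%Y/%m/%d %H:%M:%S"


def convert_folder(folder: Path) -> None:
    """Rewrite every CSV in folder, converting the timestamp columns in place."""
    for path in sorted(folder.glob("*.csv")):
        df = pd.read_csv(path)
        for name in TIMESTAMP_COLUMNS:
            # Epoch milliseconds become numbers; date strings become NaN.
            numeric = pd.to_numeric(df[name], errors="coerce")
            from_epoch = pd.to_datetime(numeric, unit="ms", errors="coerce")
            # Only the non-numeric entries are parsed as date strings.
            from_text = pd.to_datetime(df[name].where(numeric.isna()), errors="coerce")
            df[name] = from_epoch.fillna(from_text).dt.strftime(OUTPUT_FORMAT)
        # index=False stops pandas from adding an extra unnamed index column.
        df.to_csv(path, index=False)


# Demo on a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "id,values_authorTimestamp,values_committerTimestamp\n"
        "1,1640227953000,2021-11-04 17:19:24\n"
    )
    convert_folder(Path(tmp))
    convert_folder(Path(tmp))  # running it again should not change the result
    result = pd.read_csv(sample)

print(result)
```

Keeping the column names in one list means a third timestamp column only needs one extra entry rather than another copy of the conversion lines.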