ParseError: Error tokenizing data. C error: Expected 50 fields in line 224599, saw 51
I'm trying to pd.concat multiple .xlsx files into a master CSV, and then merge that CSV with past CPU data, which is also in CSV format.
The first operation succeeds (operation 3 of 8), but during the second one (history + current data in CSV format, operation 7 of 8) I get the ParseError shown below.
I've checked both files and there doesn't seem to be any delimiter conflict; the data is in the correct columns, etc.
Error tokenizing data. C error: Expected 50 fields in line 224599, saw 51
My code is as follows:
import pandas as pd
import os
import glob

def sremove(fn):
    # Delete the file only if it exists
    if os.path.exists(fn):
        os.remove(fn)

def mergeit():
    df = pd.concat(pd.read_excel(fl) for fl in path1)
    df.to_csv(path2, index=False)

def mergeit2():
    df = pd.concat(pd.read_csv(fl) for fl in path1)
    df.to_csv(path2, index=False)
print("\n#Operation 3 - Incidents Dataset")
print("Incidents Dataset operation has started")

fn = r"S:\CPU CacheU Data\201920\Incidents_201920.csv"
sremove(fn)
print("Incidents 2019/20 file has been deleted - Operation 1 of 8")

path1 = glob.glob(r'S:\*CPU CacheU Data\*Inc Dataset\Incidents Dataset*.xlsx')
print("Path 1 - Incidents 2019/20 folder has been read successfully - Operation 2 of 8")

path2 = r"S:\CPU CacheU Data\Incidents_201920.csv"
print("Path 2 - Incidents 2019/20 Dataset File has been read successfully - Operation 3 of 8")

mergeit()
print("Action has been completed successfully - Incidents Dataset 2019/20 Updated - Operation 4 of 8")

fn = r"S:\CPU CacheU Data\Incidents_Dataset.csv"
sremove(fn)
print("Incidents Dataset Old file has been deleted - Operation 5 of 8")

path1 = glob.glob(r'S:\*CPU CacheU Data\*Incidents_*.csv')
print("Path 1 - Incidents folder has been read successfully - Operation 6 of 8")

path2 = r"S:\CPU CacheU Data\Incidents_Dataset.csv"
print("Path 2 - Incidents Dataset File has been read successfully - Operation 7 of 8")

mergeit2()
print("Path 2 - Incidents Dataset File has been updated successfully - Operation 8 of 8")
Some notes:
1) Operation 3 of 8 takes a very long time to run. I'm not sure whether that is due to the xlsx-to-csv conversion.
2) I tried adding error_bad_lines = False inside mergeit2(), but generating the master file still seems to take a very long time.
Check the delimiter in your CSV files; there may be extra commas inside cells. read_csv uses sep=',' by default.
Probably you should set a different separator when opening your CSV files:
pd.read_csv(sep=' ')
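If skipping the malformed rows is acceptable, note that `error_bad_lines` was deprecated in pandas 1.3 in favor of the `on_bad_lines` parameter. A minimal sketch, assuming a recent pandas (the inline sample data is illustrative):

```python
import io
import pandas as pd

# A tiny inline CSV: the second data row has an extra, third field.
data = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines (pandas >= 1.3) replaces error_bad_lines/warn_bad_lines:
# "skip" silently drops malformed rows, "warn" skips and reports them.
df = pd.read_csv(io.StringIO(data), on_bad_lines="skip")
print(len(df))  # the malformed row is gone
```

With `on_bad_lines="warn"` instead, the skipped line numbers are reported, which would help locate line 224599's problem without aborting the whole run.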