Pandas ExcelFile.parse has NaNs in index when index_col is specified
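I have an Excel file that I am reading into a pandas DataFrame. The file has its header on row 1 (Python indexing), and there is a blank row between the header and the data. When I specify index_col, the blank row is treated as part of the index and shows up as NaN. What is the best way to avoid this behavior?
Test file: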
idx value
a 1
Without specifying index_col:
print xs.parse(header = 1)
idx value
0 NaN NaN
1 a 1
print xs.parse(header = 1).index
Int64Index([0, 1], dtype='int64')
With index_col specified:
print xs.parse(header = 1, index_col = 0)
value
idx
NaN NaN
a 1
print xs.parse(header = 1, index_col = 0).index
Index([nan, u'a'], dtype='object')
You can skip the blank row by passing skiprows=[1]. I tested this on a dummy Excel sheet; see the documentation for ExcelFile.parse:
In [44]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1])
Out[44]:
idx value
0 12 NaN
1 2 NaN
2 1 NaN
Compare with the default call, where the blank row is kept:
In [45]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse()
Out[45]:
idx value
0 NaN NaN
1 12 NaN
2 2 NaN
3 1 NaN
In [47]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0)
Out[47]:
idx value
0 12 NaN
1 2 NaN
2 1 NaN
In [49]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0, index_col=0)
Out[49]:
value
idx
12 NaN
2 NaN
1 NaN
In [50]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(header=0, index_col=0)
Out[50]:
value
idx
NaN NaN
12 NaN
2 NaN
1 NaN
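If the position of the blank row is not known in advance, a different approach from skiprows is to parse without index_col, drop the rows that are entirely empty, and only then set the index. The sketch below is an assumption-laden illustration, not part of the tested answer above: the `raw` frame is a hypothetical stand-in for what `xs.parse(header=1)` returns for the test file, built by hand so the example runs without an Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for xs.parse(header=1) on the test file:
# the blank row between the header and the data appears as an all-NaN row.
raw = pd.DataFrame({"idx": [np.nan, "a"], "value": [np.nan, 1.0]})

# Drop rows in which every cell is missing, then promote 'idx' to the index.
cleaned = raw.dropna(how="all").set_index("idx")

print(cleaned)
```

Note that `dropna(how="all")` only removes rows in which every column is empty, so a data row that merely has a missing value somewhere is kept.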