Prevent pandas read_csv from treating the first row as a header of column names
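I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names.
- I tried header=False, but this just deleted it entirely.
(Note on my input data: I have a string (st = '\n'.join(lst)), which I convert to a file-like object (io.StringIO(st)), and then build the csv from that file object.)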
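For reference, here is a minimal reproduction of the behaviour described in the question. It assumes lst holds plain comma-separated lines; the sample values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical input: a list of CSV lines, as described in the question.
lst = ["a,b,c", "0,1,2", "3,4,5"]
st = "\n".join(lst)

# With the default header='infer' (and no names passed), the first line
# is consumed as column names instead of being kept as a data row.
df = pd.read_csv(io.StringIO(st))
print(df.columns.tolist())  # ['a', 'b', 'c']
print(len(df))              # 2 data rows, not 3
```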
You want header=None. The False gets type-promoted to int, i.e. 0. See the docs (emphasis mine):
header : int or list of ints, default ‘infer’ Row number(s) to use as
the column names, and the start of the data. Default behavior is as if
set to 0 if no names passed, otherwise None. Explicitly pass header=0
to be able to replace existing names. The header can be a list of
integers that specify row locations for a multi-index on the columns
e.g. [0,1,3]. Intervening rows that are not specified will be skipped
(e.g. 2 in this example is skipped). Note that this parameter ignores
commented lines and empty lines if skip_blank_lines=True, so header=0
denotes the first line of data rather than the first line of the file.
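The last sentence of that quote is easy to miss. A small sketch, using made-up input that starts with blank lines, shows that header=0 picks the first non-blank line when skip_blank_lines is left at its default of True:

```python
import io

import pandas as pd

# Made-up input: two blank lines precede the header line.
t = "\n\na,b\n1,2\n3,4\n"

# With skip_blank_lines=True (the default), header=0 refers to the first
# line of data, i.e. the first non-blank line, not the first line of the file.
df = pd.read_csv(io.StringIO(t), header=0)
print(df.columns.tolist())  # ['a', 'b']
print(df.shape)             # (2, 2)
```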
You can see the difference in behaviour, first with header=0:
In [95]:
import io
import pandas as pd
t="""a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)
Out[95]:
   a  b  c
0  0  1  2
1  3  4  5
Now with header=None:
In [96]:
pd.read_csv(io.StringIO(t), header=None)
Out[96]:
   0  1  2
0  a  b  c
1  0  1  2
2  3  4  5
Note that in the latest version, 0.19.1, passing header=False will now raise a TypeError:
In [98]:
pd.read_csv(io.StringIO(t), header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no
header or header=int or list-like of ints to specify the row(s) making
up the column names
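If you want to confirm how your installed pandas handles this, a quick check along these lines should work; it assumes a pandas version at least as new as 0.19.1, where a boolean header is rejected rather than silently treated as 0:

```python
import io

import pandas as pd

t = "a,b,c\n0,1,2\n3,4,5"

# Newer pandas validates the header argument and rejects booleans
# instead of promoting False to 0.
try:
    pd.read_csv(io.StringIO(t), header=False)
    bool_rejected = False
except TypeError as exc:
    bool_rejected = True
    print("TypeError:", exc)

# The supported way to keep the first line as data.
df = pd.read_csv(io.StringIO(t), header=None)
print(len(df))  # 3 rows, including the former header line
```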
I think you need the parameter header=None in read_csv:
Sample:
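import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas

temp = u"""a,b
2,1
1,1"""
# header=None keeps the first line ("a,b") as a data row
df = pd.read_csv(StringIO(temp), header=None)
print(df)
   0  1
0  a  b
1  2  1
2  1  1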
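One side effect worth knowing: once the former header line is kept as data, each column mixes text with numbers, so pandas cannot parse it as numeric and the values stay as strings. The sketch below shows this with the same sample data; converting the remaining rows with pd.to_numeric is just one possible follow-up, not something the answer above prescribes.

```python
import io

import pandas as pd

temp = "a,b\n2,1\n1,1"
df = pd.read_csv(io.StringIO(temp), header=None)

# The former header line is now row 0 of the data.
print(df.iloc[0].tolist())  # ['a', 'b']

# Column 0 mixes 'a' with numbers, so every value in it is kept as text.
print(df[0].tolist())       # ['a', '2', '1']

# Possible follow-up: convert everything after the first row back to numbers.
numbers = pd.to_numeric(df[0].iloc[1:])
print(numbers.tolist())     # [2, 1]
```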
If you are using pd.ExcelFile to read all of an Excel file's sheets, then:
xls = pd.ExcelFile("path_to_file.xlsx")
xls.sheet_names  # the names of the sheets in the Excel file
df = xls.parse(2, header=None)  # parse the third sheet (index 2, zero-based) with header=None
df
Output:
   0  1
0  a  b
1  1  1
2  0  1
3  5  2
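You can set custom column names to prevent this. Because passing names makes header default to None (see the docs quoted above), the first row is kept as data.
Suppose your dataset has two columns; then:
df = pd.read_csv(your_file_path, names=['first column', 'second column'])
If you have more columns, you can also generate the column names programmatically and pass the resulting list to the names parameter.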
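A minimal sketch of generating the names programmatically. The col_0, col_1, ... naming scheme is hypothetical, and counting columns by reading just the first row with header=None and nrows=1 is one possible approach that also respects quoted fields:

```python
import io

import pandas as pd

t = "a,b,c\n0,1,2\n3,4,5"

# Count the fields by parsing only the first line (quoting is handled by pandas).
n_cols = pd.read_csv(io.StringIO(t), header=None, nrows=1).shape[1]

# Hypothetical naming scheme: one generated name per column.
names = [f"col_{i}" for i in range(n_cols)]

# Passing names makes header default to None, so "a,b,c" stays as data.
df = pd.read_csv(io.StringIO(t), names=names)
print(df.columns.tolist())  # ['col_0', 'col_1', 'col_2']
print(df.iloc[0].tolist())  # ['a', 'b', 'c']
```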