Read txt file with string and number columns
I have a tab-delimited file (city-data.txt):
Alabama Montgomery 32.361538 -86.279118
Alaska Juneau 58.301935 -134.41974
Is it possible to read the first two columns as strings and the last two as floats?
My output should look like this:
[(Alabama,Montgomery,32.36,-86.28),
(Alaska,Juneau,58.30,-134.42)]
I tried:
mylist2 = np.genfromtxt(r'city-data.txt', delimiter='\t', dtype=("<S15", "<S15", float, float)).tolist()
This gave me the first two columns as bytes:
[(b'Alabama', b'Montgomery', 32.361538, -86.279118),
(b'Alaska', b'Juneau', 58.301935, -134.41974)]
I also tried:
with open('city-data.txt') as f:
    mylist = [tuple(i.strip().split('\t')) for i in f]
This gave me all columns as strings:
[('Alabama', 'Montgomery', '32.361538', '-86.279118'),
('Alaska', 'Juneau', '58.301935', '-134.41974')]
I don't know how to get the result I need...
You can use pandas read_csv to read the file contents into a DataFrame, then use df.values.tolist() to convert the rows into the list you described.
Example:
import pandas as pd
filename = "city-data.txt"  # path to the tab-delimited file from the question
df = pd.read_csv(filename, sep="\t", header=None)
print(df.values.tolist())
#[['Alabama', 'Montgomery', 32.361538, -86.27911800000001],
# ['Alaska', 'Juneau', 58.301935, -134.41974]]
If you need them as tuples, just use map() (wrapped in list() on Python 3, where map() returns an iterator):
print(list(map(tuple, df.values.tolist())))
#[('Alabama', 'Montgomery', 32.361538, -86.27911800000001),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]
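Building on the pandas approach, here is a minimal sketch that also rounds the floats to two decimals, as in the desired output, and builds the tuples with itertuples() rather than map(). It assumes pandas is installed. The inline `sample` string is only a stand-in for city-data.txt so that the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for city-data.txt: tab-separated, no header row.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# round() only affects the numeric columns; the text columns are left alone.
df = df.round(2)

# index=False drops the row index, name=None yields plain tuples.
rows = list(df.itertuples(index=False, name=None))
print(rows)
```

With a real file, replace io.StringIO(sample) with the file path.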
Edit
If you want to use numpy, a small modification to your existing code should work. Change the dtype of the text fields to "O":
mylist2 = np.genfromtxt(filename, delimiter='\t', dtype=("O", "O", float, float)).tolist()
#[('Alabama', 'Montgomery', 32.361538, -86.279118),
# ('Alaska', 'Juneau', 58.301935, -134.41974)]
Another option is to use the 'U' dtype, which stands for Unicode.
>>> import numpy as np
>>> mylist = np.genfromtxt('city-data.txt', delimiter='\t', dtype=('U10','U10',float,float)).tolist()
>>> mylist
[('Alabama', 'Montgomery', 32.361538, -86.279118), ('Alaska', 'Juneau', 58.301935, -134.41974)]
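Note that a fixed-width dtype such as 'U10' holds at most 10 characters, which happens to fit "Montgomery" exactly. If you would rather not hard-code the widths, one alternative (a sketch, not something taken from the answers here) is to let genfromtxt infer the column types with dtype=None and pass an explicit encoding so that text columns come back as str. The `sample` string below is a stand-in for the file, and this assumes a NumPy version that supports the `encoding` argument.

```python
import numpy as np

# Stand-in for city-data.txt: tab-separated, no header row.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

# genfromtxt also accepts a list of lines, so no temporary file is needed here.
# dtype=None asks NumPy to infer a type per column.
arr = np.genfromtxt(
    sample.splitlines(),
    delimiter="\t",
    dtype=None,
    encoding="utf-8",
)

# A structured array converts to a list of tuples.
rows = arr.tolist()
print(rows)
```

With a real file, pass 'city-data.txt' instead of sample.splitlines().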
After splitting a line, build a new row by trying to convert each item to a float, then append the new row to the final container.
import io
from pprint import pprint
s = '''Alabama Montgomery 32.361538 -86.279118
Alaska Juneau 58.301935 -134.41974'''
f = io.StringIO(s)
stuff = []
for line in f:
    line = line.strip()
    line = line.split()
    new_line = []
    for item in line:
        try:
            # numeric fields become floats
            item = float(item)
        except ValueError:
            # anything else stays a string
            pass
        new_line.append(item)
    #print(f'line:{line}, new_line:{new_line}')
    stuff.append(new_line)
pprint(stuff)
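A variation on the same idea, as a rough sketch: since the column layout is known in advance, you can apply one converter per column instead of guessing with try/except, and split on tabs only. The `converters` tuple and the `sample` string are names made up for this illustration; the sample stands in for city-data.txt.

```python
import csv
import io

# Stand-in for city-data.txt: tab-separated, no header row.
sample = (
    "Alabama\tMontgomery\t32.361538\t-86.279118\n"
    "Alaska\tJuneau\t58.301935\t-134.41974\n"
)

# One converter per column: the first two stay strings, the last two become floats.
converters = (str, str, float, float)

rows = []
for fields in csv.reader(io.StringIO(sample), delimiter="\t"):
    if not fields:
        continue  # skip blank lines
    rows.append(tuple(conv(value) for conv, value in zip(converters, fields)))

print(rows)
```

Splitting on the tab character matters for names that contain spaces: a bare split() would break "New York" into two fields. With a real file, use open('city-data.txt', newline='') in place of io.StringIO(sample).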