在 Python 中从一个文件到另一个文件提取数据

Question

我想从第一个文件中提取数据到第二个文件，并将它们放入特定的标签中。第一个文件如下所示：

"city1" : [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
"city2" : [[5.0,0.2],[4.1,3.2],[7.1,8.2]],
...

所以类型就像字典，其中值是列表的列表

不幸的是，在打开文件时出现错误：要解压的值太多

我正在尝试打开：

lines = {}
with open("shape.txt", "r") as f:
for line in f:
    (key,val) = line.split()
    d[key] = val

之后我想将这个城市和坐标提取到结构如下的第二个文件中：

<state name = 'city1'>
 <point lat='first value from first list', lng='second value from first list/>
 <point lat='first value from second list', lng='second value from second list/>
</state>
<state name = 'city2'>
 the same action like above

我在想是否还有其他解决方案？

Answer 1

您可以使用它轻松加载和保存列表和词典。

import json

data = {
  'city1': [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
  'city2': [[5.0,0.2],[4.1,3.2],[7.1,8.2]],
}

with open('file.txt', 'w') as f:
   f.write(json.dumps(data))

with open('file.txt') as f:
   data = json.loads(f.read())

但只有当文件有效 json 时它才会起作用。您的文件几乎有效 JSON 除了它没有花括号

所以我认为可以这样做：

lines = ['{']
with open('file') as f:
for line in f:
    lines.append(line)
lines[-1].strip(',')  # Strip last comma as it's not valid for JSON dict
lines.append('}')
data = json.loads('\n'.join(lines))

然后就这样做：

result = ''
for city, points in data.items():
    result += f"<state name='{city}'>\n"
    for point in points:
        result += f"<point lat='{point[0]}', lng='{point[1]}'/>\n"
    result += '</state>\n'

 with open('out.txt', 'w') as f:
     f.write(result)

Answer 2

如果您的文本文件只包含您提到的这种结构，您应该能够使用 ast.literal_eval 来解析数据：

txt = '''

"city1" : [[1.1,1.2],[2.1,2.2],[3.1,3.2]],
"city2" : [[5.0,0.2],[4.1,3.2],[7.1,8.2]],

'''

template = '''<state name = '{city}'>
  <point lat='{vals[0][0]}', lng='{vals[0][1]}' />
  <point lat='{vals[1][0]}', lng='{vals[1][1]}' />
</state>'''

from ast import literal_eval

data = literal_eval('{' + txt + '}')

print(data)

for k, v in data.items():
    print(template.format(city=k, vals=v))

打印：

<state name = 'city1'>
  <point lat='1.1', lng='1.2' />
  <point lat='2.1', lng='2.2' />
</state>
<state name = 'city2'>
  <point lat='5.0', lng='0.2' />
  <point lat='4.1', lng='3.2' />
</state>

有文件 I/O:

template = '''<state name = '{city}'>
  <point lat='{vals[0][0]}', lng='{vals[0][1]}' />
  <point lat='{vals[1][0]}', lng='{vals[1][1]}' />
</state>'''

from ast import literal_eval

with open('sample.txt', 'r') as f_in, open('sample.out', 'w') as f_out:
    data = literal_eval('{' + f_in.read() + '}')

    for k, v in data.items():
        print(template.format(city=k, vals=v), file=f_out)

编辑：此示例会将所有点打印到文件中：

from ast import literal_eval

with open('sample.txt', 'r') as f_in, open('sample.out', 'w') as f_out:
    data = literal_eval('{' + f_in.read() + '}')

    for k, v in data.items():
        print("<state name = '{city}'>".format(city=k), file=f_out)
        for point in v:
            print("\t<point lat='{point[0]}', lng='{point[1]}' />".format(point=point), file=f_out)
        print('</state>', file=f_out)

文件 sample.out 将如下所示：

<state name = 'city1'>
    <point lat='1.1', lng='1.2' />
    <point lat='2.1', lng='2.2' />
    <point lat='3.1', lng='3.2' />
</state>
<state name = 'city2'>
    <point lat='5.0', lng='0.2' />
    <point lat='4.1', lng='3.2' />
    <point lat='7.1', lng='8.2' />
</state>

Answer 3

这是另一种直接写入第二个文件的方法。这样你就不需要先存储它dict。如果您处理一些大文件，性能会更高：

with open("shape.txt", "r") as f1:
    with open("shape_output.txt", "w") as f2:
        for line in f1:
            (key, val) = line.split(":")
            coord = json.loads(val.strip().rstrip(','))
            city = key.replace('"', '').strip()

            new_line = f"<state name = '{city}'>"
            for c in coord:
                new_line += f"<point lat='{c[0]}', lng='{c[1]}'/>"
            new_line += "</state>"

            f2.write(new_line)
            f2.write("\n")

我在从第一个文件读取行时添加了一些清理。然后使用 json.loads 将数组从字符串格式转换为数组类型。

剩下的只是格式化和写入第二个文件。

在 Python 中从一个文件到另一个文件提取数据

Extracting data from file to file in Python

python

extract