Extract string from line in a file

Question

I have two files. One of them contains lines like these:

 0    rho is 2313.22
 1    rho is 6456.01
 .....
 18811 rho is 2154.78
 18812 rho is 2279.565
 18813 rho is 1813.690
 18814 rho is 346.20664

The second file contains some numbers, in no particular order.

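For each of those numbers I need to extract its rho value from file 1. I tried comparing the two files: if a number is found, it should return the rho value, but I could not get that part to work.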

with open('file1', 'r') as file1:
    with open('file2', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('results.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

Answer 1

Here is how you can do it with pandas:

import pandas as pd

#load file1 as csv, split on whitespace, name columns and drop redundant text columns
df1 = pd.read_csv('file1.txt', sep=r'\s+', names=['id', 0, 1, 'value']).drop(columns=[0, 1])

#load file2 as csv, name column
df2 = pd.read_csv('file2.txt', names=['id'])

#merge dataframes, keep only values that exist in df2 and write output to csv file
df2.merge(df1, on='id').to_csv('output.csv', index=False)
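
For comparison, here is a minimal sketch of the same lookup without pandas. The set intersection in the question compares whole lines, so a line such as `18812` from file2 can never equal ` 18812 rho is 2279.565` from file1, and the result comes out empty. Building a dictionary keyed by the id avoids that. The helper names `load_rho_map` and `lookup` are made up for illustration, and the contents of `file2_lines` are an assumption, since the question does not show the second file.

```python
def load_rho_map(lines):
    """Map each id (as a string) to its rho value (as a string)."""
    rho_by_id = {}
    for line in lines:
        parts = line.split()
        # Expect exactly "<id> rho is <value>"; skip blank lines and filler such as "....."
        if len(parts) == 4 and parts[1:3] == ["rho", "is"]:
            rho_by_id[parts[0]] = parts[3]
    return rho_by_id


def lookup(rho_by_id, id_lines):
    """Yield (id, rho) for every requested id that exists, in the order requested."""
    for line in id_lines:
        key = line.strip()
        if key in rho_by_id:
            yield key, rho_by_id[key]


# Sample lines taken from the question's file1
file1_lines = [
    " 0    rho is 2313.22\n",
    " .....\n",
    " 18812 rho is 2279.565\n",
    " 18814 rho is 346.20664\n",
]
# Assumed contents of file2 (not shown in the question)
file2_lines = ["18814\n", "\n", "18812\n", "42\n"]

found = list(lookup(load_rho_map(file1_lines), file2_lines))
for key, rho in found:
    print(key, rho)
```

With real files, the two lists can be replaced by open file objects, since both helpers only iterate over lines. Values stay as strings here, so they are written out exactly as they appear in file1.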

Answer 2

You can take a more "data engineering" approach: use pandas to open the two files as CSV, then merge them.

Example code:

import pandas as pd

# read the first file as a csv file, using "rho is" as the separator
# (a multi-character separator is treated as a regex and needs the python engine)
rho_map = pd.read_csv('file1', sep="rho is", engine='python',
                      header=None, names=['id', 'rho'])

# read the second file
data = pd.read_csv('file2', names=['id'])

# Then merge
results = data.merge(rho_map, on='id')

Using a subset of the test data, you can have file1 contain:

 18811 rho is 2154.78
 18812 rho is 2279.565
 18813 rho is 1813.690
 18814 rho is 346.20664

and file 2

This gives the result:

      id       rho
0  18812  2279.565
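
If some numbers in file 2 might not exist in file 1, a left merge keeps them visible instead of dropping them silently. The following is only a sketch run on in-memory text: the contents of `FILE2_TEXT` (including the id 99999) are assumed for illustration, and the whitespace separator from Answer 1 is used in place of "rho is".

```python
import io

import pandas as pd

# Subset of file1 from the example above
FILE1_TEXT = """\
 18811 rho is 2154.78
 18812 rho is 2279.565
 18813 rho is 1813.690
 18814 rho is 346.20664
"""
# Assumed contents of file2; 99999 is deliberately absent from file1
FILE2_TEXT = "18812\n99999\n18814\n"

# Split on whitespace and keep only the id and rho columns
rho_map = pd.read_csv(
    io.StringIO(FILE1_TEXT),
    sep=r"\s+",
    header=None,
    names=["id", "w1", "w2", "rho"],
    usecols=["id", "rho"],
)
data = pd.read_csv(io.StringIO(FILE2_TEXT), header=None, names=["id"])

# how="left" keeps every id from file2, in file2 order; unmatched ids get NaN
results = data.merge(rho_map, on="id", how="left")
missing = results.loc[results["rho"].isna(), "id"].tolist()

print(results)
print("ids not found in file1:", missing)
```

The default inner merge returns only the ids found in both files, so the left merge is worth the extra line only when the unmatched ids need to be reported.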


python

file