Extract string from line in a file
I have two files. One of them contains lines like these:
0 rho is 2313.22
1 rho is 6456.01
.....
18811 rho is 2154.78
18812 rho is 2279.565
18813 rho is 1813.690
18814 rho is 346.20664
The second file contains some numbers that are not in any particular order, for example:
18812
758
2623
12569
1392
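I need to extract the corresponding rho value from file 1. I tried comparing the two files so that, whenever a number is found, its rho value is returned, but I can't get that part to work: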
with open('file1', 'r') as file1:
    with open('file2', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('results.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
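The set intersection above compares whole lines, so a line such as `18812 rho is 2279.565` can never equal `18812`, and nothing useful ends up in `same`. One way around that, sketched here with only the standard library, is to key a dictionary on the first field of each file1 line and then look up each id from file2. The helper names `load_rho` and `lookup` are hypothetical, and `io.StringIO` stands in for the real files.

```python
import io


def load_rho(lines):
    """Map each id (first field) to its rho value (last field), kept as text."""
    rho = {}
    for line in lines:
        parts = line.split()
        # Expect lines shaped like "18812 rho is 2279.565"; skip anything else
        if len(parts) == 4 and parts[1:3] == ["rho", "is"]:
            rho[parts[0]] = parts[3]
    return rho


def lookup(rho, id_lines):
    """Return (id, rho) pairs for the ids in id_lines that exist in rho, in id_lines order."""
    results = []
    for line in id_lines:
        key = line.strip()
        if key in rho:
            results.append((key, rho[key]))
    return results


# In-memory stand-ins for file1 and file2, using sample lines from the question
file1 = io.StringIO("18811 rho is 2154.78\n18812 rho is 2279.565\n18813 rho is 1813.690\n")
file2 = io.StringIO("18812\n758\n2623\n")

matches = lookup(load_rho(file1), file2)
print(matches)  # [('18812', '2279.565')]
```

With real files, pass the open file objects instead, e.g. `with open('file1') as f1, open('file2') as f2: matches = lookup(load_rho(f1), f2)`, then write each pair to `results.txt`. Keeping rho as a string preserves the original formatting (such as the trailing zero in `1813.690`).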
This is how you can do it with pandas:
import pandas as pd
# Load file1 as CSV, split on whitespace, name the columns and drop the redundant text columns ("rho", "is")
df1 = pd.read_csv('file1.txt', sep=r'\s+', names=['id', 0, 1, 'value']).drop(columns=[0, 1])
# Load file2 as CSV and name its single column
df2 = pd.read_csv('file2.txt', names=['id'])
# Merge the dataframes, keeping only ids that appear in both files, and write the output to a CSV file
df2.merge(df1, on='id').to_csv('output.csv', index=False)
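Two properties of that `merge` call matter when file2 is out of order, as in the question: the default `how='inner'` drops ids that have no rho value, and the result follows the order of the left frame's keys (here `df2`, i.e. file2). A small sketch to see both; the data is held in memory and the reordered file2 contents are made up for illustration.

```python
import io

import pandas as pd

# Sample file1 lines from the question; file2 is deliberately unordered (hypothetical)
file1 = io.StringIO(
    "18811 rho is 2154.78\n"
    "18812 rho is 2279.565\n"
    "18813 rho is 1813.690\n"
    "18814 rho is 346.20664\n"
)
file2 = io.StringIO("18814\n758\n18812\n")

# Columns are id, "rho", "is", value; drop the two text columns
df1 = pd.read_csv(file1, sep=r'\s+', names=['id', 0, 1, 'value']).drop(columns=[0, 1])
df2 = pd.read_csv(file2, names=['id'])

# 758 has no rho value, so the inner merge drops it;
# the surviving rows keep the order in which the ids appear in file2
merged = df2.merge(df1, on='id', how='inner')
print(merged)
```

The ids come out as 18814 then 18812, matching file2 rather than file1. Passing `how='left'` instead would keep 758 with a missing (`NaN`) value.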
You could take a more "data engineering" approach: use pandas to open the two files as CSV files, then merge them.
Example code:
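import pandas as pd
# Read the first file as CSV, using "rho is" (plus the surrounding spaces) as the separator.
# A multi-character separator is treated as a regex, which requires the Python engine.
rho_map = pd.read_csv('file1', sep=r"\s*rho is\s*", engine="python",
                      header=None, names=['id', 'rho'])
# Read the second file
data = pd.read_csv('file2', names=['id'])
# Then merge: only ids present in both files are kept
results = data.merge(rho_map, on='id')
print(results)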
Using a subset of the test data, with file1 containing:
18811 rho is 2154.78
18812 rho is 2279.565
18813 rho is 1813.690
18814 rho is 346.20664
and file2 containing:
18812
758
2623
12569
1392
this gives the result:
      id       rho
0  18812  2279.565
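The result above can be reproduced without creating any files by feeding the same subset through `io.StringIO`. This sketch assumes a regex separator `r"\s*rho is\s*"` with `engine="python"` rather than the plain `"rho is"`: pandas treats separators longer than one character (other than `'\s+'`) as regular expressions, which only the Python parser engine supports, and the `\s*` parts remove the stray spaces around the id and rho fields.

```python
import io

import pandas as pd

# The test subset from the answer, held in memory instead of on disk
file1 = io.StringIO(
    "18811 rho is 2154.78\n"
    "18812 rho is 2279.565\n"
    "18813 rho is 1813.690\n"
    "18814 rho is 346.20664\n"
)
file2 = io.StringIO("18812\n758\n2623\n12569\n1392\n")

# Regex separator: split each line into the id before "rho is" and the value after it
rho_map = pd.read_csv(file1, sep=r"\s*rho is\s*", engine="python",
                      header=None, names=["id", "rho"])
data = pd.read_csv(file2, names=["id"])

# Inner merge: only 18812 appears in both inputs
results = data.merge(rho_map, on="id")
print(results)
```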