Comparing two files based on a column and appending the elements in common to a file
Basically, I want to combine the power of grep -f with awk or bash commands. I have two files like this:
$file1
ENSG00000000003 TSPAN6 ensembl_havana TSPAN6
ENSG00000000419 DPM1 ensembl_havana DPM1
ENSG00000000457 SCYL3 ensembl_havana SCYL3
ENSG00000000460 C1orf112 ensembl_havana C1orf112
ENSG00000000971 CFH ensembl_havana CFH
ENSG00000001036 FUCA2 ensembl_havana FUCA2
$file2
ENSG00000000003.12 0.0730716237772557 -0.147970450702234
ENSG00000000419.5 0.156405616866614 -0.0398488625782745
ENSG00000000457.3 -0.110396121325736 -0.0147093758392248
ENSG00000000460.15 -0.0457144601264149 0.322340330477282
ENSG00000000971.12 0.0613967504891434 -0.0198254029339757
ENSG00000001036.4 0.00879628204710496 0.0560438506950908
This is the output I want:
ENSG00000000003.12 TSPAN6 0.0730716237772557 -0.147970450702234
ENSG00000000419.5 DPM1 0.156405616866614 -0.0398488625782745
ENSG00000000457.3 SCYL3 -0.110396121325736 -0.0147093758392248
ENSG00000000460.15 C1orf112 -0.0457144601264149 0.322340330477282
ENSG00000000971.12 CFH 0.0613967504891434 -0.0198254029339757
ENSG00000001036.4 FUCA2 0.00879628204710496 0.0560438506950908
This output would also be useful:
ENSG00000000003 TSPAN6 0.0730716237772557 -0.147970450702234
ENSG00000000419 DPM1 0.156405616866614 -0.0398488625782745
ENSG00000000457 SCYL3 -0.110396121325736 -0.0147093758392248
ENSG00000000460 C1orf112 -0.0457144601264149 0.322340330477282
ENSG00000000971 CFH 0.0613967504891434 -0.0198254029339757
ENSG00000001036 FUCA2 0.00879628204710496 0.0560438506950908
I have already tried the command from "Obtain patterns from a file, compare to a column of another file, print matching lines, using awk":
awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){print; break}}} n' file2 file
but it obviously does not give me the output I want.
Thanks
Using awk:
awk 'NR == FNR { a[$1] = $2; next } { split($1, b, "."); print $1, a[b[1]], $2, $3 }' file1 file2
It works as follows:
NR == FNR {                  # While processing the first file
  a[$1] = $2                 # just remember the second field by the first
  next
}
{                            # while processing the second file
  split($1, b, ".")          # split first field to isolate the key
  print $1, a[b[1]], $2, $3  # print relevant fields and the remembered
                             # bit from the first file.
}
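To check the answer above end to end, the sketch below recreates the first three rows of each file from the question and runs the command. The second awk call is a small variant (not part of the original answer) that prints the bare ID b[1] instead of $1, which gives the "also useful" output format. It assumes a POSIX sh and awk, and that it is fine to overwrite files named file1 and file2 in the current directory.

```shell
#!/bin/sh
# Sample data: the first three rows of file1 and file2 from the question.
printf '%s\n' \
  'ENSG00000000003 TSPAN6 ensembl_havana TSPAN6' \
  'ENSG00000000419 DPM1 ensembl_havana DPM1' \
  'ENSG00000000457 SCYL3 ensembl_havana SCYL3' > file1
printf '%s\n' \
  'ENSG00000000003.12 0.0730716237772557 -0.147970450702234' \
  'ENSG00000000419.5 0.156405616866614 -0.0398488625782745' \
  'ENSG00000000457.3 -0.110396121325736 -0.0147093758392248' > file2

# First desired output: keep the versioned ID ($1) from file2.
versioned=$(awk 'NR == FNR { a[$1] = $2; next }
  { split($1, b, "."); print $1, a[b[1]], $2, $3 }' file1 file2)
printf '%s\n' "$versioned"

# Second desired output (variant): print the bare ID b[1] instead of $1.
bare=$(awk 'NR == FNR { a[$1] = $2; next }
  { split($1, b, "."); print b[1], a[b[1]], $2, $3 }' file1 file2)
printf '%s\n' "$bare"
```

Only the first field of the print statement differs between the two calls; the lookup key b[1] is the same in both.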
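Alternatively, assign FS='.' only for file2, so that $1 is the ID without its version suffix, and insert the name after the first whitespace:
$ awk 'NR==FNR{m[$1]=$2;next} {sub(/[[:space:]]/," "m[$1]" ")} 1' file1 FS='.' file2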
ENSG00000000003.12 TSPAN6 0.0730716237772557 -0.147970450702234
ENSG00000000419.5 DPM1 0.156405616866614 -0.0398488625782745
ENSG00000000457.3 SCYL3 -0.110396121325736 -0.0147093758392248
ENSG00000000460.15 C1orf112 -0.0457144601264149 0.322340330477282
ENSG00000000971.12 CFH 0.0613967504891434 -0.0198254029339757
ENSG00000001036.4 FUCA2 0.00879628204710496 0.0560438506950908
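The title asks to keep only the elements in common and append them to a file. Neither command above drops file2 rows whose ID is missing from file1; such rows are printed with an empty name. A minimal sketch, assuming unmatched rows should be dropped, guards the print with an awk `in` test and appends with >>. The extra ID ENSG99999999999 and the output name common.txt are hypothetical, chosen only for this demo.

```shell
#!/bin/sh
# Hypothetical sample data: ENSG99999999999 exists only in file2,
# so it should not appear in the result.
printf '%s\n' \
  'ENSG00000000003 TSPAN6 ensembl_havana TSPAN6' \
  'ENSG00000000419 DPM1 ensembl_havana DPM1' > file1
printf '%s\n' \
  'ENSG00000000003.12 0.0730716237772557 -0.147970450702234' \
  'ENSG99999999999.1 0.5 -0.5' \
  'ENSG00000000419.5 0.156405616866614 -0.0398488625782745' > file2

: > common.txt   # start from an empty file for this demo

# file1: remember name by ID. file2: strip the version suffix, then print
# only when that ID was seen in file1. >> appends to common.txt.
awk 'NR == FNR { a[$1] = $2; next }
     { split($1, b, ".") }
     (b[1] in a) { print $1, a[b[1]], $2, $3 }' file1 file2 >> common.txt

cat common.txt
```

Because >> appends, running the awk command twice adds the matches twice; use > instead if the file should be rewritten each time.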