Ordering columns and rows of cross-table by appearance in dataset
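Below, I have data sorted by the variable mutation. I am using xtabs to cross-tabulate the factor variable ID (rows) by the factor variable subregion (columns), using the values of attribute. To reorder the factors by the order in which they appear in the dataset, I am using forcats::fct_reorder. Judging by the (truncated) output, this works for the very small dataset given below. However, my real data has 966 unique IDs and ~58000 unique subregions. When I run code similar to the code below on it, the columns and rows do not come out in the correct order.
An individual that appears early in the dataset may appear again later, because it also has a mutation at a later position. Subregions, on the other hand, do not reappear later in the dataset, because they are determined by their position. In addition, duplicate ID and subregion pairs (see rows 4, 5 and 21, 22) should result in a cell that holds the sum of their attribute values. Is there any way to modify xtabs so that it preserves the desired order? I am also open to other cross-tabulation methods.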
Code
library(forcats)
#Cross-tabulate ID by subregion using attribute
df_tab <- as.data.frame.matrix(xtabs(df$attribute~fct_reorder(as.character(df$ID),df$mutation)+
fct_reorder(as.character(df$subregion),df$mutation)))
Truncated output
OR4F5:E1:E1 SAMD11:E2:E2 NOC2L:E5:E5 NOC2L:E4:E4 KLHL17:E3:E3
TCGA-AN-A046 1.0085 0.000000 0.000000 0.000000 0.00000
TCGA-A2-A0CP 0.0000 1.003465 0.000000 0.000000 0.00000
TCGA-A8-A08H 0.0000 0.000000 1.436694 0.000000 0.00000
TCGA-GM-A2DM 0.0000 0.000000 0.000000 2.335915 0.00000
TCGA-D8-A1XM 0.0000 0.000000 0.000000 0.000000 2.17849
Dataset
"ID" "subregion" "mutation" "attribute"
"1" "TCGA-AN-A046" "OR4F5:E1:E1" 69767 1.00849961637455
"2" "TCGA-A2-A0CP" "SAMD11:E2:E2" 925952 1.00346517231111
"3" "TCGA-A8-A08H" "NOC2L:E5:E5" 956126 1.43669428919156
"4" "TCGA-GM-A2DM" "NOC2L:E4:E4" 956911 1.1679575001733
"5" "TCGA-GM-A2DM" "NOC2L:E4:E4" 956912 1.1679575001733
"6" "TCGA-D8-A1XM" "KLHL17:E3:E3" 961658 2.17848956802821
"7" "TCGA-BH-A18G" "KLHL17:E5:E5" 962441 48.0640560165975
"8" "TCGA-3C-AALI" "KLHL17:E8:E8" 963353 40.6525553849528
"9" "TCGA-AC-A62Y" "KLHL17:E9:E9" 964004 2.89875813313313
"10" "TCGA-AR-A2LE" "PLEKHN1:E1:E1" 966556 1.03540263019699
"11" "TCGA-E2-A14N" "PLEKHN1:E5:E5" 970728 21.8246585021196
"12" "TCGA-AO-A0J4" "PLEKHN1:E12:E12" 973506 1.24409284966302
"13" "TCGA-D8-A1J9" "HES4:E3:E3" 999551 1.24409284966302
"14" "TCGA-EW-A1PH" "ISG15:E2:E2" 1014276 72.4814235432147
"15" "TCGA-A2-A0T0" "AGRN:E2:E2" 1022338 21.8246585021196
"16" "TCGA-GM-A2DD" "AGRN:E3:E3" 1035303 1.06314569745364
"17" "TCGA-5L-AAT1" "AGRN:E4:E4" 1040690 1.24409284966302
"18" "TCGA-OL-A5RW" "AGRN:E8:E8" 1043314 2.20878819659627
"19" "TCGA-D8-A27M" "AGRN:E25:E25" 1049355 1.45844645372491
"20" "TCGA-AR-A1AI" "AGRN:E29:E29" 1050430 1.16479379564338
"21" "TCGA-5L-AAT0" "AGRN:E36:E36" 1055374 7.09932582548073
"22" "TCGA-5L-AAT0" "AGRN:E36:E36" 1055376 7.09932582548073
"23" "TCGA-C8-A8HP" "AGRN:E36:E36" 1055442 7.09932582548073
"24" "TCGA-A7-A4SD" "TTLL10:E13:E13" 1184971 1.24409284966302
"25" "TCGA-BH-A1F0" "SDF4:E4:E4" 1223283 1.46091816304331
"26" "TCGA-AO-A128" "SDF4:E4:E4" 1223330 1.46091816304331
"27" "TCGA-E9-A1R0" "SDF4:E2:E2" 1228592 3.86565576505924
"28" "TCGA-A2-A04P" "UBE2J2:E7:E7" 1255246 33.795587162655
"29" "TCGA-C8-A274" "UBE2J2:E7:E7" 1255342 33.795587162655
"30" "TCGA-5L-AAT1" "SCNN1D:E1:E1" 1281422 1.24409284966302
"31" "TCGA-AO-A128" "SCNN1D:E6:E6" 1287116 1.06314569745364
"32" "TCGA-E2-A15R" "SCNN1D:E7:E7" 1287596 2.89179279138711
"33" "TCGA-AC-A62V" "SCNN1D:E11:E11" 1290543 74.0747402078337
"34" "TCGA-BH-A18V" "ACAP3:E22:E22" 1294187 2.21398621447599
How about something like this?
library(dplyr)
library(tidyr)
df.wide <- df %>%
    mutate(
        ID = factor(ID, levels = unique(.$ID[order(.$mutation)])),
        subregion = factor(subregion, levels = unique(.$subregion[order(.$mutation)]))) %>%
    group_by(ID, subregion) %>%
    mutate(n = 1:n()) %>%
    select(-mutation) %>%
    spread(subregion, attribute) %>%
    ungroup()
df.wide
## A tibble: 32 x 31
# ID n `OR4F5:E1:E1` `SAMD11:E2:E2` `NOC2L:E5:E5` `NOC2L:E4:E4`
# <fct> <int> <dbl> <dbl> <dbl> <dbl>
# 1 TCGA… 1 1.01 NA NA NA
# 2 TCGA… 1 NA 1.00 NA NA
# 3 TCGA… 1 NA NA 1.44 NA
# 4 TCGA… 1 NA NA NA 1.17
# 5 TCGA… 2 NA NA NA 1.17
# 6 TCGA… 1 NA NA NA NA
# 7 TCGA… 1 NA NA NA NA
# 8 TCGA… 1 NA NA NA NA
# 9 TCGA… 1 NA NA NA NA
#10 TCGA… 1 NA NA NA NA
## … with 22 more rows, and 25 more variables: `KLHL17:E3:E3` <dbl>,
## `KLHL17:E5:E5` <dbl>, `KLHL17:E8:E8` <dbl>, `KLHL17:E9:E9` <dbl>,
## `PLEKHN1:E1:E1` <dbl>, `PLEKHN1:E5:E5` <dbl>, `PLEKHN1:E12:E12` <dbl>,
## `HES4:E3:E3` <dbl>, `ISG15:E2:E2` <dbl>, `AGRN:E2:E2` <dbl>,
## `AGRN:E3:E3` <dbl>, `AGRN:E4:E4` <dbl>, `AGRN:E8:E8` <dbl>,
## `AGRN:E25:E25` <dbl>, `AGRN:E29:E29` <dbl>, `AGRN:E36:E36` <dbl>,
## `TTLL10:E13:E13` <dbl>, `SDF4:E4:E4` <dbl>, `SDF4:E2:E2` <dbl>,
## `UBE2J2:E7:E7` <dbl>, `SCNN1D:E1:E1` <dbl>, `SCNN1D:E6:E6` <dbl>,
## `SCNN1D:E7:E7` <dbl>, `SCNN1D:E11:E11` <dbl>, `ACAP3:E22:E22` <dbl>
We explicitly order the factor levels of ID and subregion by mutation, and add a column n to keep track of duplicate ID+subregion rows. What remains is a simple long-to-wide reshape.
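As a side note, one likely reason the original fct_reorder call misorders rows on the full data is that fct_reorder summarises the ordering variable per level, and its default summary is the median. An ID that appears early and again much later therefore gets pushed back. The sketch below illustrates the difference in base R only; the toy data frame and the names by_median and by_first are made up for illustration and are not part of your data.

```r
# Hypothetical toy data: ID "A" appears first and again much later
toy <- data.frame(
  ID = c("A", "B", "C", "A"),
  mutation = c(10, 20, 30, 100),
  stringsAsFactors = FALSE
)

# Order IDs by their median mutation (the kind of summary fct_reorder uses by default)
med <- vapply(split(toy$mutation, toy$ID), median, numeric(1))
by_median <- names(med)[order(med)]

# Order IDs by first appearance after sorting on mutation
by_first <- unique(toy$ID[order(toy$mutation)])

print(by_median)
print(by_first)
```

Here "A" has a median mutation of 55 and ends up last in by_median, although it is the first ID in the data. Passing a different summary such as min through the .fun argument of fct_reorder, or fixing the levels explicitly as done above, should avoid this.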
Update
Summing the attribute values for duplicate ID+subregion pairs changes your problem statement slightly; in that case you can do
df.wide <- df %>%
    mutate(
        ID = factor(ID, levels = unique(.$ID[order(.$mutation)])),
        subregion = factor(subregion, levels = unique(.$subregion[order(.$mutation)]))) %>%
    group_by(ID, subregion) %>%
    summarise(attribute = sum(attribute)) %>%
    spread(subregion, attribute) %>%
    ungroup()
df.wide
## A tibble: 30 x 30
# ID `OR4F5:E1:E1` `SAMD11:E2:E2` `NOC2L:E5:E5` `NOC2L:E4:E4` `KLHL17:E3:E3`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 TCGA… 1.01 NA NA NA NA
# 2 TCGA… NA 1.00 NA NA NA
# 3 TCGA… NA NA 1.44 NA NA
# 4 TCGA… NA NA NA 2.34 NA
# 5 TCGA… NA NA NA NA 2.18
# 6 TCGA… NA NA NA NA NA
# 7 TCGA… NA NA NA NA NA
# 8 TCGA… NA NA NA NA NA
# 9 TCGA… NA NA NA NA NA
#10 TCGA… NA NA NA NA NA
## … with 20 more rows, and 24 more variables: `KLHL17:E5:E5` <dbl>,
## `KLHL17:E8:E8` <dbl>, `KLHL17:E9:E9` <dbl>, `PLEKHN1:E1:E1` <dbl>,
## `PLEKHN1:E5:E5` <dbl>, `PLEKHN1:E12:E12` <dbl>, `HES4:E3:E3` <dbl>,
## `ISG15:E2:E2` <dbl>, `AGRN:E2:E2` <dbl>, `AGRN:E3:E3` <dbl>,
## `AGRN:E4:E4` <dbl>, `AGRN:E8:E8` <dbl>, `AGRN:E25:E25` <dbl>,
## `AGRN:E29:E29` <dbl>, `AGRN:E36:E36` <dbl>, `TTLL10:E13:E13` <dbl>,
## `SDF4:E4:E4` <dbl>, `SDF4:E2:E2` <dbl>, `UBE2J2:E7:E7` <dbl>,
## `SCNN1D:E1:E1` <dbl>, `SCNN1D:E6:E6` <dbl>, `SCNN1D:E7:E7` <dbl>,
## `SCNN1D:E11:E11` <dbl>, `ACAP3:E22:E22` <dbl>
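If you would rather stay with xtabs, the same idea of fixing the factor levels up front should carry over, since xtabs follows the level order of its factors and sums attribute over duplicate ID+subregion pairs. The following is only a minimal sketch on rows 3 to 6 of the sample data; df_small, ord, tab and df_tab_small are names chosen here for illustration.

```r
# Rows 3 to 6 of the sample data, already sorted by mutation
df_small <- data.frame(
  ID = c("TCGA-A8-A08H", "TCGA-GM-A2DM", "TCGA-GM-A2DM", "TCGA-D8-A1XM"),
  subregion = c("NOC2L:E5:E5", "NOC2L:E4:E4", "NOC2L:E4:E4", "KLHL17:E3:E3"),
  mutation = c(956126, 956911, 956912, 961658),
  attribute = c(1.43669428919156, 1.1679575001733,
                1.1679575001733, 2.17848956802821),
  stringsAsFactors = FALSE
)

# Fix the factor levels to the order of first appearance by mutation
ord <- order(df_small$mutation)
df_small$ID <- factor(df_small$ID, levels = unique(df_small$ID[ord]))
df_small$subregion <- factor(df_small$subregion,
                             levels = unique(df_small$subregion[ord]))

# xtabs keeps the level order and sums attribute over duplicate pairs
tab <- xtabs(attribute ~ ID + subregion, data = df_small)
df_tab_small <- as.data.frame.matrix(tab)
print(df_tab_small)
```

Unlike the spread result, empty cells come out as 0 rather than NA. With roughly 966 x 58000 cells the dense table will be large, so whether this is practical on the full data depends on available memory.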
Sample data
df <- read.table(text =
'"ID" "subregion" "mutation" "attribute"
"1" "TCGA-AN-A046" "OR4F5:E1:E1" 69767 1.00849961637455
"2" "TCGA-A2-A0CP" "SAMD11:E2:E2" 925952 1.00346517231111
"3" "TCGA-A8-A08H" "NOC2L:E5:E5" 956126 1.43669428919156
"4" "TCGA-GM-A2DM" "NOC2L:E4:E4" 956911 1.1679575001733
"5" "TCGA-GM-A2DM" "NOC2L:E4:E4" 956912 1.1679575001733
"6" "TCGA-D8-A1XM" "KLHL17:E3:E3" 961658 2.17848956802821
"7" "TCGA-BH-A18G" "KLHL17:E5:E5" 962441 48.0640560165975
"8" "TCGA-3C-AALI" "KLHL17:E8:E8" 963353 40.6525553849528
"9" "TCGA-AC-A62Y" "KLHL17:E9:E9" 964004 2.89875813313313
"10" "TCGA-AR-A2LE" "PLEKHN1:E1:E1" 966556 1.03540263019699
"11" "TCGA-E2-A14N" "PLEKHN1:E5:E5" 970728 21.8246585021196
"12" "TCGA-AO-A0J4" "PLEKHN1:E12:E12" 973506 1.24409284966302
"13" "TCGA-D8-A1J9" "HES4:E3:E3" 999551 1.24409284966302
"14" "TCGA-EW-A1PH" "ISG15:E2:E2" 1014276 72.4814235432147
"15" "TCGA-A2-A0T0" "AGRN:E2:E2" 1022338 21.8246585021196
"16" "TCGA-GM-A2DD" "AGRN:E3:E3" 1035303 1.06314569745364
"17" "TCGA-5L-AAT1" "AGRN:E4:E4" 1040690 1.24409284966302
"18" "TCGA-OL-A5RW" "AGRN:E8:E8" 1043314 2.20878819659627
"19" "TCGA-D8-A27M" "AGRN:E25:E25" 1049355 1.45844645372491
"20" "TCGA-AR-A1AI" "AGRN:E29:E29" 1050430 1.16479379564338
"21" "TCGA-5L-AAT0" "AGRN:E36:E36" 1055374 7.09932582548073
"22" "TCGA-5L-AAT0" "AGRN:E36:E36" 1055376 7.09932582548073
"23" "TCGA-C8-A8HP" "AGRN:E36:E36" 1055442 7.09932582548073
"24" "TCGA-A7-A4SD" "TTLL10:E13:E13" 1184971 1.24409284966302
"25" "TCGA-BH-A1F0" "SDF4:E4:E4" 1223283 1.46091816304331
"26" "TCGA-AO-A128" "SDF4:E4:E4" 1223330 1.46091816304331
"27" "TCGA-E9-A1R0" "SDF4:E2:E2" 1228592 3.86565576505924
"28" "TCGA-A2-A04P" "UBE2J2:E7:E7" 1255246 33.795587162655
"29" "TCGA-C8-A274" "UBE2J2:E7:E7" 1255342 33.795587162655
"30" "TCGA-5L-AAT1" "SCNN1D:E1:E1" 1281422 1.24409284966302
"31" "TCGA-AO-A128" "SCNN1D:E6:E6" 1287116 1.06314569745364
"32" "TCGA-E2-A15R" "SCNN1D:E7:E7" 1287596 2.89179279138711
"33" "TCGA-AC-A62V" "SCNN1D:E11:E11" 1290543 74.0747402078337
"34" "TCGA-BH-A18V" "ACAP3:E22:E22" 1294187 2.21398621447599', header = TRUE)