Select specific columns only from a dataframe in Python

Question

Using Python with pandas imported as pd, I am trying to output a file that contains a subset of columns chosen by specific headers.

Here is how the input file is read:

gene_input = pd.read_table(args.gene, sep="\t" ,index_col=0)

Structure of gene_input:

       Sample1  Sample2  Sample3  Sample4  Sample5  Sample6  Sample7  Sample8
Gene1        2       23      213      213       13      132      213     4312
Gene2        3       12    21312      123      123       23     4321      432
Gene3        5      213    21312       15      516     3421     4312     4132
Gene4        2      123      123        7      610       23     3214     4312
Gene5        1      213      213        1      152       23     1423     3421

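Using different loops, I generated two dictionaries. The first has the keys Sample1 and Sample7, and the second has the keys Sample4 and Sample8.

I want the samples from each dictionary to be contiguous in the output, i.e. all of dictionary 1 first, then all of dictionary 2. The output I'm looking for is: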

       Sample1  Sample7  Sample4  Sample8
Gene1        2      213      213     4312
Gene2        3     4321      123      432
Gene3        5     4312       15     4132
Gene4        2     3214        7     4312
Gene5        1     1423        1     3421

I tried the following, but none of it worked:

key_num=list(dictionary1.keys())
num = genes_input[gene_input.columns.isin(key_num)]

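The idea was to extract the first set of columns and then somehow combine it with the second, but it failed. It keeps giving me an attribute error, even after I updated pandas. I also tried the following: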

reader = csv.reader( open(gene_input, 'rU'), delimiter='\t')
header_row = reader.next() # Gets the header

for key, value in numerator.items():
    output.write(key + "\t")
    if key in header_row:
        for row in reader:
            idx=header_row.index(key)
            output.write(idx +"\t")

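I also tried several other commands, loops, and lines. Sometimes only the first key shows up in the output, and other times I get an error, depending on the approach (for brevity, I have not listed them all here).

In any case, if anyone has any input on how I can generate the output file I'm after, I would appreciate it.

Again, this is the final output I want: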

       Sample1  Sample7  Sample4  Sample8
Gene1        2      213      213     4312
Gene2        3     4321      123      432
Gene3        5     4312       15     4132
Gene4        2     3214        7     4312
Gene5        1     1423        1     3421

Answer 1

For a specific set of columns in a specific order, use:
df = gene_input[['Sample1', 'Sample7', 'Sample4', 'Sample8']]
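As a minimal sketch of the selection above, the frame below is rebuilt by hand from the sample data in the question (in the real script it would come from pd.read_table). The name `wanted` is only illustrative. The order of the label list is the order of the resulting columns.

```python
import pandas as pd

# Sample data copied from the question; typed in here so the sketch runs on its own
gene_input = pd.DataFrame(
    {
        "Sample1": [2, 3, 5, 2, 1],
        "Sample2": [23, 12, 213, 123, 213],
        "Sample3": [213, 21312, 21312, 123, 213],
        "Sample4": [213, 123, 15, 7, 1],
        "Sample5": [13, 123, 516, 610, 152],
        "Sample6": [132, 23, 3421, 23, 23],
        "Sample7": [213, 4321, 4312, 3214, 1423],
        "Sample8": [4312, 432, 4132, 4312, 3421],
    },
    index=["Gene1", "Gene2", "Gene3", "Gene4", "Gene5"],
)

wanted = ["Sample1", "Sample7", "Sample4", "Sample8"]  # illustrative name
df = gene_input[wanted]           # a list of labels selects columns, in list order
same = gene_input.loc[:, wanted]  # equivalent, more explicit form
print(df)
```

Both forms raise a KeyError if any label in the list is not a column of the frame.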

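If you need to generate that list (['Sample1', ...]) automatically and the names are as given, you should be able to build two lists and concatenate them. On Python 3, dict.keys() views cannot be added with +, so convert them to lists first:
column_names = list(dictionary1.keys()) + list(dictionary2.keys())

Your names should then be in the right order: dictionary 1's samples first, then dictionary 2's (this relies on dictionaries preserving insertion order, which Python 3.7+ guarantees). Sorting the combined list would instead interleave the two groups (Sample1, Sample4, Sample7, Sample8). For the output, you should be able to use: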
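A short sketch of building the label list from the two dictionaries. The dictionary values here are made up, since the question only gives the keys, and the check for missing labels is an addition that the question does not ask for.

```python
# Hypothetical dictionaries; only the keys matter for column selection
dictionary1 = {"Sample1": "a", "Sample7": "b"}
dictionary2 = {"Sample4": "c", "Sample8": "d"}

# Column headers present in the input file (Sample1 .. Sample8, as in the question)
available = ["Sample%d" % i for i in range(1, 9)]

# Concatenate rather than sort, so each dictionary's samples stay together
column_names = list(dictionary1) + list(dictionary2)

# Fail early with a clear message if a key is not a column in the file
missing = [name for name in column_names if name not in available]
if missing:
    raise KeyError("columns not found in input: %s" % missing)

print(column_names)          # ['Sample1', 'Sample7', 'Sample4', 'Sample8']
print(sorted(column_names))  # ['Sample1', 'Sample4', 'Sample7', 'Sample8']
```

The second print shows why a plain sort does not give the grouping requested in the question.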
df.to_csv(<output file name>, sep='\t')
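To illustrate the whole path from input to tab-separated output, here is a hedged sketch that uses in-memory text buffers instead of real files, so no file names are assumed. The input text is the first three rows of the question's table, and `raw`, `buffer`, and `output_text` are illustrative names.

```python
import io
import pandas as pd

# Tab-separated text standing in for the input file (first three genes only)
raw = (
    "\tSample1\tSample2\tSample3\tSample4\tSample5\tSample6\tSample7\tSample8\n"
    "Gene1\t2\t23\t213\t213\t13\t132\t213\t4312\n"
    "Gene2\t3\t12\t21312\t123\t123\t23\t4321\t432\n"
    "Gene3\t5\t213\t21312\t15\t516\t3421\t4312\t4132\n"
)
gene_input = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

# Select the columns in the requested order
df = gene_input[["Sample1", "Sample7", "Sample4", "Sample8"]]

# to_csv accepts a path or a text buffer; a buffer keeps the sketch self-contained
buffer = io.StringIO()
df.to_csv(buffer, sep="\t")
output_text = buffer.getvalue()
print(output_text)
```

In the real script, passing the output path to to_csv in place of the buffer writes the same text to disk.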

Edit: added the part about the output.
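On the isin attempt in the question: indexing a frame directly with a boolean array filters rows, so a mask built from gene_input.columns has to go through .loc[:, mask] to select columns. The sketch below uses a reduced stand-in frame and a hypothetical key list. A mask keeps the file's column order, so the dictionary ordering still needs the label list.

```python
import pandas as pd

# Reduced stand-in for gene_input (two genes, values taken from the question)
gene_input = pd.DataFrame(
    [[2, 213, 213, 4312], [3, 123, 4321, 432]],
    index=["Gene1", "Gene2"],
    columns=["Sample1", "Sample4", "Sample7", "Sample8"],
)
key_num = ["Sample7", "Sample1"]  # hypothetical keys, deliberately out of file order

mask = gene_input.columns.isin(key_num)  # one boolean per column
by_mask = gene_input.loc[:, mask]        # columns stay in file order
by_label = gene_input[key_num]           # columns follow key_num order

print(list(by_mask.columns))   # ['Sample1', 'Sample7']
print(list(by_label.columns))  # ['Sample7', 'Sample1']
```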

Tags: python, header, extract, pandas