Most efficient way to edit the first line of a large input stream one way, and all other lines a different way?

Question

The problem... (N = 2*10^7)

Go from this:

colName1 colName2 colName3 ... colNameN
1        x        x        ... x 
2        x        x        ... x
1        y        x        ... x
2        y        x        ... x  
...      ...      ...      ... ...
1        xx       xx       ... xx
2        xx       xx       ... xx

To this:

Sample colName1 colName2 colName3 ... colNameN
A       1        x        x       ... x 
A       2        x        x       ... x
B       1        y        x       ... x
B       2        y        x       ... x  
...     ...      ...      ...     ... ...
N       1        xx       xx      ... xx
N       2        xx       xx      ... xx

The problem: I need to add "Sample" to the first "header" line, and add the corresponding sample name to every line after it. The sample name will be stored in an object.

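Complicating factors:

The data comes from an input stream, currently via subprocess.PIPE.
Files with 20 million lines are common, so would checking a firstLine flag on every iteration be expensive?

I am wondering whether there is a way to do something to only the first line of input from the stream.

Or...

Would it be easier to treat all lines the same, meaning we add the sample name to the header line as well, and then edit the first word in the file from the sample name to "Sample\t"?

How costly would that approach be? Currently, I have a firstLine flag, as shown below.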

import subprocess

# callString, outputTARGET and wildcards are defined elsewhere (not shown).
fileSTREAM = subprocess.Popen(callString, stdout=subprocess.PIPE, shell=True)

# To indicate the first line of the stream, which happens to be the column headers.
firstLine = True

# Loop to add a word to the front of each line of input.
for line in fileSTREAM.stdout:

    # Decode the input from byte literals to strings.
    currLine = line.decode("utf-8")

    # The first line is different: we want to add SAMPLE instead of the actual sample name.
    if firstLine:
        outputTARGET.write("SAMPLE \t%s" % currLine)
        firstLine = False

    # For all other lines we want to add the sample name instead of the word SAMPLE.
    else:
        outputTARGET.write(str(wildcards.samples) + "\t%s" % currLine)

This may not be a Python-specific problem, but I am looking for a Python-specific solution.

Answer 1

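Shout out to @Prune, thank you :)

The best approach is to read the first line of the input stream on its own, before the loop. Python has good built-in functions for this.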
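A minimal sketch of that idea, assuming any binary file-like object in place of the real pipe: `readline()` consumes the header once, and the `for` loop then continues from the second line with no per-line flag. The function name `prefix_lines`, its `header_label` parameter, and the in-memory `io.BytesIO` streams are hypothetical stand-ins, not part of the original pipeline.

```python
import io


def prefix_lines(stream, out, sample_name, header_label="Sample"):
    """Copy stream to out, prefixing the header and the data lines differently.

    stream and out are binary file-like objects (hypothetical stand-ins for
    the subprocess pipe and the output target).
    """
    # Consume only the first line; the stream position is now at line two.
    header = stream.readline()
    if header:
        out.write(header_label.encode("utf-8") + b"\t" + header)

    # Build the prefix once, outside the loop, and stay in bytes throughout.
    prefix = str(sample_name).encode("utf-8") + b"\t"
    for line in stream:
        out.write(prefix + line)


# Usage with in-memory data standing in for the real stream.
source = io.BytesIO(b"colName1\tcolName2\n1\tx\n2\tx\n")
target = io.BytesIO()
prefix_lines(source, target, "A")
result = target.getvalue().decode("utf-8")
```

Staying in bytes avoids decoding every line, but it assumes the output target is opened in binary mode; with a text-mode target the per-line `decode("utf-8")` is still needed.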

In the end I used this:

import subprocess

# Call the function and capture its output to modify each line.
fileSTREAM = subprocess.Popen(callString, stdout=subprocess.PIPE, shell=True)

# Initially read and edit just the first line, adding 'SAMPLE' to the header line.
outputTARGET.write("SAMPLE \t%s" % fileSTREAM.stdout.readline().decode("utf-8"))

# Add the sample name to each line after the header line.
for line in fileSTREAM.stdout:
    # Decode the input from byte literals to strings.
    outputTARGET.write(str(wildcards.samples) + "\t%s" % line.decode("utf-8"))
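
On the cost question, one way to check whether the per-line flag test matters is to time both variants on synthetic data. The sketch below is a rough micro-benchmark under stated assumptions, not a measurement of the real pipeline: the function names, the in-memory list, and the line count are all hypothetical, and it leaves out the decoding and I/O that a real run would include.

```python
import timeit


def with_flag(lines, sample_name):
    """Prefix lines using a first-line flag checked on every iteration."""
    out = []
    first_line = True
    for line in lines:
        if first_line:
            out.append("Sample\t" + line)
            first_line = False
        else:
            out.append(sample_name + "\t" + line)
    return out


def header_first(lines, sample_name):
    """Prefix lines by consuming the header before the loop starts."""
    iterator = iter(lines)
    out = []
    header = next(iterator, None)
    if header is not None:
        out.append("Sample\t" + header)
    for line in iterator:
        out.append(sample_name + "\t" + line)
    return out


# Synthetic input: one header plus 200,000 data lines (an arbitrary size).
lines = ["colName1\tcolName2\n"] + ["1\tx\n"] * 200_000

if __name__ == "__main__":
    for func in (with_flag, header_first):
        seconds = timeit.timeit(lambda: func(lines, "A"), number=5)
        print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

No timings are quoted here because they depend on the machine and the Python version; it is worth running this on representative data before deciding that the flag check is a bottleneck.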


python

processing-efficiency