How to treat stdin like a text file

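I have a program that reads and parses a text file and runs some analysis on it. I want to modify it so that it takes its parameters from the command line and reads its input from stdin when the file is supplied that way.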

The parser looks like this:

class FastAreader :
    '''
    Class to provide reading of a file containing one or more FASTA
    formatted sequences:
    object instantiation:
    FastAreader(<file name>):

    object attributes:
    fname: the initial file name

    methods:
    readFasta() : returns header and sequence as strings.
    Author: David Bernick
    Date: April 19, 2013
    '''
    def __init__ (self, fname):
        '''constructor: saves attribute fname '''
        self.fname = fname

    def readFasta (self):
        '''
        using filename given in init, returns each included FastA record
        as 2 strings - header and sequence.
        whitespace is removed, no adjustment is made to sequence contents.
        The initial '>' is removed from the header.
        '''
        header = ''
        sequence = ''

        with open(self.fname) as fileH:
            # initialize return containers
            header = ''
            sequence = ''

            # skip to first fasta header
            line = fileH.readline()
            while not line.startswith('>') :
                line = fileH.readline()
            header = line[1:].rstrip()

            # header is saved, get the rest of the sequence
            # up until the next header is found
            # then yield the results and wait for the next call.
            # next call will resume at the yield point
            # which is where we have the next header
            for line in fileH:
                if line.startswith ('>'):
                    yield header,sequence
                    header = line[1:].rstrip()
                    sequence = ''
                else :
                    sequence += ''.join(line.rstrip().split()).upper()
        # final header and sequence will be seen with an end of file
        # with clause will terminate, so we do the final yield of the data
        yield header,sequence

# presumed object instantiation and example usage
# myReader = FastAreader ('testTiny.fa');
# for head, seq in myReader.readFasta() :
#     print (head,seq)

It parses files that look like this:

>test
ATGAAATAG
>test2
AATGATGTAA
>test3
AAATGATGTAA

>test-1
TTA CAT CAT

>test-2
TTA CAT CAT A

>test-3
TTA CAT CAT AA

>test1A
ATGATGTAAA
>test2A
AATGATGTAAA
>test3A
AAATGATGTAAA

>test-1A
A TTA CAT CAT

>test-2A
AA TTA CAT CAT A

>test-3A
AA TTA CAT CAT AA

My test program looks like this:

import argparse
import sequenceAnalysis as s
import sys

class Test:
    def __init__(self, infile, longest, min, start):
        self.longest = longest
        self.start = set(start)
        self.infile = infile
        self.data = sys.stdin.read()
        self.fasta = s.FastAreader(self.data)
        for head, seq in self.fasta.readFasta():
            self.head = head
            self.seq = "".join(seq).strip()
        self.test()

    def test(self):
        print("YUP", self.start, self.head)


def main():
    parser = argparse.ArgumentParser(description = 'Program prolog', 
                                     epilog = 'Program epilog', 
                                     add_help = True, #default is True 
                                     prefix_chars = '-', 
                                     usage = '%(prog)s [options] -option1[default] <input >output')
    parser.add_argument('-i', '--inFile', action = 'store', help='input file name')
    parser.add_argument('-o', '--outFile', action = 'store', help='output file name') 
    parser.add_argument('-lG', '--longestGene', action = 'store', nargs='?', const=True, default=True, help='longest Gene in an ORF')
    parser.add_argument('-mG', '--minGene', type=int, choices= range(0, 2000), action = 'store', help='minimum Gene length')
    parser.add_argument('-s', '--start', action = 'append', nargs='?', help='start Codon') #allows multiple list options
    parser.add_argument('-v', '--version', action='version', version='%(prog)s 0.1')  
    args = parser.parse_args()
    test = Test(args.inFile, args.longestGene, args.minGene, args.start)


if __name__ == '__main__':
    main()

My command-line input looks like this:

python testcommand2.py -s ATG <tass2.fa >out.txt

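where tass2.fa is a file that FastAreader can parse. I can pass parameters such as start and have them written to the text file. But when I try to parse the input file, which should be stdin, the program prints everything instead of parsing it, and rather than writing to the specified text file, which should be stdout, it prints straight to the command line.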

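When you use I/O redirection (that is, you have <, |, >, or << on the command line), the shell handles it before your program even runs. So when Python starts, its standard input is already connected to the file or pipe you are redirecting from, and its standard output is connected to the file or pipe you are redirecting to. The file names are not (directly) visible to Python, because you are dealing with already open()ed file handles, not file names. Your argument parser simply returns nothing for the file options, because no file name arguments were passed.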
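As a rough illustration of that point, the sketch below reads from a stream that is already open and never touches a file name. The helper name first_header and the script name in the comments are made up for this example; they are not part of the question's code.

```python
import io
import sys

def first_header(handle):
    """Return the first FASTA header found in an already open text stream.

    `handle` may be sys.stdin, a file object returned by open(), or an
    io.StringIO; a file name is never needed.
    """
    for line in handle:
        if line.startswith('>'):
            return line[1:].rstrip()
    return None

def main():
    # No open() call here: the shell has already attached the redirected
    # file to sys.stdin before the script starts.
    print(first_header(sys.stdin))

# main() would be called from the usual `if __name__ == '__main__':` guard,
# for a hypothetical invocation such as:
#   python script.py <tass2.fa
```

The same function works unchanged on io.StringIO(">test\nATG\n"), which is a convenient way to test the parsing without any redirection.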

To handle this correctly, you should adapt your code to work with file handles directly, either instead of or in addition to explicit file names.
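One way this could look is a reader that takes an open handle rather than a name. The function below is a hypothetical rewrite, not the original FastAreader class: the name read_fasta is invented here, and unlike the original it skips any text before the first header and yields nothing when the input contains no header at all.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an already open text stream."""
    header = None
    chunks = []
    for line in handle:
        if line.startswith('>'):
            # A new record starts: emit the previous one, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header = line[1:].rstrip()
            chunks = []
        elif header is not None:
            # Remove all whitespace inside the sequence line and upper-case it.
            chunks.append(''.join(line.split()).upper())
    # End of input: emit the final record.
    if header is not None:
        yield header, ''.join(chunks)

# Hypothetical usage with redirected input:
#   import sys
#   for head, seq in read_fasta(sys.stdin):
#       print(head, seq)
```

Because the function only iterates over the handle, the caller decides where the data comes from: sys.stdin, a file opened with open(), or an in-memory io.StringIO in a test.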

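For the latter scenario, a common convention is to treat the file name - as a special case: when it is passed in, use standard input (or standard output, depending on context) instead of opening a file. (You can still work with a file that really has that name through the simple workaround of the relative path ./-, so that the name is not exactly a single dash.)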
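A minimal sketch of that convention follows. It assumes the -i/--inFile option from the question, with a default of - added here; the helper names open_input and parse_args are invented for illustration.

```python
import argparse
import contextlib
import sys

@contextlib.contextmanager
def open_input(name):
    """Yield sys.stdin when name is None or '-', otherwise an opened file."""
    if name is None or name == '-':
        # We did not open stdin, so we do not close it either.
        yield sys.stdin
    else:
        with open(name) as handle:
            yield handle

def parse_args(argv=None):
    """Parse only the input option; '-' (the default) means standard input."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--inFile', default='-',
                        help="input file name, or '-' for stdin")
    return parser.parse_args(argv)

# Hypothetical usage:
#   args = parse_args()
#   with open_input(args.inFile) as handle:
#       for line in handle:
#           ...
```

With this in place, both python script.py -i tass2.fa and python script.py <tass2.fa end up reading the same data through the same code path, and a file literally named - can still be reached as ./-.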