f.readline versus f.read print output

Question

I'm new to Python (using Python 3.6). I have a read.txt file containing information about a firm. The file begins with various report characteristics:

CONFORMED PERIOD REPORT:             20120928 #this is 1 line
DATE OF REPORT:                      20121128 #this is another line

and then starts all the text about the firm..... #lots of lines here

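I'm trying to extract the two dates (['20120928','20121128']) along with some strings from the text (i.e. if a string is present, I want a '1'). Ultimately I want a vector that gives me the two dates plus the 1s and 0s for the different strings, something like: ['20120928','20121128','1','0']. My code is as follows: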

import re

exemptions = []  # vector I want

with open('read.txt', 'r') as f:
    line2 = f.read()  # read the txt file
    for line in f:
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # drop the label, keep only the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", ""))  # same as above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

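If I run this code I get ['1','0']: the dates are omitted, but the file is read correctly, with var1 found (correctly '1') and var2 not found (correctly '0'). What I don't understand is why it doesn't report the dates. Importantly, when I change line2 to "line2=f.readline()", I get ['20120928','20121128','0','0']. Now the dates are picked up, but I know var1 exists, so it seems the rest of the file isn't being read? If I omit "line2=f.read()", it spits out a vector of 0s for every line, apart from my desired output. How can I get rid of these 0s?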

My desired output is: ['20120928','20121128','1','0']

Sorry to bother you. Thanks anyway!

Answer 1

line2 = f.read() reads the entire file into line2, so there is nothing left for your for line in f: loop to read.
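
A minimal sketch of that behaviour, in case it helps to see it in isolation. It uses io.StringIO as a stand-in for the open file (an assumption made only so the snippet runs without read.txt); the sample text and variable names are made up for illustration. It also shows that f.seek(0) rewinds the file if you really do want to read it twice.

```python
import io

# io.StringIO stands in for the open file so this runs without read.txt.
f = io.StringIO("DATE OF REPORT:\t20121128\nsome text about the firm\n")

whole_text = f.read()        # consumes everything; the position is now at end of file
lines_after_read = list(f)   # nothing is left to iterate over

f.seek(0)                    # rewind to the start of the file
lines_after_seek = list(f)   # iteration yields the lines again

print(lines_after_read)      # []
print(len(lines_after_seek)) # 2
```

The same applies to a real file object returned by open(): iterating after a full read() produces no lines unless the position is reset first.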

Answer 2

The line f.read() reads the whole file into the variable line2. If you want to read line by line, you can skip f.read() and iterate like this:

with open('read.txt', 'r') as f:
    for line in f:
        ...  # handle each line here

Otherwise, as written, once you .read() into line2 there is no more text left to read from f, because all of it is already in the line2 variable.
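
For the f.readline() variant mentioned in the question, a rough sketch of what is probably going on: readline() returns a single line only, so a regex run on that one line never sees the rest of the file, while the for loop simply carries on from the next line. The sample text below is invented, and io.StringIO again stands in for the real file. The last two lines show one possible alternative: read the text once and split it into lines yourself.

```python
import io

sample = (
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "text mentioning string1\n"
)

f = io.StringIO(sample)
first_line = f.readline()           # only the first line, trailing newline included
remaining = [line for line in f]    # iteration continues from the second line

# Alternative: read once, then work on both the full text and its lines.
text = io.StringIO(sample).read()
lines = text.splitlines()           # the same content, split into lines without '\n'

print("string1" in first_line)      # False
print("string1" in text)            # True
```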

Answer 3

Here is how I ended up doing it:

import re

exemptions = []  # vector I want

with open('read.txt', 'r') as f:
    line2 = ""  # create an empty string before the "for line" loop
    for line in f:
        line2 = line2 + line  # append each line to the string created above
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # drop the label, keep only the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", ""))  # same as above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

This is what I have so far. It works for me, although I suspect that using beautifulsoup would make the code more efficient. Next step :)
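
For what it's worth, the same idea can probably be written a bit more compactly by reading the text once and looping over its lines. This is only a sketch: the helper name extract_flags and its parameters are hypothetical, it assumes each label sits at the very start of its line, and it strips any whitespace (tabs or spaces) after the label instead of replacing a fixed "\t".

```python
import re

def extract_flags(text, labels, patterns):
    """Return the value after each label, then '1'/'0' for each pattern."""
    result = []
    for line in text.splitlines():
        for label in labels:
            if line.startswith(label):
                # Keep whatever follows the label, minus surrounding whitespace.
                result.append(line[len(label):].strip())
    for pattern in patterns:
        # re.search stops at the first hit, which is enough for a yes/no flag.
        found = re.search(pattern, text, re.I) is not None
        result.append('1' if found else '0')
    return result

# Made-up sample text in the shape described in the question.
sample = (
    "CONFORMED PERIOD REPORT:             20120928\n"
    "DATE OF REPORT:                      20121128\n"
    "and then all the text about the firm, including String1\n"
)
exemptions = extract_flags(
    sample,
    ["CONFORMED PERIOD REPORT:", "DATE OF REPORT:"],
    ["string1", "string2"],
)
print(exemptions)  # ['20120928', '20121128', '1', '0']
```

With the real file, the text argument would come from a single f.read() inside the with open('read.txt', 'r') as f: block. Note that the dates come out in the order the lines appear in the file, not in the order of the labels list.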

python

parsing

readfile