Python writing only 1 line for CSV file

Question

I'm sorry to ask this again, but the problem still hasn't been solved.

It isn't a very complicated problem, and I'm sure it's fairly simple, but I just can't see what's wrong.

My code for parsing the XML file opens it and reads it in the format I want, as the print statement in the final for loop shows.

For example, it outputs this:

Pivoting support handle D0584129 20090106 US

Hinge D0584130 20090106 US

Deadbolt turnpiece D0584131 20090106 US

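This is exactly how I want the data written to the CSV file. However, when I try to actually write these as rows in the CSV itself, it only writes one of the last lines from the XML file, like this: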

Flashlight package,D0584138,20090106,US

Here is my entire code, since it may help in understanding the whole process. The part of interest starts at the line for xml_string in separated_xml(infile).

    from bs4 import BeautifulSoup
    import csv
    import unicodecsv as csv

    infile = r"C:\Users\Grisha\Documents\Inventor\2009_Data\Jan\ipg090106.xml"

    # The first line of code defines a function "separated_xml" that will allow us to separate, read, and then finally parse the data of interest
    def separated_xml(infile):  # Defining the data reading function for each xml section - This breaks apart the xml from the start (root element <?xml...) to the next iteration of the root element
        file = open(infile, "r")  # Used to open the xml file
        buffer = [file.readline()]  # Used to read each line and placing inside vector

        # The first for-loop is used to slice every section of the USPTO XML file to be read and parsed individually
        # It is necessary because Python wishes to read only one instance of a root element but this element is found many times in each file which causes reading errors
        for line in file:  # Running for-loop for the opened file and searches for root elements
            if line.startswith("<?xml "):
                yield "".join(buffer)  # 1) Using "yield" allows to generate one instance per run of a root element and 2) .join takes the list (vector) "buffer" and connects an empty string to it
                buffer = []  # Creates a blank list to store the beginning of a new 'set' of data in beginning with the root element
            buffer.append(line)  # Passes lines into list
        yield "".join(buffer)  # Outputs
        file.close()

    # The second nested set of for-loops are used to parse the newly reformatted data into a new list
    for xml_string in separated_xml(infile):  # Calls the output of the separated and read file to parse the data
        soup = BeautifulSoup(xml_string, "lxml")  # BeautifulSoup parses the data strings where the XML is converted to Unicode
        pub_ref = soup.findAll("publication-reference")  # Beginning parsing at every instance of a publication
        lst = []  # Creating empty list to append into

        with open('./output.csv', 'wb') as f:
            writer = csv.writer(f, dialect='excel')

            for info in pub_ref:  # Looping over all instances of publication
                # The final loop finds every instance of invention name, patent number, date, and country to print and append into
                for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                    print(inv_name.text, pat_num.text, date_num.text, country.text)
                    lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                    writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
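The splitting step in separated_xml can be tried on its own. The sketch below (split_concatenated_xml is a hypothetical name; Python 3 assumed) applies the same idea to any iterable of lines, so it can be exercised with io.StringIO instead of the real USPTO file, which contains many XML documents, each beginning with its own <?xml ...?> declaration:

```python
import io


def split_concatenated_xml(lines):
    """Yield one string per XML document in a stream of concatenated documents.

    Same idea as separated_xml() in the question, but it accepts any
    iterable of lines (a file object, io.StringIO, a list of str).
    """
    buffer = []
    for line in lines:
        # A new XML declaration marks the start of the next document;
        # the `buffer` check avoids yielding an empty chunk at the very start
        if line.startswith("<?xml ") and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        # Emit the last document, which has no declaration after it
        yield "".join(buffer)


# Hypothetical two-document stream standing in for the real file
stream = io.StringIO(
    '<?xml version="1.0"?>\n<doc>first</doc>\n'
    '<?xml version="1.0"?>\n<doc>second</doc>\n'
)
chunks = list(split_concatenated_xml(stream))
print(len(chunks))  # 2
```

With the real file it would be called as with open(infile) as fh: for xml_string in split_concatenated_xml(fh): ..., which also guarantees the file is closed; in separated_xml, file.close() only runs once the generator has been fully consumed.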

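I also tried putting the open and writer outside the for loop to see where the problem was, but it didn't help. I know the file is written only 1 line at a time and the same line is overwritten again and again (which is why only 1 line is left in the CSV file); I just can't see why.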

Thanks very much in advance for your help.

Answer 1

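I believe (as a first working theory, anyway) the root of your problem is that your with open statement sits inside your outer for loop (for xml_string in separated_xml(infile)), and mode "wb" overwrites the file if it already exists. That means every time the loop runs, it wipes out everything that was written before, so only one line of output is left when it finishes.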
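A minimal reproduction of that diagnosis, assuming Python 3 and the standard csv module in place of unicodecsv (mode "w" truncates the file on open exactly as "wb" does). The three rows are the sample output from the question; opening the file inside the loop leaves only the last one:

```python
import csv
import os
import tempfile

rows = [
    ["Pivoting support handle", "D0584129", "20090106", "US"],
    ["Hinge", "D0584130", "20090106", "US"],
    ["Deadbolt turnpiece", "D0584131", "20090106", "US"],
]

path = os.path.join(tempfile.mkdtemp(), "output.csv")

for row in rows:
    # The bug being reproduced: "w" truncates the file on every open,
    # so each iteration throws away what the previous one wrote
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, dialect="excel").writerow(row)

with open(path, newline="", encoding="utf-8") as f:
    written = list(csv.reader(f))

print(written)  # only the last row survives
```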

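I think you can handle this in two ways. The more correct one is to move the file-open statement outside the outermost for loop. I know you mentioned you already tried this, but the devil is in the details. Your updated code would then look like this: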

    with open('./output.csv', 'wb') as f:
      writer = csv.writer(f, dialect='excel')

      for xml_string in separated_xml(infile):
        soup = BeautifulSoup(xml_string, "lxml")
        pub_ref = soup.findAll("publication-reference")
        lst = []

        for info in pub_ref:

          for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
            print(inv_name.text, pat_num.text, date_num.text, country.text)
            lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
            writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
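The same structure reduced to its essentials, as a sketch: write_rows is a hypothetical helper, and documents stands in for the parsed XML, one list of rows per document. It assumes Python 3, where the built-in csv module handles Unicode and the file is opened in text mode with newline="" rather than in "wb":

```python
import csv
import os
import tempfile


def write_rows(path, documents):
    """Write every row of every document to one CSV file."""
    # Opened ONCE, before the loops, so "w" truncates only at the start
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, dialect="excel")
        for doc in documents:   # outer loop: one parsed XML document
            for row in doc:     # inner loop: the rows found in it
                writer.writerow(row)


# Hypothetical stand-in for the parsed data, using the question's sample rows
documents = [
    [["Pivoting support handle", "D0584129", "20090106", "US"]],
    [["Hinge", "D0584130", "20090106", "US"]],
    [["Deadbolt turnpiece", "D0584131", "20090106", "US"]],
]
path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_rows(path, documents)

with open(path, newline="", encoding="utf-8") as f:
    written = list(csv.reader(f))
print(len(written))  # 3
```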

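A clumsier but quicker and simpler way is to just change the mode in the open call to "ab" (append, binary) instead of "wb" (write, binary, which overwrites any existing data). This is much less efficient, since you still reopen the file on every pass through the for loop, but it may work.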
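For comparison, a sketch of the append-mode variant (again Python 3 with the standard csv module; append_rows is a hypothetical name). Because "a" never truncates, a second run of the script would add its rows after the first run's, so the sketch deletes any old output once before the loop:

```python
import csv
import os
import tempfile


def append_rows(path, documents):
    # Remove output from a previous run; "a" never truncates,
    # so without this the file would keep growing on every run
    if os.path.exists(path):
        os.remove(path)
    for doc in documents:
        # Reopened once per document (the inefficiency mentioned above),
        # but "a" writes at the end of the file instead of truncating it
        with open(path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f, dialect="excel").writerows(doc)


path = os.path.join(tempfile.mkdtemp(), "output.csv")
docs = [
    [["Hinge", "D0584130", "20090106", "US"]],
    [["Deadbolt turnpiece", "D0584131", "20090106", "US"]],
]
append_rows(path, docs)
append_rows(path, docs)  # a second "run": still 2 rows, not 4

with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(len(rows))  # 2
```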

Hope this helps!

Answer 2

    with open('./output.csv', 'wb') as f:

You only need to change 'wb' -> 'ab' so the file isn't overwritten.

It didn't work the first time, but moving the open call before the last 2 loops solved the problem. Thanks to everyone who helped.


python

csv

excel

export-to-csv