Output from function to text/CSV file?

I'm counting the number of contractions in a set of presidential speeches, and I want to output those counts to a CSV or text file. Here's my code:

import urllib2,sys,os,csv
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
import math, functools
import summarize
reload(sys)

def processURL_short(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()
    return item_str

every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
'http://www.millercenter.org/president/obama/speeches/speech-4424',
'http://www.millercenter.org/president/obama/speeches/speech-4453',
'http://www.millercenter.org/president/obama/speeches/speech-4612',
'http://www.millercenter.org/president/obama/speeches/speech-5502']

data = {}
count = 0
for l in every_link_test:
    content_1 = processURL_short(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
    data[filename] = count
    print count, filename

    with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        a.writerows(data)

Running the for loop prints:

79 obama_speech-4427
101 obama_speech-4424
101 obama_speech-4453
182 obama_speech-4612
224 obama_speech-5502

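I'd like to export this to a text file, with the numbers on the left in one column and the president/speech number in a second column. My with statement just writes each line to a separate file, which is definitely suboptimal.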

You can try something like this. It's a generic approach, so adapt it to your own case:

import csv
# Python 3: open in text mode with newline=''; on Python 2 use mode 'wb' instead
with open('somepath/file.txt', 'w', newline='') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for i in your_data_structure: # assuming a list of (col1, col2) pairs
    w.writerow([
      i[0],
      i[1],
    ])

Or, if it's a dictionary:

import csv
# Python 3: open in text mode with newline=''; on Python 2 use mode 'wb' instead
with open('somepath/file.txt', 'w', newline='') as outfile:
  w = csv.writer(outfile)
  w.writerow(['header1', 'header2'])
  for k, v in your_dictionary.items(): # one row per key/value pair
    w.writerow([
      k,
      v,
    ])
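Applied to the question's data, a minimal sketch (Python 3) might look like the following. The `counts` dictionary and the `write_counts` helper are hypothetical stand-ins for the `data` dict built in the question's loop, and the two sample values are copied from the printed output. The rows are built explicitly because `writerows(data)` on a dict iterates only over its keys and treats each key string as a sequence of single characters.

```python
import csv
import io

# Hypothetical sample mirroring the question's printed output.
counts = {
    "obama_speech-4427": 79,
    "obama_speech-4424": 101,
}

def write_counts(fp, counts):
    """Write one row per speech: count first, speech name second."""
    writer = csv.writer(fp)
    writer.writerow(["count", "speech"])
    for name, count in counts.items():
        writer.writerow([count, name])

# StringIO stands in for a real file so the sketch has no side effects;
# with a real file, use open('contraction_counts.csv', 'w', newline='').
buf = io.StringIO()
write_counts(buf, counts)
print(buf.getvalue())
```

Swapping the order inside `writer.writerow([count, name])` is all it takes to put the speech name in the first column instead.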

Your problem is that you open the output file inside the loop in w mode, which means it is erased on every iteration. You can easily fix it in two ways:

  1. Move the open outside the loop (the normal way). You open the file only once, add a row on each iteration, and it is closed when you exit the with block:

    with open('contraction_counts.csv','w',newline='') as fp:
        a = csv.writer(fp,delimiter = ',')
        for l in every_link_test:
            content_1 = processURL_short(l)
            # note: count is never reset, so it is a running total, as in the question
            for word in content_1.split():
                word = word.strip(p)
                if word in contractions:
                    count = count + 1
            # build the label once per speech rather than once per word
            splitlink = l.split("/")
            president = splitlink[4]
            speech_num = splitlink[-1]
            filename = "{0}_{1}".format(president,speech_num)
            data[filename] = count
            print(count, filename)
            # one row per speech: count in column 1, speech in column 2
            a.writerow([count, filename])
    
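  2. Open the file in a (append) mode. On each iteration you reopen the file and write at the end instead of erasing it. Because of the repeated open/close, this uses more I/O resources, and it should only be used if the program might crash and you want to be sure that everything written before the crash has actually been saved to disk: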

    for l in every_link_test:
        content_1 = processURL_short(l)
        # note: count is never reset, so it is a running total, as in the question
        for word in content_1.split():
            word = word.strip(p)
            if word in contractions:
                count = count + 1
        # build the label once per speech rather than once per word
        splitlink = l.split("/")
        president = splitlink[4]
        speech_num = splitlink[-1]
        filename = "{0}_{1}".format(president,speech_num)
        data[filename] = count
        print(count, filename)

        # reopen in append mode and add just this speech's row
        with open('contraction_counts.csv','a',newline='') as fp:
            a = csv.writer(fp,delimiter = ',')
            a.writerow([count, filename])
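To see the difference between the two modes in isolation, here is a small self-contained sketch (Python 3). The `rows` list, the `read_lines` helper, and the temporary file names are hypothetical and only serve the demonstration; the two sample rows reuse values from the question's output.

```python
import csv
import os
import tempfile

# Hypothetical sample rows (count, speech), taken from the question's output.
rows = [(79, "obama_speech-4427"), (101, "obama_speech-4424")]

def read_lines(path):
    """Return the file's contents as a list of lines without line endings."""
    with open(path, newline='') as fp:
        return fp.read().splitlines()

tmpdir = tempfile.mkdtemp()

# 'w' inside the loop: every open truncates the file,
# so only the row from the last iteration survives.
w_path = os.path.join(tmpdir, "w_mode.csv")
for row in rows:
    with open(w_path, 'w', newline='') as fp:
        csv.writer(fp).writerow(row)

# 'a' inside the loop: every open appends to the end,
# so the rows from all iterations are kept.
a_path = os.path.join(tmpdir, "a_mode.csv")
for row in rows:
    with open(a_path, 'a', newline='') as fp:
        csv.writer(fp).writerow(row)

print(read_lines(w_path))  # ['101,obama_speech-4424']
print(read_lines(a_path))  # ['79,obama_speech-4427', '101,obama_speech-4424']
```

One caveat with a mode: rerunning the script appends to whatever is already in the file, so delete or rename the old file first if you want a fresh result.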