How to convert dictionary values into a csv file?
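I am an absolute beginner in Python. I'm doing a textual analysis of a Greek play and counting the frequency of each word. Because the play is very long, I can't see my full set of data: the Python window doesn't have enough space, so it only shows the words with the lowest frequencies. I'm thinking of writing the data to a .csv file instead. My full code is as follows:
#read the file as one string and split the string into a list of separate words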
input = open('Aeschylus.txt', 'r')
text = input.read()
wordlist = text.split()
#read file containing stopwords and split the string into a list of separate words
stopwords = open("stopwords .txt", 'r').read().split()
#remove stopwords
wordsFiltered = []
for w in wordlist:
    if w not in stopwords:
        wordsFiltered.append(w)
#create dictionary by counting no of occurences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]
#create word-frequency pairs and create a dictionary
dictionary = dict(zip(wordsFiltered,wordfreq))
#sort by decreasing frequency and print
aux = [(dictionary[word], word) for word in dictionary]
aux.sort()
aux.reverse()
for y in aux: print y
import csv
with open('Aeschylus.csv', 'w') as csvfile:
    fieldnames = ['dictionary[word]', 'word']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'dictionary[word]': '1', 'word': 'inherited'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inheritance'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inherit'})
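I found the csv code on the internet. What I'm hoping to get is the full list of data, from the highest frequency to the lowest. With the code I have right now, python seems to be totally ignoring the csv part and just printing the data as if I didn't code for the csv.
Any idea what I should code to see my intended result?
Thank you.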
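Since you have a dictionary where the words are the keys and their frequencies are the values, a DictWriter is ill suited. It is meant for sequences of mappings that share a common set of keys, which become the columns of the csv. For example, if you had a list of dicts like the ones you create by hand: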
a_list = [{'dictionary[word]': '1', 'word': 'inherited'},
          {'dictionary[word]': '1', 'word': 'inheritance'},
          {'dictionary[word]': '1', 'word': 'inherit'}]
then a DictWriter would be the tool for the job. But instead you have a single dictionary like:
dictionary = {'inherited': 1,
              'inheritance': 1,
              'inherit': 1,
              ...: ...}
However, you have already built aux, a sorted list of (freq, word) pairs, which is perfect for writing to csv:
with open('Aeschylus.csv', 'wb') as csvfile:
    header = ['frequency', 'word']
    writer = csv.writer(csvfile)
    writer.writerow(header)
    # Note the plural method name
    writer.writerows(aux)
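For readers on Python 3, the same idea might look roughly like the sketch below. It is only an illustration under stated assumptions: the sample counts and the helper name `write_freq_csv` are made up, and an in-memory `io.StringIO` stands in for the real file, which on Python 3 would be opened in text mode with `newline=''` rather than `'wb'`.

```python
import csv
import io

def write_freq_csv(freq_by_word, csvfile):
    """Write (frequency, word) rows, most frequent first (hypothetical helper)."""
    # Same construction as `aux`: (freq, word) pairs sorted in descending order.
    rows = sorted(((freq, word) for word, freq in freq_by_word.items()),
                  reverse=True)
    writer = csv.writer(csvfile)
    writer.writerow(['frequency', 'word'])
    writer.writerows(rows)
    return rows

# Made-up sample data standing in for the real word counts.
sample = {'inherited': 3, 'inheritance': 1, 'inherit': 2}

# In-memory stand-in for open('Aeschylus.csv', 'w', newline='')
buffer = io.StringIO()
rows = write_freq_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Swapping `buffer` for a real file object should be the only change needed to produce a file on disk.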
python seems to be totally ignoring the csv part and just printing the data as if I didn't code for the csv.
That sounds odd. At the very least you should get a file Aeschylus.csv containing:
dictionary[word],word
1,inherited
1,inheritance
1,inherit
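If you did want to stay with DictWriter, one option is to turn each item of the word-to-frequency dictionary into its own small row dict first. The following is a minimal Python 3 sketch, not the code from the question: the sample data and the column names `frequency` and `word` are assumptions chosen for illustration, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Made-up sample standing in for the real word -> frequency dictionary.
dictionary = {'inherited': 1, 'inheritance': 1, 'inherit': 1}

# DictWriter wants one mapping per row, all sharing the same keys.
rows = [{'word': word, 'frequency': freq} for word, freq in dictionary.items()]

buffer = io.StringIO()  # stand-in for a real file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=['frequency', 'word'])
writer.writeheader()
writer.writerows(rows)
lines = buffer.getvalue().splitlines()
```

The order of `fieldnames` decides the column order in the output, independent of the key order inside each row dict.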
Your frequency counting method could also be improved. At the moment
#create dictionary by counting no of occurences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]
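has to loop through the whole list wordsFiltered once for each word in wordsFiltered, so it is O(n²). You could instead iterate over the words in the file, filtering and counting as you go. Python has a specialized dictionary for counting hashable objects, called Counter: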
from __future__ import print_function
from collections import Counter
import csv
# Many ways to go about this, could for example yield from (<gen expr>)
def words(filelike):
    for line in filelike:
        for word in line.split():
            yield word

def remove(iterable, stopwords):
    stopwords = set(stopwords)  # O(1) lookups instead of O(n)
    for word in iterable:
        if word not in stopwords:
            yield word

if __name__ == '__main__':
    with open("stopwords.txt") as f:
        stopwords = f.read().split()

    with open('Aeschylus.txt') as wordfile:
        wordfreq = Counter(remove(words(wordfile), stopwords))
Then, as before, print the words and their frequencies, starting with the most common:
for word, freq in wordfreq.most_common():
    print(word, freq)
And/or write them out as csv:
# Since you're using python 2, 'wb' and no newline=''
with open('Aeschylus.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['word', 'freq'])
    # If you want to keep most common order in CSV as well. Otherwise
    # wordfreq.items() would do as well.
    writer.writerows(wordfreq.most_common())
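For completeness, here is a compact Python 3 sketch of the same Counter pipeline that runs without any files on disk. The sample text, the stopword list and the helper name `count_words` are made up for illustration. With real files you would pass the open text file instead of the `io.StringIO` object and open the csv with `open('Aeschylus.csv', 'w', newline='')`.

```python
import csv
import io
from collections import Counter

def count_words(lines, stopwords):
    """Count whitespace-separated words in lines, skipping stopwords."""
    stopwords = set(stopwords)  # set membership tests are O(1) on average
    return Counter(
        word
        for line in lines
        for word in line.split()
        if word not in stopwords
    )

# Made-up text and stopwords, standing in for Aeschylus.txt and stopwords.txt.
text = io.StringIO("the gods the house\nthe gods of the house of gods\n")
wordfreq = count_words(text, ["the", "of"])

# In-memory stand-in for the csv file.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['word', 'freq'])
writer.writerows(wordfreq.most_common())
```

Note that this splits on whitespace only, exactly like the code above, so punctuation and capitalization are counted as part of the word.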