Python vectorize function and calling save_dictionary issue
I am creating a vectorize function that does the following:
takes a string argument as the path (folder) where the text data files live;
processes all the data files under that path and produces TF and DF statistics.
I have fixed the code from my last submission, and I would like to know how to call the save_dictionary() function to save each document's dictionary with its TF (term frequency) counts to a file named tf_DOCID.txt on the same path.
class Document:
def __init__(self, doc_id):
# create a new document with its ID
self.id = doc_id
# create an empty dictionary
# that will hold the term frequency (TF) counts
self.tfs = {}
def tokenization(self, text):
# split a title into words,
# using space " " as delimiter
words = text.lower().split(" ")
for word in words:
# for each word in the list
if word in self.tfs:
# if it has been counted in the TF dictionary
# add 1 to the count
self.tfs[word] = self.tfs[word] + 1
else:
# if it has not been counted,
# initialize its TF with 1
self.tfs[word] = 1
def save_dictionary(diction_data, file_path_name):
# print the key-values pair in a dictionary
f = open("./textfiles", "w+")
for key in diction_data:
f.print(key, diction_data[key])
f.close()
def vectorize(data_path):
Document = []
for i in range(1, 21):
file_name = "./textfiles/"+ i + ".txt"
# create a new document with an ID
Document = Document(i+1)
#Read the files
f = open(Document)
print(f.read())
# compute the term frequencies
Document.tokenization(file_name)
# add the documents to the lists
Documents.append(Document)
Checking the vectorize function:
1) Documents is not defined
2) I assume you wanted to create an empty list: documents = []
I think you have a few gaps in Python. I won't go into the class implementation, but here are some comments:
Note that in tokenization you only pass the path, but inside the method you use it as the file's text; you first need to open the file at that path and read its contents.
def vectorize(data_path):
    documents = []  # no need to declare the type of the list
    for i in range(1, 21):
        # str(i) is needed: you cannot concatenate an int to a str
        file_name = "./textfiles/" + str(i) + ".txt"
        # create a new document with an ID
        doc = Document(i + 1)  # initiation
        # open the file and compute the term frequencies on its text
        with open(file_name) as f:
            doc.tokenization(f.read())
        # append the current document to the documents list
        documents.append(doc)
Oh, and per the PEP 8 naming convention, reserve the CapWords name Document for the class and use lowercase names for variables, so a variable doesn't shadow the class.
You can read more here:
pep8
About the save_dictionary func: I would make it a method of the Document class,
and use json to save it to a file:
import json
def save_dictionary(self):
# write the key-value pairs of the dictionary as JSON
with open(f'somepath/tf_{self.id}.txt', 'w') as f:
f.write(json.dumps(self.tfs))
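To answer the original question (how to actually call save_dictionary()), here is one minimal runnable sketch putting the pieces together. It assumes the data files are named 1.txt through 20.txt under data_path, and it uses i as the document ID so tf_1.txt lines up with 1.txt (the i + 1 in the earlier snippets looks like an off-by-one); the tf_DOCID.txt files are written into the same folder, as the question asks:

```python
import json
import os

class Document:
    def __init__(self, doc_id):
        self.id = doc_id
        self.tfs = {}  # term -> frequency count

    def tokenization(self, text):
        # split the text into words on spaces and count each one
        for word in text.lower().split(" "):
            self.tfs[word] = self.tfs.get(word, 0) + 1

    def save_dictionary(self, dir_path):
        # write the TF dictionary as JSON to tf_DOCID.txt in dir_path
        with open(os.path.join(dir_path, f"tf_{self.id}.txt"), "w") as f:
            f.write(json.dumps(self.tfs))

def vectorize(data_path):
    documents = []
    for i in range(1, 21):
        file_name = os.path.join(data_path, f"{i}.txt")
        doc = Document(i)
        with open(file_name) as f:
            doc.tokenization(f.read())  # pass the file's text, not its path
        doc.save_dictionary(data_path)  # writes tf_<i>.txt next to the data
        documents.append(doc)
    return documents
```

After vectorize(data_path) runs, each document's TF counts are on disk as tf_1.txt through tf_20.txt, and they can be reloaded with json.load for the DF statistics later.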