My vectorize project isn't working. How would I go about fixing this?

class Document: 
    def __init__(self, doc_id):
        # create a new document with its ID
        self.id = doc_id
        # create an empty dictionary 
        # that will hold the term frequency (TF) counts
        self.tfs = {}

   def tokenization(self, text):
        # split a title into words, 
        # using space " " as delimiter
        words = text.lower().split(" ")
        for word in words: 
            # for each word in the list
            if word in self.tfs: 
                # if it has been counted in the TF dictionary
                # add 1 to the count
                self.tfs[word] = self.tfs[word] + 1
            else:
                # if it has not been counted, 
                # initialize its TF with 1
                self.tfs[word] = 1


    def save_dictionary(diction_data, file_path_name):
        f = open("./textfiles", "w+")

        for key in diction_data:
            # Separate the key from the frequency with a space and
            # add a newline to the end of each key value pair
            f.write(key + " " + str(diction_data[key]) + "\n")

        f.close()


    def vectorize(data_path):
        Documents = []
        for i in range(1, 21):
            file_name = "./textfiles/"+ i + ".txt"
            # create a new document with an ID
        doc = Document(i+1)
            #Read the files
        f = open(file_name)
        print(f.read())
            # compute the term frequencies
            #read in the files contents
        doc.tokenization(f.read())
            # add the documents to the lists
        Documents.append(doc)

     save_dictionary(doc.tfs, "tf_" + str(doc.id) + ".txt")

     DFS = {}
     for doc in Documents:
        for word in doc.tfs:
        DFS[word] = DFS.get(word,0) + 1

    save_dictionary(doc.DFS, "DFS_" + str(doc.id) + ".txt")


    vectorize("./textfiles")

Above is my code, but it doesn't work properly. I added a nested loop over every word in each document's dictionary to do the following: if a word does not yet appear in the DF dictionary, add it to the DF dictionary;

if it is already in the DF dictionary, increment its DF value by adding 1 to it;

then, after all the files have been processed, I call the save_dictionary() function again to save the DF dictionary to a file named df.txt in the same path as the input text files, and then vectorize.

When I run the code, nothing happens, so I must be doing something wrong somewhere. Any help would be greatly appreciated.

As mentioned in the comments, your indentation is wrong in several places. Fix that first. For example, in vectorize(), i is referenced in the doc assignment, but that line sits outside the local scope of the for loop that defines i.

Also, it helps to separate your logic from the script portion, which makes debugging easier.
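A minimal sketch of that separation, assuming vectorize ends up callable at module level (main is just a hypothetical name for the script portion):

def main():
    # script portion, kept apart from the class logic
    vectorize("./textfiles")

if __name__ == "__main__":
    main()  # not executed when the module is imported, e.g. by tests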

Update:

save_dictionary and vectorize() either need self as the first parameter to be part of the Document class, or they need the @staticmethod decorator. Also, i is still referenced outside the for loop, where it only exists inside the loop body. For vectorize, I suggest fixing the indentation of the for loop and using with to create a context manager, which opens and closes the file correctly with no extra effort:

def vectorize(self, data_path):
    Documents = []
    for i in range(1, 21):
        file_name = "./textfiles/"+ str(i) + ".txt"
        # create a new document with an ID
        doc = Document(i+1)
        # read the file's contents
        with open(file_name, 'r') as f:
            text = f.read()
            # compute the term frequencies
            doc.tokenization(text)
        # add the document to the list
        Documents.append(doc)

The context manager automatically closes the file when the with context/indentation is exited.
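save_dictionary could get the same treatment. A minimal sketch, assuming it stays inside the Document class with the @staticmethod decorator, and noting that it should write to the file_path_name argument rather than the hard-coded "./textfiles" path it opens now:

@staticmethod
def save_dictionary(diction_data, file_path_name):
    # write to the path that was passed in, not a hard-coded one
    with open(file_path_name, "w+") as f:
        for key in diction_data:
            # key and frequency separated by a space, one pair per line
            f.write(key + " " + str(diction_data[key]) + "\n")

The document-frequency pass you describe would then sit after the for loop in vectorize, once all documents are in the list, saving once to df.txt in the same path as the input files (again a sketch, reusing the Documents list):

    # each document contributes at most 1 per distinct word it contains
    DFS = {}
    for doc in Documents:
        for word in doc.tfs:
            DFS[word] = DFS.get(word, 0) + 1
    Document.save_dictionary(DFS, "./textfiles/df.txt")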