How can I solve an attribute error when using spacy?

Question

I am using spacy for German natural language processing, but when I run my code I get this error:

AttributeError: 'str' object has no attribute 'text'

This is the text data I am working with:

tex = ['Wir waren z.B. früher auf\'m Fahrrad unterwegs in München (immer nach 11 Uhr).',
        'Nun fahren wir öfter mit der S-Bahn in München herum. Tja. So ist das eben.',
        'So bleibt mir nichts anderes übrig als zu sagen, vielen Dank für alles.',
        'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']

My code:

data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]

nlp = spacy.load('de')
spacy_doc1 = []
for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
    no_punct = [tok.text for tok in lem if re.match('\w+', tok.text)]
    no_numbers = [tok for tok in no_punct if not re.match('\d+', tok)]

I process each string in its own list because I need to assign the results back to the specific original string.

I also understand that the result written to lem is no longer in a format that spacy can process.

So how can I do this correctly?

Answer 1

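The problem here is that spaCy's token.lemma_ returns a string, and a string has no text attribute (as the error message says). Once lem has been built, it is a list of plain strings, so tok.text in the no_punct line fails.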
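To make the cause concrete, here is a minimal sketch that reproduces the same error without loading a model. FakeToken is a hypothetical stand-in for a spaCy token (it only mimics the text and lemma_ attributes), and the lemma values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    # Hypothetical stand-in for a spaCy Token; only these two attributes are mimicked.
    text: str
    lemma_: str


# Illustrative values only, not real model output.
tokens = [FakeToken("waren", "sein"), FakeToken(".", ".")]

# Taking .lemma_ turns every token into a plain str.
lem = [tok.lemma_ for tok in tokens]

try:
    # Same pattern as the no_punct line in the question: str has no .text.
    [tok.text for tok in lem]
except AttributeError as err:
    message = str(err)

print(message)
```

The list lem holds only strings, so any later step has to treat its items as strings rather than as tokens.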

I suggest treating the items as plain strings, just as you already do in this line:

no_numbers = [tok for tok in no_punct if not re.match('\d+', tok)]

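The only change needed is in the no_punct line. Besides dropping .text, it must also let through the special string "-PRON-", which appears as the lemma of English pronouns in the example below (it starts with a hyphen, so \w+ alone would not match it):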

import re
import spacy

# using the web English model for practicality here
nlp = spacy.load('en_core_web_sm')

tex = ['I\'m going to get a cat tomorrow',
        'I don\'t know if I\'ll be able to get him a cat house though!']

data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]

spacy_doc1 = []

for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
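    # lem holds plain strings, so match on the string itself (raw strings avoid invalid-escape warnings)
    no_punct = [tok for tok in lem if re.match(r'\w+', tok) or tok == "-PRON-"]
    no_numbers = [tok for tok in no_punct if not re.match(r'\d+', tok)]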
    print(no_numbers)

# > ['-PRON-', 'be', 'go', 'to', 'get', 'a', 'cat', 'tomorrow']
# > ['-PRON-', 'do', 'not', 'know', 'if', '-PRON-', 'will', 'be', 'able', 'to', 'get', '-PRON-', 'a', 'cat', 'house', 'though']
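An alternative, sketched here under assumptions, is to do the filtering while the items are still tokens and take the lemma only at the end. That way tok.text stays available for the regex checks. FakeToken and clean_lemmas are hypothetical names introduced for this sketch, and the lemma values are invented. With spaCy you would pass nlp(line) instead of the hand-built list.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeToken:
    # Hypothetical stand-in for a spaCy Token so the sketch runs without a model.
    text: str
    lemma_: str


def clean_lemmas(doc):
    """Filter on tok.text while items are still tokens, then extract lemmas."""
    words = [tok for tok in doc if re.match(r"\w+", tok.text)]
    no_numbers = [tok for tok in words if not re.match(r"\d+", tok.text)]
    return [tok.lemma_ for tok in no_numbers]


# Illustrative tokens and lemmas, not real model output.
doc = [
    FakeToken("waren", "sein"),
    FakeToken("nach", "nach"),
    FakeToken("11", "11"),
    FakeToken("Uhr", "Uhr"),
    FakeToken(")", ")"),
    FakeToken(".", "."),
]

result = clean_lemmas(doc)
print(result)
```

Appending each result to spacy_doc1 inside the loop keeps the output at the same index as its original string.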

Please let me know whether this solves your problem, as I may have misunderstood your question.
