Extracting displacy (spaCy) output dependency connections
I am using spaCy's displacy visualizer to look at the dependencies between the words in a sentence. It looks like this:
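import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')  # assumed model; the original does not say which one was loaded
text = 'European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices'
displacy.render(nlp(text), jupyter=True, style='ent')  # with jupyter=True the markup is displayed inline, so print() is not needed
displacy.render(nlp(text), style='dep', jupyter=True, options={'distance': 120})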
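Is there a way to extract the connections drawn by the arrows, indexed by the words in the string? For example, look at the connections in 'European Authorities fined Google' in the rendered figure. Is there any way to build the following dataframe (each word in the word column, and every word it connects to in the connection column)?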
word        | connection
------------|--------------------------------------
European    |
Authorities | European
fined       | Authorities, Google, record, ..., ...
Google      |
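spaCy provides a lot of token attributes that you can use for this purpose, such as ancestors or children. Note that these attributes return generators, so you need to convert them to lists and then join them into strings.
Here is my example using the children attribute: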
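import pandas as pd
import spacy

nlp = spacy.load('en_core_web_sm')  # assumed model; any English pipeline with a parser should work
text = 'European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices'
doc = nlp(text)
words = []
a_network = []
for w in doc:
    words.append(w.text)  # store the string rather than the Token object
    # w.children is a generator over the tokens whose head is w
    network = [t.text for t in w.children]
    a_network.append(", ".join(network))
df = pd.DataFrame({"word": words, "network": a_network})
print(df)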
The output will be:
   word         network
0  European
1  authorities  European
2  fined        authorities, Google, record, on, for
3  Google
4  a
5  record       a, billion
6  $
7  5.1
8  billion      $, 5.1
...
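If you also want the direction and label of each arrow, every token exposes its head and its dependency label through head and dep_. The following is only a minimal sketch of that idea, assuming spaCy v3 and pandas are installed. The helper name arcs_to_frame is hypothetical, and the four-word parse is annotated by hand for illustration (it is not model output), so the snippet runs without downloading a model.

```python
import pandas as pd
from spacy.tokens import Doc
from spacy.vocab import Vocab


def arcs_to_frame(doc):
    """Build one row per token: its head, its arc label, and its children."""
    rows = []
    for tok in doc:
        is_root = tok.head.i == tok.i  # the root token is its own head
        rows.append({
            "word": tok.text,
            "head": "" if is_root else tok.head.text,
            "dep": tok.dep_,
            # children is a generator, so it is consumed inside join()
            "children": ", ".join(child.text for child in tok.children),
        })
    return pd.DataFrame(rows)


# Hand-annotated toy parse (an assumption for illustration only).
words = ["European", "authorities", "fined", "Google"]
heads = [1, 2, 2, 2]  # absolute index of each token's head
deps = ["amod", "nsubj", "ROOT", "dobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

arcs = arcs_to_frame(doc)
print(arcs)
```

With a real pipeline you would pass nlp(text) to the same helper instead of the hand-built doc. The head column is the reverse view of children: it names the word each arrow starts from.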