Can't import installed python modules in spark cluster offered by Azure Databricks
I have just started running Python notebooks on the Spark cluster offered by Azure Databricks. As required, we installed several external packages such as spacy and kafka, both through shell commands and through the 'Create library' UI in the Databricks workspace.
python -m spacy download en_core_web_sm
However, every time we run 'import ', the cluster throws a 'Module not found' error.
OSError: Can't find Model 'en_core_web_sm'
On top of that, we can't seem to find out exactly where these modules were installed. Even after adding the module path to 'sys.path', the problem persists.
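For reference, this is roughly what we tried when checking where the packages resolved from and patching sys.path (the appended path below is illustrative, not a confirmed install location):
import sys

# Check which interpreter and module search paths the notebook is actually using
print(sys.executable)
print(sys.path)

# Illustrative only: append the directory we assumed the packages were installed into
sys.path.append("/databricks/python3/lib/python3.8/site-packages")

import spacy  # this is the import that still raises 'Module not found' on our cluster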
Please let us know how to resolve this as soon as possible.
You can follow the steps below to install and load the spaCy package on Azure Databricks.
Step 1: Install spaCy with pip and download the spaCy model.
%sh
/databricks/python3/bin/pip install spacy
/databricks/python3/bin/python3 -m spacy download en_core_web_sm
Notebook output:
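If you want to confirm that the packages actually landed in the interpreter the notebook uses, a quick sanity check under the same path assumptions as Step 1 (adjust the paths if your runtime differs) is:
%sh
/databricks/python3/bin/pip show spacy
/databricks/python3/bin/python3 -c "import spacy; print(spacy.__file__)"
pip show prints the installed version and its Location, which is the directory the notebook's import needs to see.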
Step 2: Run an example that uses spaCy.
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")
# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
"Google in 2007, few people outside of the company took him "
"seriously. “I can tell you very senior CEOs of major American "
"car companies would shake my hand and turn away because I wasn’t "
"worth talking to,” said Thrun, in an interview with Recode earlier "
"this week.")
doc = nlp(text)
# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
Notebook output:
Hope this helps. Let us know if you have any questions.
Please click "Mark as Answer" and up-vote the post that helped you; it may benefit other community members.
Install the spacy "en_core_web_sm" model with
%sh python -m spacy download en_core_web_sm
Import the model with
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp("My name is Raghu Ram. I live in Kolkata.")
for ent in doc.ents:
    print(ent.text, ent.label_)
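If the model package may not be present yet on the driver, a hedged variant is to fall back to spaCy's programmatic download helper (spacy.cli.download) before loading, using the same model name as above:
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model not installed in this interpreter: download it, then load again.
    # On some Databricks runtimes you may need to detach and reattach the
    # notebook before the freshly installed package becomes importable.
    spacy.cli.download("en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

doc = nlp("My name is Raghu Ram. I live in Kolkata.")
print([(ent.text, ent.label_) for ent in doc.ents])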
When creating the cluster, use a Databricks ML Runtime distribution: https://docs.databricks.com/runtime/mlruntime.html
Then you can install spacy from the Install Library UI (just go to cluster/libraries and install as usual), or via %sh, %pip, or %conda, for example as in the sketch below.
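As a minimal sketch of one of those options (%pip gives a notebook-scoped install on Databricks):
%pip install spacy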
Then load the English corpus:
%python
import spacy
spacy.cli.download("en_core_web_lg")
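After the download finishes, a minimal follow-up sketch to load and use the corpus (same model name as above; the sample sentence is just for illustration):
%python
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Sebastian Thrun started working on self-driving cars at Google in 2007.")
print([(ent.text, ent.label_) for ent in doc.ents])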