Is there any way to give an input file to Stanza (Stanford CoreNLP client) rather than one piece of text when calling the server?

Question

I have a .csv file containing the IMDB sentiment analysis dataset. Each instance is a paragraph. I am using Stanza (https://stanfordnlp.github.io/stanza/client_usage.html) to get a parse tree for each instance.

from stanza.server import CoreNLPClient

text = "Chris Manning is a nice person. Chris wrote a simple sentence. He also gives oranges to people."

# Entering the with block starts the CoreNLP server; leaving it shuts the server down.
with CoreNLPClient(
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse', 'coref'],
        timeout=30000,
        memory='16G') as client:
    ann = client.annotate(text)

Right now I have to re-run the server for every instance. Since I have 50k instances, this takes a lot of time.

1
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-a74576b3341f4cac.props -preload parse
2
Starting server with command: java -Xmx16G -cp /home/wahab/treeattention/stanford-corenlp-4.0.0/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 1200000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-d09e0e04e2534ae6.props -preload parse

Is there any way to pass a file or to do batch processing?

Answer 1

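You should start the server only once. The simplest approach is to load the file in Python, extract each paragraph, and then submit the paragraphs one at a time: pass each paragraph from your IMDB data to the annotate() method. The server will handle sentence splitting.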
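
As a minimal sketch of that approach (not a definitive implementation): the helpers below read one paragraph per CSV row and send each one to a single, already-running client, so the server starts once instead of once per instance. The file name imdb.csv, the column name review, and the helper names iter_paragraphs, annotate_all, and main are assumptions made for illustration; adjust them to match your data.

```python
import csv


def iter_paragraphs(csv_path, text_column="review"):
    """Yield one non-empty paragraph per CSV row.

    The column name "review" is an assumption about the CSV layout.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            text = row[text_column].strip()
            if text:
                yield text


def annotate_all(client, paragraphs):
    """Annotate every paragraph with the same client, one request per paragraph."""
    for text in paragraphs:
        yield client.annotate(text)


def main(csv_path="imdb.csv"):  # hypothetical file name
    # Imported here so the helpers above work without Stanza installed.
    from stanza.server import CoreNLPClient

    # The with block is entered once, so the server is started once
    # and reused for all paragraphs.
    with CoreNLPClient(
            annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner',
                        'parse', 'depparse', 'coref'],
            timeout=30000,
            memory='16G') as client:
        for ann in annotate_all(client, iter_paragraphs(csv_path)):
            # Each annotation is a document holding one entry per sentence.
            for sentence in ann.sentence:
                print(sentence.parseTree)
```

Calling main() with the path to your CSV runs the whole dataset through one server process. The key design choice is that the loop over paragraphs sits inside the with block rather than around it.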

parsing

nlp

stanford-nlp