Stanford CoreNLP Morphology.stemStatic disable lowercase conversion?

Question

The comment on the stemStatic method of the Morphology class says that it will:

return a new WordTag which has the lemma as the value of word().
The default is to lowercase non-proper-nouns, unless options have been set.

(https://github.com/evandrix/stanford-corenlp/blob/master/src/edu/stanford/nlp/process/Morphology.java)
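To make the quoted default concrete, here is a toy model of the rule as the comment states it: lowercase the word unless lowercasing is switched off or the word is a proper noun. This is not CoreNLP's implementation (which lives in the Morpha lexer). The class and method names are hypothetical, and it assumes "proper noun" means the Penn Treebank tags NNP and NNPS.

```java
import java.util.Locale;

/** Toy model of the documented lowercasing default; NOT CoreNLP's actual code. */
class LowercaseRuleSketch {

    /** Assumption: proper nouns are the Penn Treebank tags NNP (singular) and NNPS (plural). */
    static boolean isProperNoun(String tag) {
        return tag.equals("NNP") || tag.equals("NNPS");
    }

    /**
     * Returns the word lowercased, unless lowercasing is disabled
     * (the "options" the comment mentions) or the word is tagged as a proper noun.
     */
    static String applyCaseRule(String word, String tag, boolean lowercase) {
        if (lowercase && !isProperNoun(tag)) {
            return word.toLowerCase(Locale.ROOT);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(applyCaseRule("Running", "VBG", true));  // running
        System.out.println(applyCaseRule("Running", "VBG", false)); // Running
        System.out.println(applyCaseRule("Paris", "NNP", true));    // Paris
    }
}
```

The boolean `lowercase` stands in for the option the asker wants to turn off; with it set to false, every word keeps its original case.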

How/where can I set these options to disable the lowercase conversion?

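I've looked through the source code but can't see how to set options that would affect this static method. Frustratingly, the related static lemmatization method, lemmaStatic, takes a boolean parameter that does exactly this...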

I'm using v3.3.1 via Maven...

Thanks!

Answer 1

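OK, after looking into it a bit, it seems the right approach may be not to use the static method at all, but to construct a Morphology instance instead: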

public Morphology(Reader in, int flags) {

The int flags argument sets lexer.options.

Here are the lexer options (from Morpha.java):

/** If this option is set, print the word affix after a + character */
private final static int print_affixes = 0;  
/** If this option is set, lowercase all tokens */
private final static int change_case = 1;
/** Return the tags on the input words if present?? */
private final static int tag_output = 2;

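int flags is a bit string over the 3 options, where bit i controls the option with index i. So 7 = 111 sets all options to true, 0 = 000 sets them all to false, 5 = 101 sets print_affixes and tag_output, and so on. To keep the original case, leave the change_case bit (bit 1, value 2) unset, e.g. pass 0 or 5.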
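A minimal sketch of the bit arithmetic described above, assuming (as the answer states) that option index i maps to bit i of flags. The constant values mirror the Morpha.java excerpt; the class and its helper methods are hypothetical and not part of the CoreNLP API, and this has not been checked against how the 3.3.1 Morphology(Reader, int) constructor actually decodes the value.

```java
/** Hypothetical helper illustrating the Morpha option bits; not CoreNLP API. */
class MorphaFlagsSketch {
    // Option indices, mirroring print_affixes / change_case / tag_output in Morpha.java.
    static final int PRINT_AFFIXES = 0;
    static final int CHANGE_CASE = 1;
    static final int TAG_OUTPUT = 2;

    /** Packs the three booleans into an int, setting bit i for option index i. */
    static int flags(boolean printAffixes, boolean changeCase, boolean tagOutput) {
        int f = 0;
        if (printAffixes) f |= 1 << PRINT_AFFIXES;
        if (changeCase)   f |= 1 << CHANGE_CASE;
        if (tagOutput)    f |= 1 << TAG_OUTPUT;
        return f;
    }

    /** True if the option with the given index is switched on in flags. */
    static boolean isSet(int flags, int option) {
        return (flags & (1 << option)) != 0;
    }

    public static void main(String[] args) {
        int all = flags(true, true, true);           // 0b111
        int affixAndTag = flags(true, false, true);  // 0b101
        // Start from "everything on" and clear only the lowercasing bit.
        int noLowercase = all & ~(1 << CHANGE_CASE); // 0b101

        System.out.println(all);                             // 7
        System.out.println(affixAndTag);                     // 5
        System.out.println(noLowercase);                     // 5
        System.out.println(isSet(noLowercase, CHANGE_CASE)); // false
    }
}
```

Clearing a single bit with `& ~(1 << CHANGE_CASE)` turns off lowercasing while leaving whatever other options were chosen untouched.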

You can then use the apply method in Morphology.java:

public Object apply(Object in) {

The input object should be a WordTag constructed from the original word and its tag.

Let me know if you need any further help!

We could also change Morphology.java to add the kind of method you want! The above is the approach if you'd rather not customize Stanford CoreNLP.
