NLP NER 处理错误

NLP NER Processing Errors

这是 tsv 文件。 c2is2r3.tsv

The O
fate    O
of  O
Lehman  ORGANIZATION
Brothers    ORGANIZATION

. . .

New ORGANIZATION
York    ORGANIZATION
Fed ORGANIZATION
,   O
and O
Treasury    TITLE
Secretary   TITLE
Henry   PERSON
M.  PERSON
Paulson PERSON
Jr. PERSON
.   O

更多c2is2r3.prop

trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true

这是原始序列

java -cp  stanford-ner-3.5.2.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop


java -cp stanford-ner-3.5.2.jar -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -ner.useSUTime false -ner.combinationMode HIGH_RECALL -serializeTo c2is2.serialized.ncc.ncc.ser.gz


java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt


CRFClassifier invoked on Fri Jul 17 09:51:13 EDT 2015 with arguments:
   -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
textFile=c2is2r3.txt
Loading classifier from /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz ... Error deserializing /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1572)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1523)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)
Caused by: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2613)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1451)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1558)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1569)
    ... 2 more

这是尝试使用 NERClassifierCombiner

java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner  -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt

这是错误堆栈:

NERClassifierCombiner invoked on Fri Jul 17 10:11:17 EDT 2015 with arguments:
   -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt
testFile=c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
testFile=c2is2r3.txt
ner.useSUTime=false
ner.model=c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz
serializeTo=c2is2.serialized.ncc.ncc.ser.gz
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
ner.combinationMode=HIGH_RECALL
loading CRF...
loading CRF...
Error on line 1: The fate of Lehman Brothers, the beleaguered investment bank, hung in the balance on Sunday as Federal Reserve officials and the leaders of major financial institutions continued to gather in emergency meetings trying to complete a plan to rescue the stricken bank.  Several possible plans emerged from the talks, held at the Federal Reserve Bank of New York and led by Timothy R. Geithner, the president of the New York Fed, and Treasury Secretary Henry M. Paulson Jr.
Exception in thread "main" java.lang.UnsupportedOperationException: Argument array lengths differ: [word, tag, answer] vs. [The, fate, of, Lehman, Brothers,, the, beleaguered, investment, bank,, hung, in, the, balance, on, Sunday, as, Federal, Reserve, officials, and, the, leaders, of, major, financial, institutions, continued, to, gather, in, emergency, meetings, trying, to, complete, a, plan, to, rescue, the, stricken, bank., Several, possible, plans, emerged, from, the, talks,, held, at, the, Federal, Reserve, Bank, of, New, York, and, led, by, Timothy, R., Geithner,, the, president, of, the, New, York, Fed,, and, Treasury, Secretary, Henry, M., Paulson, Jr.]
    at edu.stanford.nlp.ling.CoreLabel.initFromStrings(CoreLabel.java:153)
    at edu.stanford.nlp.ling.CoreLabel.<init>(CoreLabel.java:133)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:85)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:60)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.parseString(DelimitRegExIterator.java:67)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.setNext(DelimitRegExIterator.java:60)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator.<init>(DelimitRegExIterator.java:54)
    at edu.stanford.nlp.objectbank.DelimitRegExIterator$DelimitRegExIteratorFactory.getIterator(DelimitRegExIterator.java:122)
    at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter.getIterator(ColumnDocumentReaderAndWriter.java:54)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.setNextObject(ObjectBank.java:436)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:415)
    at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:253)
    at edu.stanford.nlp.sequences.ObjectBankWrapper.iterator(ObjectBankWrapper.java:52)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1160)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1111)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1071)
    at edu.stanford.nlp.ie.NERClassifierCombiner.main(NERClassifierCombiner.java:382)

所以不确定下一步该做什么。任何其他组合。

在序列化步骤中,您使用以下序列化:

edu.stanford.nlp.ie.NERClassifierCombiner

在加载步骤中,您正在加载:

edu.stanford.nlp.ie.crf.CRFClassifier

所以在第二个命令中,改用 edu.stanford.nlp.ie.NERClassifierCombiner 错误应该会消失。您序列化了一个 NERClassifierCombiner,但正试图将其加载为 CRFClassifier。如果您还有其他问题,请告诉我!

第二个文件c2is2r3.txt需要先转换成tsv文件,然后再传递给你的命令。

您可以将 O(如果您不确定或想节省手动标记时间)关联到所有生成的标记,然后使用您的模型进行测试。