NLP NER Processing Errors
Here is the TSV file, c2is2r3.tsv:
The O
fate O
of O
Lehman ORGANIZATION
Brothers ORGANIZATION
. . .
New ORGANIZATION
York ORGANIZATION
Fed ORGANIZATION
, O
and O
Treasury TITLE
Secretary TITLE
Henry PERSON
M. PERSON
Paulson PERSON
Jr. PERSON
. O
Here are the contents of c2is2r3.prop:
trainFile = c2is2r3.tsv
serializeTo = c2is2r3-ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
Here is the original sequence of commands:
java -cp stanford-ner-3.5.2.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop c2is2r3.prop
java -cp stanford-ner-3.5.2.jar -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -ner.useSUTime false -ner.combinationMode HIGH_RECALL -serializeTo c2is2.serialized.ncc.ncc.ser.gz
java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt
CRFClassifier invoked on Fri Jul 17 09:51:13 EDT 2015 with arguments:
-loadClassifier c2is2.serialized.ncc.ncc.ser.gz -textFile c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
textFile=c2is2r3.txt
Loading classifier from /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz ... Error deserializing /mnt/hgfs/share/nlp/stanford-ner-2015-04-20/c2is2.serialized.ncc.ncc.ser.gz
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1572)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1523)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2987)
Caused by: java.lang.ClassCastException: java.util.Properties cannot be cast to [Ledu.stanford.nlp.util.Index;
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2613)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1451)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1558)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1569)
... 2 more
Here is an attempt using NERClassifierCombiner instead:
java -cp stanford-ner-3.5.2.jar -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt
Here is the error stack:
NERClassifierCombiner invoked on Fri Jul 17 10:11:17 EDT 2015 with arguments:
-loadClassifier c2is2.serialized.ncc.ncc.ser.gz -testFile c2is2r3.txt
testFile=c2is2r3.txt
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
testFile=c2is2r3.txt
ner.useSUTime=false
ner.model=c2is2r3-ner-model.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz
serializeTo=c2is2.serialized.ncc.ncc.ser.gz
loadClassifier=c2is2.serialized.ncc.ncc.ser.gz
ner.combinationMode=HIGH_RECALL
loading CRF...
loading CRF...
Error on line 1: The fate of Lehman Brothers, the beleaguered investment bank, hung in the balance on Sunday as Federal Reserve officials and the leaders of major financial institutions continued to gather in emergency meetings trying to complete a plan to rescue the stricken bank. Several possible plans emerged from the talks, held at the Federal Reserve Bank of New York and led by Timothy R. Geithner, the president of the New York Fed, and Treasury Secretary Henry M. Paulson Jr.
Exception in thread "main" java.lang.UnsupportedOperationException: Argument array lengths differ: [word, tag, answer] vs. [The, fate, of, Lehman, Brothers,, the, beleaguered, investment, bank,, hung, in, the, balance, on, Sunday, as, Federal, Reserve, officials, and, the, leaders, of, major, financial, institutions, continued, to, gather, in, emergency, meetings, trying, to, complete, a, plan, to, rescue, the, stricken, bank., Several, possible, plans, emerged, from, the, talks,, held, at, the, Federal, Reserve, Bank, of, New, York, and, led, by, Timothy, R., Geithner,, the, president, of, the, New, York, Fed,, and, Treasury, Secretary, Henry, M., Paulson, Jr.]
at edu.stanford.nlp.ling.CoreLabel.initFromStrings(CoreLabel.java:153)
at edu.stanford.nlp.ling.CoreLabel.<init>(CoreLabel.java:133)
at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:85)
at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter$ColumnDocParser.apply(ColumnDocumentReaderAndWriter.java:60)
at edu.stanford.nlp.objectbank.DelimitRegExIterator.parseString(DelimitRegExIterator.java:67)
at edu.stanford.nlp.objectbank.DelimitRegExIterator.setNext(DelimitRegExIterator.java:60)
at edu.stanford.nlp.objectbank.DelimitRegExIterator.<init>(DelimitRegExIterator.java:54)
at edu.stanford.nlp.objectbank.DelimitRegExIterator$DelimitRegExIteratorFactory.getIterator(DelimitRegExIterator.java:122)
at edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter.getIterator(ColumnDocumentReaderAndWriter.java:54)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.setNextObject(ObjectBank.java:436)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:415)
at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:253)
at edu.stanford.nlp.sequences.ObjectBankWrapper.iterator(ObjectBankWrapper.java:52)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1160)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1111)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1071)
at edu.stanford.nlp.ie.NERClassifierCombiner.main(NERClassifierCombiner.java:382)
So I am not sure what to try next. Is there some other combination I should use?
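In the serialization step, you serialize with:
edu.stanford.nlp.ie.NERClassifierCombiner
In the loading step, you load with:
edu.stanford.nlp.ie.crf.CRFClassifier
You serialized a NERClassifierCombiner but are trying to load it as a CRFClassifier, which is what produces the ClassCastException. In the loading command, use edu.stanford.nlp.ie.NERClassifierCombiner instead and that error should go away. Let me know if you have any other issues!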
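The second file, c2is2r3.txt, needs to be converted to a TSV file before you pass it to your command. -testFile is read as column-formatted data (the stack trace goes through ColumnDocumentReaderAndWriter), so a line of raw text does not match the expected columns.
You can assign O to every generated token (if you are unsure of the labels or want to save manual tagging time) and then test with your model.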
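As a rough illustration of that conversion, here is a minimal sketch in plain Java that splits raw text into tokens and writes one "token<TAB>O" row per token. The class name TextToTsv and the regex tokenizer are my own assumptions and not part of Stanford NER. The tokenizer is deliberately crude: it splits "Jr." into "Jr" and ".", whereas the training file keeps "Jr." as one token, so for real use you would want the same tokenization as your training data. It also writes only the two columns from map = word=0,answer=1; the error above lists [word, tag, answer], so the combined classifier may expect a different column layout, which I have not verified.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Stanford NER.
final class TextToTsv {
    // Assumed tokenization: a run of letters/digits, or one punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Returns one "token<TAB>O" line per token found in the text.
    static String toTsv(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.append(m.group()).append('\t').append('O').append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, convert that file; otherwise use a sample sentence.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "Treasury Secretary Henry M. Paulson Jr.";
        System.out.print(toTsv(text));
    }
}
```

Running it as "java TextToTsv.java c2is2r3.txt > c2is2r3-test.tsv" (Java 11 or later; the output file name is just an example) would give a file you can then pass with -testFile.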