Output "file--token--entity" using Stanford NER

I want to use Stanford NER in C# to read all the files in a folder and write the results to a single file, in the format "file token entity".

Here's what I have:

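using System;
using System.IO;
using edu.stanford.nlp.ie.crf;   // CRFClassifier
using edu.stanford.nlp.util;     // Triple

namespace stanfordNER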
{
    class Program
    {
        public static CRFClassifier Classifier = CRFClassifier.getClassifierNoExceptions(@"english.all.3class.distsim.crf.ser.gz");

        static void Main(string[] args)
        {
            Console.WriteLine("directory address?");
            string dir = Console.ReadLine();

            //Reads all files in directory
            string[] files = System.IO.Directory.GetFiles(dir);
            foreach (string f in files)
            {
                //Get the document name
                string docNo = Path.GetFileName(Path.GetFullPath(f).TrimEnd(Path.DirectorySeparatorChar));
                Console.WriteLine(docNo);

                string docText = System.IO.File.ReadAllText(f); 

                var classified = Classifier.classifyFile(f).toArray();

                //InvalidCastException is thrown here at runtime
                //Should output the entities; this part is the work of Stewart Whiting (STEWH)
                for (int i = 0; i < classified.Length; i++)
                {
                    Triple triple = (Triple)classified[i];

                    int second = Convert.ToInt32(triple.second().ToString());
                    int third = Convert.ToInt32(triple.third().ToString());

                    Console.WriteLine(docNo + '\t' + triple.first().ToString() + '\t' +
                                      docText.Substring(second, third - second));
                }
            }
        }
    }
}

I get an InvalidCastException at the "Triple" cast. I don't understand how Triple is supposed to be used.

Example of the output I want:

wiki-ms      ORGANIZATION    Microsoft Corporation
wiki-ms      LOCATION        Redmond
wiki-ms      LOCATION        Washington
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      ORGANIZATION    Microsoft Office
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      PERSON          Bill Gates
wiki-ms      PERSON          Paul Allen
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      ORGANIZATION    Microsoft
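The code above prints to the console, while the goal is one output file for the whole folder. A minimal sketch of that last step, assuming the entities have already been collected as (file, entity type, entity text) rows; `EntityWriter` and `WriteTsv` are hypothetical names, and the sample rows are copied from the desired output above rather than produced by the classifier:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: writes one "file<TAB>entityType<TAB>entityText" line per row
// into a single output file instead of printing to the console.
public static class EntityWriter
{
    public static void WriteTsv(string outputPath,
        IEnumerable<(string File, string Type, string Text)> rows)
    {
        // StreamWriter creates (or overwrites) the file; 'using' flushes and closes it.
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (var (file, type, text) in rows)
            {
                writer.WriteLine(file + "\t" + type + "\t" + text);
            }
        }
    }

    public static void Main()
    {
        // Sample rows taken from the desired output, not real classifier output.
        var rows = new List<(string File, string Type, string Text)>
        {
            ("wiki-ms", "ORGANIZATION", "Microsoft Corporation"),
            ("wiki-ms", "LOCATION", "Redmond"),
            ("wiki-ms", "PERSON", "Bill Gates"),
        };
        string outPath = Path.Combine(Path.GetTempPath(), "entities.tsv");
        WriteTsv(outPath, rows);
        Console.Write(File.ReadAllText(outPath));
    }
}
```

In the real program, the `Console.WriteLine` call inside the loop would instead add a row to a list (or write to a single `StreamWriter` opened once before the `foreach` over the files), so every document ends up in the same file.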

Thanks in advance! I'm a manufacturing engineer, so my programming knowledge is limited.

If you know a way to filter out duplicate and/or similar entities, that would be a bonus!
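For the duplicate filtering, one simple approach is to keep only the first occurrence of each (file, type, text) combination, comparing the text after trimming, collapsing whitespace and lower-casing. This is a minimal sketch under that assumption; `EntityDeduper`, `Normalize` and `Dedupe` are hypothetical names, and it only catches exact duplicates up to case and spacing, not genuinely "similar" entities such as "Microsoft" vs. "Microsoft Corporation":

```csharp
using System;
using System.Collections.Generic;

public static class EntityDeduper
{
    // Normalizes entity text for comparison: splits on whitespace (null separator),
    // rejoins with single spaces, and lower-cases.
    static string Normalize(string text) =>
        string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
              .ToLowerInvariant();

    // Keeps the first occurrence of each (file, type, normalized text) key, preserving order.
    public static List<(string File, string Type, string Text)> Dedupe(
        IEnumerable<(string File, string Type, string Text)> rows)
    {
        var seen = new HashSet<(string, string, string)>();
        var result = new List<(string File, string Type, string Text)>();
        foreach (var row in rows)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add((row.File, row.Type, Normalize(row.Text))))
                result.Add(row);
        }
        return result;
    }

    public static void Main()
    {
        // Rows copied from the desired output above.
        var rows = new[]
        {
            ("wiki-ms", "ORGANIZATION", "Microsoft Corporation"),
            ("wiki-ms", "LOCATION", "Redmond"),
            ("wiki-ms", "ORGANIZATION", "Microsoft"),
            ("wiki-ms", "PERSON", "Bill Gates"),
            ("wiki-ms", "ORGANIZATION", "Microsoft"),
        };
        foreach (var r in Dedupe(rows))
            Console.WriteLine(r.File + "\t" + r.Type + "\t" + r.Text);
    }
}
```

Detecting "similar" entities (abbreviations, partial names, typos) would need a fuzzier rule, for example substring checks or an edit-distance threshold, which is a judgment call this sketch deliberately leaves out.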

Thanks to Stewart Whiting.

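I figured it out. I just needed to change

var classified = Classifier.classifyFile(f).toArray();

to

var classified = Classifier.classifyToCharacterOffsets(docText).toArray();

classifyFile returns the labeled tokens grouped by sentence (a list of lists of CoreLabel), not Triple objects, which is why the cast failed. classifyToCharacterOffsets takes the document text and returns one Triple per entity: the entity label, its begin character offset and its end character offset, which is exactly what the loop expects.

Thanks.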
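To see what the loop does with each Triple, here is a self-contained sketch that replaces the classifier output with hand-written (label, begin, end) tuples. The offsets are worked out by hand for the sample sentence and are not real Stanford NER output; `OffsetDemo` and `FormatEntities` are hypothetical names. In the real program the three values come from `triple.first()`, `triple.second()` and `triple.third()`:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the Triple values returned by classifyToCharacterOffsets:
// (entity label, begin character offset, end character offset, exclusive).
public static class OffsetDemo
{
    public static List<string> FormatEntities(string docNo, string docText,
        IEnumerable<(string Label, int Begin, int End)> offsets)
    {
        var lines = new List<string>();
        foreach (var (label, begin, end) in offsets)
        {
            // The end offset is exclusive, so the entity length is end - begin,
            // matching docText.Substring(second, third - second) in the question.
            lines.Add(docNo + "\t" + label + "\t" + docText.Substring(begin, end - begin));
        }
        return lines;
    }

    public static void Main()
    {
        string text = "Microsoft was founded by Bill Gates and Paul Allen.";
        // Hand-computed offsets for the sentence above (hypothetical classifier output).
        var offsets = new[] { ("ORGANIZATION", 0, 9), ("PERSON", 25, 35), ("PERSON", 40, 50) };
        foreach (var line in FormatEntities("wiki-ms", text, offsets))
            Console.WriteLine(line);
    }
}
```

Because the offsets index into the same string passed to classifyToCharacterOffsets, the text must be the unmodified docText; slicing a different or trimmed copy would shift every entity.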