Output "file--token--entity" using Stanford NER
I want to use Stanford NER in C# to read all the files in a folder and write the results to a single file in the format "file token entity".
Here is what I have:
using System;
using System.IO;
using edu.stanford.nlp.ie.crf;
using edu.stanford.nlp.util;

namespace stanfordNER
{
    class Program
    {
        public static CRFClassifier Classifier = CRFClassifier.getClassifierNoExceptions(@"english.all.3class.distsim.crf.ser.gz");

        static void Main(string[] args)
        {
            Console.WriteLine("directory address?");
            string dir = Console.ReadLine();
            // Reads all files in directory
            string[] files = System.IO.Directory.GetFiles(dir);
            foreach (string f in files)
            {
                // Get the document name
                string docNo = Path.GetFileName(Path.GetFullPath(f).TrimEnd(Path.DirectorySeparatorChar));
                Console.WriteLine(docNo);
                string docText = System.IO.File.ReadAllText(f);
                var classified = Classifier.classifyFile(f).toArray();
                // Error here when running
                // Should output the entities; this part is the work of Stewart Whiting (STEWH)
                for (int i = 0; i < classified.Length; i++)
                {
                    Triple triple = (Triple)classified[i];
                    int second = Convert.ToInt32(triple.second().ToString());
                    int third = Convert.ToInt32(triple.third().ToString());
                    Console.WriteLine(docNo + '\t' + triple.first().ToString() + '\t' + docText.Substring(second, third - second));
                }
            }
        }
    }
}
I get an invalid cast exception at "triple". I don't understand how the Triple class is supposed to be used.
Example of the output I want:
wiki-ms ORGANIZATION Microsoft Corporation
wiki-ms LOCATION Redmond
wiki-ms LOCATION Washington
wiki-ms ORGANIZATION Microsoft
wiki-ms ORGANIZATION Microsoft Office
wiki-ms ORGANIZATION Microsoft
wiki-ms PERSON Bill Gates
wiki-ms PERSON Paul Allen
wiki-ms ORGANIZATION Microsoft
wiki-ms ORGANIZATION Microsoft
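Thanks in advance! I'm a manufacturing engineer, so my programming knowledge is limited.
If you have a way to filter out duplicate and/or similar entities, that would be a bonus!
Thanks to Stewart Whiting (his site).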
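I figured it out. I only needed to change
var classified = Classifier.classifyFile(f).toArray();
to
var classified = Classifier.classifyToCharacterOffsets(docText).toArray();
classifyFile returns a list of sentences, each of which is a list of tokens, so its elements cannot be cast to Triple. classifyToCharacterOffsets returns a list of Triple objects holding the entity type and the start and end character offsets, which is what the loop expects.
Thanks.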
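On the bonus request about duplicates: one possible approach, offered only as a sketch, is to collect the rows instead of printing them straight away and to keep the first occurrence of each one. The class `EntityRows` and its method `Distinct` below are hypothetical names, not part of Stanford NER, and the value tuples assume C# 7 or later. The sketch only removes rows that are equal apart from case and surrounding whitespace. It does not merge similar entities such as "Microsoft" and "Microsoft Corporation".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Stanford NER.
static class EntityRows
{
    // Returns one tab-separated "file<TAB>entity type<TAB>text" line per
    // distinct row, keeping the first occurrence and the original order.
    // Rows are compared case-insensitively after trimming the entity text.
    public static List<string> Distinct(
        IEnumerable<(string Doc, string Type, string Text)> rows)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var row in rows)
        {
            string line = row.Doc + "\t" + row.Type + "\t" + row.Text.Trim();
            // HashSet.Add returns false when the line was already seen.
            if (seen.Add(line))
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

To use it with the loop from the question, one could add each `(docNo, triple.first().ToString(), docText.Substring(second, third - second))` tuple to a `List<(string Doc, string Type, string Text)>` and, after all files are processed, pass the filtered lines to `System.IO.File.WriteAllLines` with an output path of your choosing. That would also produce the single output file the question asks for, rather than console output.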