How to define a prediction function in Keras for an NER system?

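I am building an NER system by following some tutorials that use Keras. After training and a first round of predictions, I want to use it to identify NEs in a single string, or in a list of strings, of unseen data.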

I can't seem to find a way to pass such a string, or list of strings, to model.predict() and get the appropriate predictions.

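This is the prediction on the test data from my code. I'm trying to adapt it so that it accepts unseen strings and prints each token with its prediction: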

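import numpy as np

# Pick a random sentence from the test set
i = np.random.randint(0, x_test.shape[0])
print("This is sentence:", i)
# model.predict expects a batch, so wrap the single sentence: shape (1, max_len)
p = model.predict(np.array([x_test[i]]))
# p has shape (1, max_len, num_tags); argmax over the last axis gives one tag index per token
p = np.argmax(p, axis=-1)

print("{:15}{:5}\t {}\n".format("Word", "True", "Pred"))
print("-" * 30)
# w is a word index (indices start at 1, hence words[w-1]); true and pred are tag indices
for w, true, pred in zip(x_test[i], y_test[i], p[0]):
    print("{:15}{}\t{}".format(words[w-1], tags[true], tags[pred]))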

This code predicts and prints each token with its NE tag, but I don't quite understand how it works.

This code prints something like the following:

Word           True      Pred
------------------------------
The            O        O
British        B-gpe    B-gpe
pharmaceutical O        O
company        O        O
GlaxoSmithKlineB-org    O
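To unpack what the loop above does, here is a minimal NumPy sketch with made-up numbers. The tag list, vocabulary, word indices and probabilities are hypothetical stand-ins for the tutorial's `tags`, `words`, `x_test` and real model output. It shows how `np.argmax(p, axis=-1)` turns each token's probability vector into a tag index, and why the loop looks words up with `words[w-1]` (the question's code implies word indices start at 1).

```python
import numpy as np

# Hypothetical tag list and vocabulary, standing in for the tutorial's `tags` and `words`
tags = ["O", "B-gpe", "B-org"]
words = ["The", "British", "company"]

# model.predict returns one probability distribution over tags per token:
# shape (batch_size, max_len, num_tags). Here: 1 sentence, 3 tokens, 3 tags (made-up values).
p = np.array([[[0.90, 0.05, 0.05],    # "The"     -> highest score at index 0 ("O")
               [0.10, 0.80, 0.10],    # "British" -> highest score at index 1 ("B-gpe")
               [0.70, 0.10, 0.20]]])  # "company" -> highest score at index 0 ("O")

# argmax over the last axis keeps the most likely tag index per token: shape (1, 3)
pred_idx = np.argmax(p, axis=-1)

# Word indices start at 1 (as the words[w-1] lookup implies), so subtract 1 to index `words`
x = [1, 2, 3]
decoded = [(words[w - 1], tags[t]) for w, t in zip(x, pred_idx[0])]
print(decoded)
```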

I'd like to pass in, for example:

sentence = "President Obama became the first sitting American president to visit Hiroshima"

and be able to see the recognized NEs. Any suggestions on how to do this?

A copy of the full code is here, and the dataset used is here.

You can make predictions on a list of sentences like this:

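import numpy as np
# Older tf.keras path; with Keras 3 use: from keras.utils import pad_sequences
from tensorflow.keras.preprocessing.sequence import pad_sequences

my_sentences = ["President Obama became the first sitting American president to visit Hiroshima",
                "Jack is a good person and living in Iran"]

# Map each space-separated token to its training-vocabulary index
# (raises KeyError for any word that never appeared in the training data)
my_sentences_idx = [[word2idx[w] for w in s.split(" ")] for s in my_sentences]

# Pad at the end with the same value used for the training data; truncating="post"
# keeps the first max_len tokens so predictions stay aligned with the printed words
my_sentences_padded = pad_sequences(maxlen=max_len, sequences=my_sentences_idx, padding="post", truncating="post", value=num_words-1)
# (n_sentences, max_len, num_tags) probabilities -> (n_sentences, max_len) tag indices
preds = np.argmax(model.predict(np.array(my_sentences_padded)), axis=-1)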

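for sentence, pred_seq in zip(my_sentences, preds):
    print("-" * 30)
    print(sentence)
    print("-" * 30)
    # zip stops at the shorter sequence, so padded positions are skipped
    for w, pred in zip(sentence.split(" "), pred_seq):
        # Only print tokens tagged as part of a named entity
        if tags[pred] != "O":
            print("{:15} {} ".format(w, tags[pred]))
    print()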
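Since the question asks for a prediction function, the same steps can be wrapped into one reusable helper. This is a sketch under assumptions, not a Keras API: `predict_entities`, its `unk_value` parameter and `StubModel` are hypothetical names. It builds the post-padded, end-truncated batch with NumPy directly instead of `pad_sequences`, so it runs without Keras. The tutorial's vocabulary has no dedicated unknown-word index, so `unk_value` is an assumption; mapping unseen words to the padding index is one crude option.

```python
import numpy as np

def predict_entities(sentences, model, word2idx, tags, max_len, pad_value, unk_value=None):
    """Tag each whitespace-split token; return one list of (token, tag) pairs per sentence.

    unk_value is a hypothetical fallback index for words missing from word2idx;
    if it is None, unknown words raise a KeyError, as in the answer's list comprehension.
    """
    token_lists = [s.split() for s in sentences]
    batch = np.full((len(sentences), max_len), pad_value, dtype="int32")
    for row, tokens in enumerate(token_lists):
        ids = [word2idx[w] if unk_value is None else word2idx.get(w, unk_value) for w in tokens]
        ids = ids[:max_len]           # truncate at the end so positions still match tokens
        batch[row, :len(ids)] = ids   # post-padding: real tokens first, pad_value after
    probs = model.predict(batch)          # (n_sentences, max_len, n_tags)
    pred_idx = np.argmax(probs, axis=-1)  # (n_sentences, max_len)
    # zip stops at the shorter sequence, so padded positions are dropped
    return [[(w, tags[t]) for w, t in zip(tokens, row)]
            for tokens, row in zip(token_lists, pred_idx)]


class StubModel:
    """Hypothetical stand-in for a trained model: fixed tags for known word ids, "O" otherwise."""

    def __init__(self, entity_ids, n_tags):
        self.entity_ids = entity_ids  # maps word index -> tag index
        self.n_tags = n_tags

    def predict(self, batch):
        probs = np.zeros(batch.shape + (self.n_tags,))
        probs[..., 0] = 1.0  # default: all probability on tag index 0 ("O")
        for idx, tag in self.entity_ids.items():
            mask = batch == idx
            probs[mask] = 0.0
            probs[mask, tag] = 1.0
        return probs


# Toy usage with made-up vocabulary and tags
demo_word2idx = {"Jack": 1, "is": 2, "in": 3, "Iran": 4}
demo_tags = ["O", "B-per", "B-geo"]
demo_model = StubModel({1: 1, 4: 2}, n_tags=3)
demo_out = predict_entities(["Jack is in Iran"], demo_model, demo_word2idx, demo_tags,
                            max_len=6, pad_value=0)
print(demo_out)
```

With the real trained model, the call would look like `predict_entities(my_sentences, model, word2idx, tags, max_len, num_words - 1)`.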

Output:

------------------------------
President Obama became the first sitting American president to visit Hiroshima
------------------------------
President       B-per 
Obama           I-per 
American        B-gpe 
Hiroshima       B-geo 

------------------------------
Jack is a good person and living in Iran
------------------------------
Jack            B-per 
Iran            B-geo
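The answer prints one BIO tag per token, so a multi-word name such as "President Obama" shows up as `B-per` followed by `I-per`. Here is a minimal sketch for merging such tags into whole entities. It assumes the input is the list of `(word, tag)` pairs that the answer's inner loop iterates over. `group_entities` is a hypothetical helper, and treating an `I-` tag whose type does not match the current entity as the start of a new entity is an assumed, lenient convention.

```python
def group_entities(tagged):
    """Merge BIO-tagged tokens into (entity_text, entity_type) spans.

    tagged: list of (token, tag) pairs, e.g. [("President", "B-per"), ("Obama", "I-per")].
    An "I-" tag extends the current span only if its type matches; otherwise it
    starts a new span (lenient handling, assumed here).
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in tagged:
        if tag == "O":
            # An "O" tag closes any open entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "I" and etype == current_type:
            current_tokens.append(token)  # continue the current entity
        else:
            # "B-" tag (or mismatched "I-"): close the open entity and start a new one
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], etype
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Tags as in the answer's output above; unprinted tokens were tagged "O"
tagged = [("President", "B-per"), ("Obama", "I-per"), ("became", "O"), ("the", "O"),
          ("first", "O"), ("sitting", "O"), ("American", "B-gpe"), ("president", "O"),
          ("to", "O"), ("visit", "O"), ("Hiroshima", "B-geo")]
print(group_entities(tagged))
```

With the answer's variables, the input for the first sentence would be `[(w, tags[t]) for w, t in zip(my_sentences[0].split(" "), preds[0])]`.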