Are 100 training examples sufficient for training a custom NER model with spaCy?
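I have trained an NER model on name data. I generated random sentences containing people's names, about 70 in total, and annotated them in spaCy's format.
I trained the custom NER using both the blank 'en' model and 'en_core_web_sm', but when I test it on arbitrary strings, it detects the name in only a few cases.
Is this number of examples not enough?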
My data looks like this:
[("'Hi, I am looking for a house on rent for a year. Best Regards, Rajesh',\r",
{'entities': [(56, 63, 'name')]}),
("'Hello everyone, I am Gunjan Arora',\r", {'entities': [(22, 34, 'name')]}),
("'Greetings!, I am 34 years old. I want a car for my wife Bella Roy',\r",
{'entities': [(60, 69, 'name')]}),
("'Heyo, I lived with my family comprises 4 people and myself Randy Lao',\r",
{'entities': [(60, 69, 'name')]}),
("'I am Geetanjali. ',\r", {'entities': [(6, 16, 'name')]})]
I have generated some 70 examples like this.
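Before adding more data, it may be worth checking that each annotated offset really covers the intended name. The sketch below only uses plain string slicing on the five examples shown above; `show_spans` is a hypothetical helper, not part of spaCy. With the strings exactly as they appear here, the first and third spans do not seem to line up with "Rajesh" and "Bella Roy", although whitespace may have changed when the data was pasted.

```python
# Training examples copied from the post (spaCy v2-style tuples).
TRAIN_DATA = [
    ("'Hi, I am looking for a house on rent for a year. Best Regards, Rajesh',\r",
     {'entities': [(56, 63, 'name')]}),
    ("'Hello everyone, I am Gunjan Arora',\r", {'entities': [(22, 34, 'name')]}),
    ("'Greetings!, I am 34 years old. I want a car for my wife Bella Roy',\r",
     {'entities': [(60, 69, 'name')]}),
    ("'Heyo, I lived with my family comprises 4 people and myself Randy Lao',\r",
     {'entities': [(60, 69, 'name')]}),
    ("'I am Geetanjali. ',\r", {'entities': [(6, 16, 'name')]}),
]


def show_spans(data):
    """Return (label, sliced_text) for every annotated span."""
    rows = []
    for text, annotations in data:
        for start, end, label in annotations['entities']:
            rows.append((label, text[start:end]))
    return rows


# Print what each (start, end) pair actually selects from its sentence.
for label, span in show_spans(TRAIN_DATA):
    print(label, repr(span))
```

Any span that prints something other than the full name points to an annotation that needs fixing before training.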
Losses during training:
- 1.Losses {'ner': 6.307317615201415}
- 2.Losses {'ner': 11.182436657139132}
- 3.Losses {'ner': 6.014345924849759}
- 4.Losses {'ner': 6.442589285506237}
- 5.Losses {'ner': 5.328383899880891}
- 6.Losses {'ner': 1.706726450400089}
- 7.Losses {'ner': 3.9960324752880005}
- 8.Losses {'ner': 5.415169572852782}
These are the losses when I use the blank 'en' model.
Please advise.
I want to detect names because the pretrained model itself also fails to detect names in most cases.
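To get better results you need to generate more examples. 70 examples are not enough to train your model, although that many might work for simple problems.
I suggest tripling the number of generated examples to get reasonable results.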
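One way to produce more examples without annotating offsets by hand is to fill sentence templates with names and compute the spans in code. This is a minimal sketch, not a recommended dataset: `TEMPLATES`, `NAMES`, `make_example` and `make_examples` are all hypothetical, and the count of 210 is simply three times the original 70.

```python
import random

# Hypothetical templates and names, for illustration only.
TEMPLATES = [
    "Hello everyone, I am {name}",
    "I am looking for a house on rent. Best Regards, {name}",
    "I want a car for my wife {name}",
    "I am {name}.",
]
NAMES = ["Rajesh", "Gunjan Arora", "Bella Roy", "Randy Lao", "Geetanjali"]


def make_example(template, name, label="name"):
    """Fill one template and return a spaCy v2-style training tuple."""
    text = template.format(name=name)
    # Nothing before the placeholder is substituted, so its position in the
    # template is also the start offset of the name in the final text.
    start = template.index("{name}")
    end = start + len(name)
    return (text, {"entities": [(start, end, label)]})


def make_examples(n, seed=0):
    """Build n examples from random template/name pairs (seeded for repeatability)."""
    rng = random.Random(seed)
    return [make_example(rng.choice(TEMPLATES), rng.choice(NAMES)) for _ in range(n)]


examples = make_examples(210)
print(examples[0])
```

With only four templates and five names this yields at most 20 distinct sentences, so a real dataset would need many more of both for the extra examples to add variety.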