How to build a face recognition system from scratch?

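I am building a prototype of a face recognition system, and a few questions came up while writing the algorithm.

Algorithm:

  1. Collect triplets (A(i), P(i), N(i)): sets of anchor, positive, and negative images of the employees working at XYX company.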

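  2. Minimize the triplet loss function with gradient descent to learn the CNN parameters. In effect, I am training a Siamese network here (the idea of running two identical CNNs on 2 different inputs [once on A(i)-P(i), and next on A(i)-N(i)] and then comparing them).

    These learned parameters ensure that the distance between the flattened n-dim encodings of images of the same person is small, and the distance between encodings of different people is large.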

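  3. Now, create a database in which you store the encoding of every training image of the XYX company employees.

    Simply make a forward pass through the trained CNN and store the corresponding encoding of each image in the database.

  4. At test time, you have an image of an XYX company employee and an image of an outsider.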

    • You pass both test images through the CNN and get the corresponding encodings.

    • Now the question is: how do you measure the similarity between the test-picture encoding and all the training-picture encodings in the database?

      • First question: would you use cosine similarity, or do I need to do something else? Can you add more clarity on this?

      • Second question: in terms of efficiency, how would you handle a scenario where the database holds training-picture encodings for 100,000 employees, and for every new person you need to scan these 100,000 encodings, compute the cosine similarity, and return a result in <2 secs? Any suggestions on this part?

    • Third question: if we use the approach (Image-->CNN-->SoftMax-->output) that is commonly used for face recognition tasks, you need to retrain the network every time a new person joins the organization, which is why it is a bad approach.
    • This problem can be mitigated by the 2nd approach, in which we use a learned distance function "d(img1, img2)" over a pair of employee images, as described in points 1 to 3 above.

      • My question: when a new employee joins the organization, how can this learned distance function generalize to that person when they were not in the training set at all? Isn't this a problem of different data distributions between the test and training sets? Any suggestions in this regard?

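Can anyone help me understand these conceptual glitches?

After doing some literature survey of computer vision research papers on face verification and recognition/detection, I think I have found answers to all my questions, so I am answering them here.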

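First question: would you use cosine similarity? Can you add more clarity on this?

  • Find the minimum distance between the test image encoding and each saved training image encoding by simply computing the Euclidean distance between them.

  • Keep a threshold, say 0.7: if the minimum distance is < 0.7, return the name of the employee; otherwise return a "not in the database" error.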
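The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `identify` function, the dictionary-based `database`, and the toy 3-dimensional encodings are hypothetical stand-ins for the real CNN outputs, and 0.7 is the example threshold from the text, which would need tuning on real data. On the cosine-versus-Euclidean point: if the encodings are L2-normalized (an assumption here), the squared Euclidean distance equals 2 - 2·cos, so both measures rank the candidates in the same order.

```python
import math

def identify(test_encoding, database, threshold=0.7):
    """Return (name, distance) of the closest enrolled face,
    or (None, distance) if nobody is within the threshold."""
    best_name, best_dist = None, math.inf
    for name, encoding in database.items():
        dist = math.dist(test_encoding, encoding)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist < threshold:
        return best_name, best_dist
    return None, best_dist  # caller reports "not in the database"

# Toy 3-dimensional encodings stand in for the real n-dim CNN outputs.
database = {"alice": [0.1, 0.9, 0.2], "bob": [0.8, 0.1, 0.5]}

employee_match = identify([0.12, 0.88, 0.25], database)  # close to "alice"
outsider_match = identify([5.0, 5.0, 5.0], database)     # far from everyone
```

Returning the distance alongside the name makes it easier to inspect borderline cases when choosing the threshold.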

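Second question: in terms of efficiency, how would you handle a scenario where the database holds training-picture encodings for 100,000 employees, and for every new person you need to scan these 100,000 encodings, compute the cosine similarity, and return a result in <2 seconds?

  • Note that a 128-dimensional float vector is used during training, but it can be quantized to 128 bytes without loss of accuracy. Each face is thus compactly represented by a 128-dimensional byte vector, which is ideal for large-scale clustering and recognition. Smaller embeddings are possible at a minor loss of accuracy and could be used on mobile devices.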
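To make the byte-vector idea concrete, here is a rough NumPy sketch. Everything in it is an assumption for illustration: the random `db` array stands in for real encodings, the `quantize`/`dequantize` mapping (one byte per component in [-1, 1]) is just one simple scheme, not necessarily what the paper used, and `nearest` is a plain brute-force scan. A single pass over 100,000 x 128 values is a modest amount of arithmetic, so brute force may well fit the 2-second budget; for much larger databases an approximate nearest-neighbour index would be the usual next step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical database: 100,000 L2-normalized 128-dim float32 encodings.
db = rng.standard_normal((100_000, 128)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

def quantize(x):
    """Map components in [-1, 1] to uint8 (one byte per dimension)."""
    return np.round((np.clip(x, -1.0, 1.0) + 1.0) * 127.5).astype(np.uint8)

def dequantize(q):
    """Approximate inverse of quantize."""
    return q.astype(np.float32) / 127.5 - 1.0

db_bytes = quantize(db)  # 100,000 x 128 bytes, about 12.8 MB

def nearest(query, db_bytes):
    """Brute-force nearest neighbour by Euclidean distance."""
    diffs = dequantize(db_bytes) - query
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)  # row-wise squared norms
    idx = int(np.argmin(sq_dists))
    return idx, float(np.sqrt(sq_dists[idx]))

# Query: a slightly perturbed copy of entry 12345.
query = db[12345] + 0.01 * rng.standard_normal(128).astype(np.float32)
idx, dist = nearest(query, db_bytes)
```

Here `idx` comes back as the perturbed entry and `dist` stays well below the 0.7 example threshold, while unrelated random unit vectors sit at a distance of roughly 1.4.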

Third question:

  • First, we learn the network parameters of a deep CNN (Siamese n/w) by minimizing the triplet loss function.
  • Second, suppose you have trained these model weights on a huge dataset of millions of people; the weights have then learned higher-level features such as a person's identity, gender, etc., as well as low-level features such as edges related to human faces.
  • The assumption is then that these model parameters together can represent at least any human face, so you save the "new person" encoding in the database by making a forward pass through the network and later use answer 1 to decide whether the person belongs to the organization or not (the face recognition problem). Moreover, the FaceNet paper mentions keeping a hold-out set of around one million images that has the same distribution as the training set but disjoint identities.
  • Third, the two techniques differ in how the model weights are trained, that is, in the loss function: cross-entropy softmax in the first technique versus the triplet loss in the second.
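For reference, the triplet loss mentioned above can be written for a single triplet as max(d(A, P)^2 - d(A, N)^2 + margin, 0). The sketch below is a minimal pure-Python version operating on ready-made encodings; the function names, the 2-dimensional toy vectors, and the margin of 0.2 are illustrative assumptions, and in real training the loss would be computed on CNN outputs inside an autodiff framework.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared distance)."""
    return max(
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
        0.0,
    )

anchor, positive, negative = [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]

easy = triplet_loss(anchor, positive, negative)  # well separated: no loss
hard = triplet_loss(anchor, negative, positive)  # roles swapped: large loss
```

Because the loss only constrains relative distances between encodings rather than a fixed set of output classes, enrolling a new person needs a forward pass and a database insert, not retraining.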