The authors propose a CNN module called Transductive Centroid Projection (TCP) for semi-supervised learning, which incorporates the training of unlabelled clusters alongside the learning of labelled samples. The design is based on the observation that the weights converge to the central direction of each class in hyperspace. The TCP module dynamically adds an ad hoc weight matrix (called an anchor) for each cluster in a mini-batch.
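To make the mechanism concrete, the following is a minimal sketch of one reading of the anchor construction, not the authors' implementation. It assumes that each anchor is the L2-normalised mean feature of a cluster within the mini-batch, and that the anchors are appended as extra columns to the weight matrix of the labelled classes before the softmax. The function names `cluster_anchors` and `extended_logits`, and the NumPy formulation, are hypothetical and chosen only for illustration.

```python
import numpy as np


def cluster_anchors(features, cluster_ids):
    """Build one unit-norm anchor per cluster in the mini-batch.

    Assumption: an anchor is the normalised centroid of the cluster's features.
    features: (batch, dim) array; cluster_ids: (batch,) integer array.
    Returns the sorted unique cluster ids and a (dim, n_clusters) anchor matrix.
    """
    ids = np.unique(cluster_ids)
    anchors = []
    for c in ids:
        centroid = features[cluster_ids == c].mean(axis=0)
        # Normalise so that only the direction of the centroid is kept.
        anchors.append(centroid / np.linalg.norm(centroid))
    return ids, np.stack(anchors, axis=1)


def extended_logits(features, labelled_weights, anchors):
    """Score features against labelled-class weights plus ad hoc anchors.

    labelled_weights: (dim, n_classes); anchors: (dim, n_clusters).
    Returns a (batch, n_classes + n_clusters) logit matrix.
    """
    weights = np.concatenate([labelled_weights, anchors], axis=1)
    return features @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 4))          # six unlabelled features
    cids = np.array([0, 0, 0, 1, 1, 1])      # two clusters in this batch
    w_labelled = rng.normal(size=(4, 3))     # three labelled classes
    _, anchor_matrix = cluster_anchors(feats, cids)
    print(extended_logits(feats, w_labelled, anchor_matrix).shape)
```

Under these assumptions the anchors are rebuilt for every mini-batch rather than learned, which is what allows unlabelled clusters to take part in the same softmax as the labelled classes.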
The proposed module for semi-supervised learning is quite novel: it is neither single-task nor multi-task learning, but trains on the labelled and unlabelled data simultaneously in a semi-supervised manner. It can be used for both unsupervised and semi-supervised learning.
Theoretical and empirical investigations are made of the observation that the direction of the anchor gradually coincides with the centroid as the model converges. The module is evaluated by integrating it into different learning settings, such as softmax loss, triplet loss, single-task and multi-task learning. An ablation study is provided. The work seems technically and theoretically sound.
The experiments are adequate, covering face recognition on IJBC and person re-identification tasks using six public benchmarks.
The paper is written with enough clarity and detail.
The paper lacks an adequate literature review, considering that it tackles both unsupervised and semi-supervised learning for CNNs. Section 1.1 does not provide any detail on previous work in semi-supervised learning, which makes it difficult to judge the novelty of the work.
In Table 3, for single link learning, the method using real ground truth outperforms TCP for Top-20 identification (98 and 96.9, respectively), but no explanation is provided for this.