Lan-Zhe Guo, Yu-Feng Li. A General Formulation for Safely Exploiting Weakly Supervised Data. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), New Orleans, LA. PDF
Weakly supervised data is an important kind of machine learning data that can help improve learning performance. However, recent results indicate that machine learning techniques using weakly supervised data may sometimes cause performance degradation. Safely leveraging weakly supervised data is therefore important, yet there has been only limited effort in this direction, especially toward a general formulation that provides insight to guide safe weakly supervised learning. In this paper we present a scheme that builds the final prediction by integrating several weakly supervised learners. The resultant formulation brings two advantages: i) for the commonly used convex loss functions in both regression and classification tasks, safeness guarantees exist under a mild condition; ii) prior knowledge about the weights of base learners can be embedded in a flexible manner. Moreover, the formulation can be solved globally and efficiently via a simple convex quadratic or linear program. Experiments on multiple weakly supervised learning tasks, such as label noise learning, domain adaptation and semi-supervised learning, validate the effectiveness of our proposal.
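The core idea, combining several base learners with weights on the probability simplex so that a convex loss cannot blow up past the weighted average of the individual losses, can be sketched in a few lines. This is a toy illustration under assumed synthetic data, not the paper's implementation; the safeness intuition shown here is just Jensen's inequality for a convex loss.

```python
import numpy as np

# Toy sketch: k base learners each predict on n instances; the final
# prediction is the convex combination sum_i w_i * f_i with w on the
# probability simplex (here uniform weights, for illustration only).
rng = np.random.default_rng(0)
n, k = 100, 3
y = rng.normal(size=n)                           # toy ground truth
preds = y + rng.normal(scale=0.5, size=(k, n))   # base-learner outputs

w = np.full(k, 1.0 / k)        # simplex weights: nonnegative, sum to 1
combined = w @ preds           # convex combination of predictions

mse = lambda p: np.mean((p - y) ** 2)
# For a convex loss, Jensen's inequality guarantees the loss of the
# combination never exceeds the weighted average of individual losses.
assert mse(combined) <= w @ np.array([mse(p) for p in preds])
```

In the paper the weights are not fixed uniformly but optimized, with prior knowledge on them expressible as constraints of a convex quadratic or linear program.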
Tong Wei, Lan-Zhe Guo, Yu-Feng Li, Wei Gao. Learning Safe Multi-Label Prediction for Weakly Labeled Data. Machine Learning, 2018, 107(4): 703-725. PDF
In this paper we study multi-label learning with weakly labeled data, i.e., training examples whose labels are incomplete. This includes, e.g., (i) semi-supervised multi-label learning, where completely labeled examples are only partially known; (ii) weak-label learning, where the relevant labels of examples are only partially known; and (iii) extended weak-label learning, where both relevant and irrelevant labels of examples are only partially known. Weakly labeled data commonly occur in real applications, e.g., image classification and document categorization. Previous studies often expect that learning methods using weakly labeled data will improve performance, as more data are employed. This, however, is not always the case in reality: using more weakly labeled data may sometimes degrade performance. It is therefore desirable to learn a safe multi-label prediction that will not hurt performance when weakly labeled data are used. In this work we optimize multi-label evaluation metrics (F1 score and Top-k precision) given that the ground-truth label assignment is realized by a convex combination of basic multi-label learners. To cope with the infinite number of possible ground-truth label assignments, a cutting-plane strategy is adopted to iteratively generate the most helpful label assignments. The whole optimization is efficiently cast as a series of simple linear programs. Extensive experiments on three weakly labeled learning tasks, namely (i) semi-supervised multi-label learning, (ii) weak-label learning and (iii) extended weak-label learning, show that our proposal clearly improves safeness in comparison with many state-of-the-art methods.
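The prediction side of this setup can be sketched briefly: basic multi-label learners each produce an instance-by-label score matrix, these are combined with simplex weights, and the combination is evaluated with Top-k precision, one of the two metrics the paper optimizes. The data, names, and uniform weights below are illustrative assumptions, not the paper's code; the cutting-plane weight optimization itself is omitted.

```python
import numpy as np

# Toy sketch: m basic multi-label learners score n instances on L labels.
rng = np.random.default_rng(1)
n, L, m = 50, 10, 3
Y = (rng.random((n, L)) < 0.3).astype(int)          # toy ground-truth labels
scores = Y + rng.normal(scale=0.8, size=(m, n, L))  # base-learner score matrices

alpha = np.full(m, 1.0 / m)              # simplex weights (uniform, for illustration)
F = np.tensordot(alpha, scores, axes=1)  # combined n-by-L score matrix

def top_k_precision(F, Y, k=3):
    """Fraction of the k highest-scoring labels per instance that are relevant."""
    topk = np.argsort(-F, axis=1)[:, :k]          # indices of top-k labels
    hits = np.take_along_axis(Y, topk, axis=1)    # 1 where a top-k label is relevant
    return hits.mean()

print(round(top_k_precision(F, Y), 3))
```

In the paper, instead of fixing the weights, the worst case over candidate ground-truth assignments is optimized, with a cutting-plane loop adding the most violated assignment at each step so that only a series of small linear programs is solved.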