Positive-unlabeled (PU) learning addresses binary classification problems in which only positive (P) and unlabeled (U) data are available. Many PU methods based on linear models and neural networks have been proposed; however, boosting algorithms for PU learning remain little studied, even though a traditional boosting algorithm with simple base learners may perform better than neural networks. We propose Ada-PU, a novel boosting algorithm for PU learning, and compare it against neural networks. Ada-PU follows the general procedure of AdaBoost, while P data are regarded as both positive and negative simultaneously. Ada-PU maintains and updates three distributions over the PU data, instead of the single distribution used in ordinary supervised (PN) learning. After a weak classifier is learned on the newly updated distribution, its weight in the final ensemble is estimated using only PU data. We show that, with a defined set of base classifiers, the proposed method is guaranteed to preserve three theoretical properties of boosting algorithms. In experiments, we show that Ada-PU outperforms neural networks on benchmark PU datasets. We also study UNSW-NB15, a real-world cyber-security dataset, and demonstrate that Ada-PU achieves superior performance for malicious activity detection.
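The abstract does not spell out Ada-PU's update or weighting rules. As a rough illustration of how a classifier's error can be estimated using only PU data by treating P data as both positive and negative, the Python sketch below uses the standard unbiased PU risk estimator with zero-one loss. This is an assumption, not necessarily the estimator Ada-PU uses. It also assumes the class prior pi = P(y = +1) is known and plugs the estimate into the ordinary AdaBoost classifier weight 0.5 * ln((1 - err) / err). The function names pu_error_estimate and adaboost_alpha are hypothetical.

```python
import numpy as np


def pu_error_estimate(pred_p, pred_u, prior):
    """Unbiased PU estimate of the 0-1 error of a +/-1 classifier (illustrative sketch).

    pred_p: predictions (+1/-1) on labeled positive samples
    pred_u: predictions (+1/-1) on unlabeled samples
    prior:  ASSUMED known class prior pi = P(y = +1)
    """
    pred_p = np.asarray(pred_p)
    pred_u = np.asarray(pred_u)
    # P treated as positive: fraction of P wrongly predicted negative.
    err_pos = np.mean(pred_p != 1)
    # P treated as negative: fraction of P predicted positive.
    pos_rate_p = np.mean(pred_p == 1)
    # Fraction of U predicted positive.
    pos_rate_u = np.mean(pred_u == 1)
    # Since U mixes positives (weight pi) and negatives (weight 1 - pi),
    # this difference estimates (1 - pi) * P(f = +1 | y = -1).
    err_neg = pos_rate_u - prior * pos_rate_p
    # Clip at zero: the finite-sample estimate can go negative.
    return prior * err_pos + max(err_neg, 0.0)


def adaboost_alpha(err, eps=1e-10):
    """Ordinary AdaBoost classifier weight; err is clipped away from 0 and 1."""
    err = min(max(err, eps), 1 - eps)
    return 0.5 * np.log((1 - err) / err)


# Hypothetical toy predictions from one weak classifier.
err = pu_error_estimate([1, 1, 1, -1], [1, -1, -1, -1, 1, -1, -1, -1], prior=0.3)
alpha = adaboost_alpha(err)
```

With prior 0.3, P predictions [1, 1, 1, -1] and two positive predictions out of eight U samples, the estimated error is about 0.1 and the weight is about ln 3 (roughly 1.10). Clipping the negative-class term at zero mirrors the idea behind non-negative PU risk estimators; an actual PU boosting method would also need the distribution updates that the abstract mentions but does not detail.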

Speaker Bio

Yawen Zhao received her Master of IT from the University of Queensland, Australia, in July this year. She is currently a PhD student under the supervision of Dr Miao Xu and Dr Nan Ye. Her research interests include boosting, weakly supervised learning, and PU learning.

About Data Science Seminar

This weekly seminar series is hosted by ITEE Data Science.


46-371 and via Zoom