Data owners have the right to request that their data be deleted from a machine learning (ML) model. A naïve response is to retrain the model on the original dataset with the data to forget excluded, but this is often unrealistic: the required dataset may no longer be available, and retraining is usually computationally expensive. To cope with this reality, machine unlearning has recently attracted much attention. It aims to remove data from a trained ML model in response to deletion requests, without retraining the model from scratch and without full access to the original training dataset. Existing unlearning methods mainly focus on conventional ML models, while unlearning for models based on deep neural networks (DNNs) remains underexplored, especially for those trained on large-scale datasets.


In this work, we make the first attempt to realize data forgetting on deep models for image retrieval. Image retrieval aims to find data relevant to a query according to similarity measures. Intuitively, unlearning a deep image retrieval model can be achieved by breaking down its ability to model similarity on the data to forget. To this end, we propose a generative scrubbing (GS) method that learns a generator to craft noisy data that manipulate the model weights. We design a novel framework consisting of the generator and the target retrieval model, in which a pair of coupled static and dynamic learning procedures is performed simultaneously. This learning strategy enables the generated noisy data to erase the model's memory of the data to forget while retaining the information of the remaining data. Extensive experiments on three widely used datasets verify the effectiveness of the proposed method.

Speaker Bio:

Pengfei Zhang is a Ph.D. candidate in the School of ITEE at the University of Queensland, under the supervision of Prof Helen Huang. He received his Master's degree in Computer Science from Shandong University, China. His research interests include privacy protection, robust learning, multi-model learning and information retrieval.

About Data Science Seminar

This seminar series will be run as weekly sessions and is hosted by ITEE Data Science.

46-401 and via Zoom