DeepMemory: Model-based Memorization Analysis of Deep Neural Language Models
Neural network models are having a significant impact on many real-world applications. However, the ever-increasing popularity and complexity of these models amplify their security and privacy challenges, with privacy leakage from training data being one of the most prominent issues. In this context, prior studies proposed analyzing the abstraction behavior of neural network models, e.g., RNNs, to understand their robustness. However, existing research rarely addresses privacy-breaching memorization in neural language models. To fill this gap, we propose a novel approach, DeepMemory, that analyzes the memorization behavior of a neural language model. We first construct a memorization-analysis-oriented model, taking both the training data and a neural language model as input. We then build a semantic first-order Markov model that binds the constructed memorization-analysis-oriented model to the training data in order to analyze the memorization distribution. Finally, we apply our approach to address data leakage issues associated with memorization and to assist with dememorization. We evaluate our approach on one of the most popular neural language models, the LSTM-based language model, with three public datasets, namely WikiText-103, WMT2017, and IWSLT2016. We find that sentences in the studied datasets with low perplexity are more likely to be memorized. Our approach achieves an average AUC of 0.73 in automatically identifying data leakage issues during assessment. Finally, with the assistance of our approach, the memorization risk of the neural language model can be mitigated by mutating the training data without impacting the quality of the model.
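The abstract's finding that low-perplexity sentences are more likely to be memorized can be illustrated with the standard definition of perplexity. The sketch below is not the DeepMemory method; it only computes sentence perplexity from per-token probabilities and ranks sentences from lowest to highest. The function names and the probability values are hypothetical and made up for illustration; in practice the probabilities would come from the language model under study.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sentence, given the model's probability for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def rank_by_perplexity(sentences):
    """Sort (sentence, token_probs) pairs from lowest to highest perplexity."""
    return sorted(sentences, key=lambda item: perplexity(item[1]))


# Hypothetical per-token probabilities (assumed values, not from the paper).
examples = [
    ("a rare phrasing", [0.05, 0.10, 0.02]),
    ("a frequently repeated sentence", [0.9, 0.8, 0.95, 0.85]),
]

for text, probs in rank_by_perplexity(examples):
    print(f"{perplexity(probs):8.2f}  {text}")
```

Sentences at the top of this ranking would be the first candidates to inspect for memorization, under the assumption that the finding reported in the abstract carries over to the model being assessed.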
Thu 18 Nov (displayed time zone: Hobart)
22:00 - 23:00
22:00 | 20m Talk | Modeling Team Dynamics for the Characterization and Prediction of Delays in User Stories | Research Papers | Elvan Kula (Delft University of Technology), Arie van Deursen (Delft University of Technology, Netherlands), Georgios Gousios (Facebook & Delft University of Technology) | Pre-print
22:20 | 20m Talk | DeepMemory: Model-based Memorization Analysis of Deep Neural Language Models | Research Papers | Derui Zhu (Technical University of Munich), Jinfu Chen (Centre for Software Excellence, Huawei, Canada), Weiyi Shang (Concordia University), Xuebing Zhou (Huawei Munich Research Center), Jens Grossklags (Technical University of Munich), Ahmed E. Hassan (Queen's University)
22:40 | 20m Talk | Automated Verification of Go Programs via Bounded Model Checking | Research Papers | Pre-print