Recent advances in code-specialized language models have transformed software development by assisting developers with tasks such as code generation, debugging, and testing. However, training these models on vast, centralized datasets raises significant privacy concerns: inadvertent memorization of sensitive code can lead to data breaches and intellectual-property risks. This research investigates the underlying patterns of code memorization and explores privacy-preserving techniques, such as Federated Learning with Differential Privacy and targeted noise injection, to mitigate these risks. The goal is to develop robust, secure code generation systems that maintain high performance while safeguarding sensitive information, ultimately fostering greater trust and wider adoption in modern software engineering.
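To make the noise-injection idea concrete, below is a minimal sketch of differentially private gradient noising in the DP-SGD style: per-example gradients are clipped to a norm bound and Gaussian noise is added to their average. The function name, the hyperparameters (clip_norm, noise_multiplier), and the toy gradients are illustrative assumptions, not details from the abstract itself.

import numpy as np

def dp_noised_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                       rng=np.random.default_rng(0)):
    """Clip each example's gradient to clip_norm, average, and add Gaussian noise.

    Illustrative sketch only; hyperparameter values are assumptions.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Gaussian mechanism: noise on the averaged gradient has scale z * C / B,
    # where z is the noise multiplier, C the clip norm, B the batch size.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return avg + rng.normal(0.0, sigma, size=avg.shape)

# Usage: three toy per-example gradients for a 4-parameter model.
grads = [np.array([0.5, -1.2, 0.3, 2.0]),
         np.array([-0.1, 0.4, 0.9, -0.7]),
         np.array([1.5, 0.2, -0.3, 0.8])]
print(dp_noised_gradient(grads))

In a federated setting, the same clip-then-noise step would typically be applied to client updates before server aggregation, which is what bounds any single client's influence on the shared model.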