Software isn’t created in one dramatic step. It improves bit by bit, one little step at a time: editing, running unit tests, fixing build errors, addressing code reviews, editing some more, appeasing linters, and fixing more errors, until finally it becomes good enough to merge into a code repository. Software engineering isn’t an isolated process, but a dialogue among human developers, code reviewers, bug reporters, software architects, and tools such as compilers, unit tests, linters, and static analyzers. I’ll talk about DIDACT (Dynamic Integrated Developer ACTivity), a methodology for training large machine learning (ML) models for software development. The novelty of DIDACT is that it uses the process of software development as the source of training data for the model, rather than just the polished end state of that process, the finished code. Exposed to the contexts that developers see as they work, paired with the actions they take in response, the model learns about the dynamics of software development and is more aligned with how developers spend their time. We leverage the instrumentation of Google’s software development to scale up the quantity and diversity of developer-activity data beyond previous work. Results are promising along two dimensions: usefulness to professional software developers, and as a potential basis for imbuing ML models with general software development skills.
Danny is a Staff Research Scientist at Google Brain, Adjunct Professor in Computer Science at McGill University, and a Core Industrial Member of the Mila Quebec AI Institute. His research interests are in the application of machine learning to problems involving structured data, with a specific interest in the intersection of machine learning, programming languages, and software engineering. His work has won paper awards at NeurIPS, UAI, and ICML & NeurIPS Workshops. He holds a Ph.D. from the Machine Learning group at the University of Toronto and was previously a Research Fellow at Darwin College, University of Cambridge and a Researcher at Microsoft Research Cambridge (UK).
Fri 31 May (times shown in Eastern Time, US & Canada)
09:15 - 10:30 | Keynote (75 min) | DIDACT: Large sequence models for software development activities