Source code comes in different shapes and forms. Previous research has shown that code is more predictable than natural language and has highlighted its statistical predictability at the token level: source code can be natural. More recently, the structure of code (control flow, syntax graphs, abstract syntax trees, etc.) has been used successfully to improve the state of the art on numerous tasks, such as code suggestion, code summarisation and method naming. This body of work implicitly assumes that structured representations of code are similarly statistically predictable, i.e. that a structured view of code is also natural. We argue that this assumption should be made explicit and propose studying the Structured Naturalness Hypothesis directly. Beyond formulating the hypothesis and identifying existing research that assumes it, we also provide evidence for it in the case of trees: for some languages, such as Ruby, TreeLSTM models over ASTs are competitive with n-gram models while handling the syntax token issue highlighted by previous research ‘for free’. For other languages, such as Java or Python, we find that tree models perform worse, suggesting that improvement on downstream tasks is uncorrelated with performance on the language modelling task. Further, we show how such naturalness signals can be employed for near state-of-the-art results on just-in-time defect prediction while forgoing manual feature engineering.
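Token-level naturalness is commonly quantified as the cross-entropy of a language model on code: the lower the average negative log2-probability per token, the more predictable the code. The sketch below illustrates that measure with a bigram model and add-k smoothing over pre-tokenized input. It is only an illustration under those assumptions; the function names `train_bigram` and `cross_entropy` are hypothetical, and this is not the n-gram or TreeLSTM setup used in the poster.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over token sequences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for tokens in corpus:
        seq = ["<s>"] + tokens + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            contexts[prev] += 1          # how often `prev` is followed by anything
            bigrams[(prev, cur)] += 1    # how often `prev` is followed by `cur`
    return contexts, bigrams, vocab


def cross_entropy(tokens, model, k=1.0):
    """Average negative log2-probability per predicted token, using add-k smoothing."""
    contexts, bigrams, vocab = model
    seq = ["<s>"] + tokens + ["</s>"]
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + k) / (contexts[prev] + k * v)
        total -= math.log2(p)
    return total / (len(seq) - 1)


if __name__ == "__main__":
    # Hypothetical toy corpus: two tokenized Python loop headers.
    corpus = [
        ["for", "i", "in", "range", "(", "n", ")", ":"],
        ["for", "j", "in", "range", "(", "m", ")", ":"],
    ]
    model = train_bigram(corpus)
    familiar = cross_entropy(corpus[0], model)
    scrambled = cross_entropy([")", "range", "for", ":"], model)
    print(f"familiar: {familiar:.2f} bits/token, scrambled: {scrambled:.2f} bits/token")
```

The familiar loop header scores a lower cross-entropy than the scrambled token sequence, which is the sense in which code that repeats common patterns is "natural". A structured variant of the same idea would score tree nodes instead of a flat token stream.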
Fri 19 Apr (times shown in Lisbon time)
10:30 - 11:00
10:30 30m Poster | Exploring the Effectiveness of LLM based Test-driven Interactive Code Generation: User Study and Empirical Evaluation | Sarah Fakhoury Microsoft Research, Aaditya Naik University of Pennsylvania, Georgios Sakkas University of California at San Diego, Saikat Chakraborty Microsoft Research, Madan Musuvathi Microsoft Research, Shuvendu K. Lahiri Microsoft Research
10:30 30m Poster | On the Need for Empirically Investigating Fast-Growing Programming Languages | Jahnavi Kumar Indian Institute of Technology Tirupati, India, Sridhar Chimalakonda Indian Institute of Technology, Tirupati
10:30 30m Poster | Decoding Log Parsing Challenges: A Comprehensive Taxonomy for Actionable Solutions | Issam Sedki Concordia University, Wahab Hamou-Lhadj Concordia University, Montreal, Canada, Otmane Ait-Mohamed Concordia University, Naser Ezzati Jivan, Mohammed Shehab Concordia University
10:30 30m Poster | Automated Code Editing with Search-Generate-Modify | Changshu Liu Columbia University, Pelin Cetin Columbia University, Yogesh Patodia Columbia University, Baishakhi Ray AWS AI Labs, Saikat Chakraborty Microsoft Research, Yangruibo Ding Columbia University
10:30 30m Poster | Exploring the Impact of Inheritance on Test Code Maintainability
10:30 30m Poster | Improving Program Debloating with 1-DU Chain Minimality | Myeongsoo Kim Georgia Institute of Technology, Santosh Pande Georgia Institute of Technology, Alessandro Orso Georgia Institute of Technology
10:30 30m Poster | GoSpeechLess: Interoperable Serverless ML-based Cloud Services | Sashko Ristov University of Innsbruck, Philipp Gritsch University of Innsbruck, David Meyer University of Innsbruck, Michael Felderer German Aerospace Center (DLR) & University of Cologne
10:30 30m Poster | Towards Precise Observations of Neural Model Robustness in Classification | Wenchuan Mu ISTD, Singapore University of Technology and Design, Kwan Hui Lim Singapore University of Technology and Design, Singapore
10:30 30m Poster | Assessing AI-Based Code Assistants in Method Generation Tasks | Vincenzo Corso University of Milano-Bicocca, Leonardo Mariani University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Oliviero Riganelli University of Milano-Bicocca
10:30 30m Poster | Recovering Traceability Links between Release Notes and Related Software Artifacts
10:30 30m Poster | Improving the Condensing of Reverse Engineered Class Diagrams using Weighted Network Metrics | Weifeng Pan Zhejiang Gongshang University, Wei Wu Zhejiang Gongshang University, Hua Ming Oakland University, Dae-Kyoo Kim Oakland University, Jinkai Yang Oakland University, Ruochen Liu Oakland University
10:30 30m Poster | Exploring Data Cleanness in Defects4J and Its Influence on Fault Localization Efficiency | Md Nakhla Rafi Concordia University, An Ran Chen University of Alberta, Tse-Hsun (Peter) Chen Concordia University, Shaohua Wang Central University of Finance and Economics
10:30 30m Poster | Learning to Represent Patches | Xunzhu Tang University of Luxembourg, Haoye Tian University of Melbourne, Zhenghan Chen Peking University, Weiguo Pian University of Luxembourg, Saad Ezzini Lancaster University, Abdoul Kader Kaboré University of Luxembourg, Andrew Habib ABB Corporate Research, Germany, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg
10:30 30m Poster | Bringing Structure to Naturalness: On the Naturalness of ASTs | Profir-Petru Pârțachi National Institute of Informatics, Japan, Mahito Sugiyama National Institute of Informatics, Japan