* ICSE 2018 *
Sun 27 May - Sun 3 June 2018 Gothenburg, Sweden
Wed 30 May 2018 14:30 - 14:45 at E3 room - Programming and Code Analysis Chair(s): Thorsten Berger

Translating a program written in one programming language to another can be useful for software development tasks that need the same functionality implemented in different languages. Although past studies have considered this problem, they may be either specific to the language grammars, or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is twofold. First, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations for the tokens (a.k.a. shared embeddings) based on word2vec, a neural-network-based technique for producing word embeddings. Second, hierarchically from the bottom up, we construct shared embeddings for code elements of higher levels of granularity (e.g., expressions, statements, methods) from the embeddings of their constituents, and then build mappings among code elements across languages based on similarities among embeddings.
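
The bottom-up idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how constituent embeddings are combined, so plain averaging is assumed here, cosine similarity is assumed as the similarity measure, and the token names and vector values in `TOKENS` are made up for illustration.

```python
import math


def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(element, token_vectors):
    """Embed a code element bottom-up.

    A token (str) is looked up directly; a composite element (list of
    children) is the mean of its children's embeddings. Averaging is an
    assumption of this sketch, not something stated in the abstract.
    """
    if isinstance(element, str):
        return token_vectors[element]
    return average([embed(child, token_vectors) for child in element])


def best_match(source_vec, candidates):
    """Name of the candidate whose embedding is most similar to source_vec."""
    return max(candidates, key=lambda name: cosine(source_vec, candidates[name]))


# Hypothetical shared token embeddings for Java and C# (made-up numbers).
TOKENS = {
    "System.out.println": [0.90, 0.10, 0.00],
    "Console.WriteLine": [0.88, 0.12, 0.02],
    "String.Length": [0.12, 0.85, 0.10],
    "STRING_LITERAL": [0.30, 0.30, 0.80],
    "IDENT": [0.20, 0.40, 0.70],
}

# A Java statement and two candidate C# statements, as nested token lists.
java_stmt = ["System.out.println", ["STRING_LITERAL"]]
csharp_statements = {
    "print": embed(["Console.WriteLine", ["STRING_LITERAL"]], TOKENS),
    "length": embed(["String.Length", ["IDENT"]], TOKENS),
}

match = best_match(embed(java_stmt, TOKENS), csharp_statements)
print(match)
```

With these toy vectors the Java print statement is mapped to the C# statement labelled `print`, because their averaged embeddings are nearly identical.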

Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.
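
For readers unfamiliar with the metric, Mean Average Precision can be sketched as follows. This uses the common textbook definition; the exact variant used in the paper's evaluation is not given in the abstract, and the API names in the example queries are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked candidate list.

    Precision is sampled at each rank where a relevant item appears,
    and the sum is divided by the number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(queries):
    """Mean of average precision over (ranked, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical ranked C# candidates for two Java elements.
queries = [
    (["Console.WriteLine", "Console.Write", "Debug.Print"], {"Console.WriteLine"}),
    (["String.Concat", "String.Length"], {"String.Length"}),
]
score = mean_average_precision(queries)
print(score)  # 0.75
```

The first query places its correct mapping at rank 1 (average precision 1.0) and the second at rank 2 (average precision 0.5), which gives a mean of 0.75.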

Wed 30 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:30
Programming and Code Analysis
NIER - New Ideas and Emerging Results at E3 room
Chair(s): Thorsten Berger Chalmers University of Technology, Sweden
14:00
15m
Talk
Combining Spreadsheet Smells for Improved Fault Prediction
NIER - New Ideas and Emerging Results
14:15
15m
Talk
Images of Code: Lossy Compression for Native Instructions
NIER - New Ideas and Emerging Results
Marcelino Rodriguez-Cancio, Benoit Baudry KTH Royal Institute of Technology, Sweden, Jules White Vanderbilt University
14:30
15m
Short-paper
Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code
NIER - New Ideas and Emerging Results
Nghi D. Q. Bui Singapore Management University, Singapore, Lingxiao Jiang Singapore Management University
14:45
15m
Talk
Which library should I use? A metric-based comparison of software libraries
NIER - New Ideas and Emerging Results
Fernando Lopez de La Mora University of Alberta, Sarah Nadi University of Alberta
15:00
15m
Talk
UniComp: a semantics-aware model compiler for optimised predictable software
NIER - New Ideas and Emerging Results
Federico Ciccozzi Malardalen University
15:15
15m
Talk
Self-adaptive static analysis
NIER - New Ideas and Emerging Results
Eric Bodden Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM