CAREER: Context-Aware API Recognition with Data Augmentation for API Knowledge Extraction
ICPC Full paper, Virtual-Talk
Recognizing Application Programming Interface (API) mentions in software-related text is a prerequisite for extracting API-related knowledge. Previous studies have demonstrated the superiority of deep learning-based methods for this task. However, such techniques still hit bottlenecks because they cannot effectively handle three challenges: (1) differentiating APIs from common words; (2) identifying APIs that appear as morphological variants of the standard API names; and (3) the lack of high-quality labeled data for training. To overcome these challenges, this paper proposes a context-aware API recognition method named CAREER. The approach uses two key components, Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM), to extract context information at both the word level and the sequence level. This combination lets the method dynamically capture both syntactic and semantic information, addressing the first challenge. To tackle the second challenge, CAREER introduces a character-level BiLSTM component enriched with an attention mechanism, which lets the model capture character-level global context and thereby better recognize the morphological attributes of API mentions. To address the third challenge, the paper introduces three data augmentation techniques for generating new data samples, together with a novel sample selection algorithm that selects high-quality instances. Together, these two mechanisms reduce the need for manual data labeling. Experiments demonstrate that CAREER significantly improves F1-score by 11.0% compared with state-of-the-art methods. We also construct specific datasets to assess CAREER's capacity to tackle the aforementioned challenges. Results confirm that (1) CAREER significantly outperforms baseline methods in addressing the first and second challenges, and (2) with the aid of the data augmentation techniques and the sample selection algorithm, high-quality samples can be generated that improve performance and effectively alleviate the third challenge.
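To make the first two challenges concrete, here is a minimal sketch of a context-free baseline that marks a token as an API mention only when it exactly matches a dictionary entry. It is not the CAREER method or any baseline from the paper. The dictionary `KNOWN_APIS`, the helper names, and the example sentences are all hypothetical and chosen only for illustration.

```python
# Hypothetical mini-dictionary of "standard" API names (illustrative only,
# not taken from the paper's datasets).
KNOWN_APIS = {"apply", "pandas.DataFrame.apply"}


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation, keeping the
    dots and parentheses that sit inside an API-like token."""
    tokens = []
    for raw in sentence.split():
        token = raw.strip(",;:!?\"'")
        token = token.rstrip(".")  # drop a sentence-final period
        if token:
            tokens.append(token)
    return tokens


def naive_recognize(sentence):
    """Context-free baseline: a token counts as an API mention
    only if it matches a dictionary entry exactly."""
    return [t for t in tokenize(sentence) if t in KNOWN_APIS]


# Challenge (1): the common English word "apply" is wrongly flagged.
print(naive_recognize("You should apply for the job before Friday."))
# Challenge (2): the variant "df.apply()" is missed entirely.
print(naive_recognize("Call df.apply() on each column."))
```

The first call returns a false positive and the second returns nothing, since exact matching has no access to the surrounding words or to the internal shape of a token. Those are the two kinds of signal that the abstract attributes to the word- and sequence-level context components and to the character-level component, respectively.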
Tue 16 Apr (displayed time zone: Lisbon)
16:00 - 17:30 | Code Analysis and Mining Studies | Tool Demonstration / Research Track | at Sophia de Mello Breyner Andresen | Chair(s): DongGyun Han (Royal Holloway, University of London)
16:00 | 10m Talk | ASKDetector: An AST-Semantic and Key Features Fusion based Code Comment Mismatch Detector | ICPC Full paper, Virtual-Talk | Research Track | Haiyang Yang (School of Computer Science and Engineering, Central South University), hao chen, Zhirui Kuai (School of Computer Science and Engineering, Central South University), Shuyuan Tu (School of Computer Science and Engineering, Central South University), Li Kuang (School of Computer Science and Engineering, Central South University)
16:10 | 10m Talk | TaiE: Function Identification for Monolithic Firmware | ICPC Full paper | Research Track | Jintao Huang (Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China), Kai Yang (School of Computer, Electronics and Information, Guangxi University), Gaosheng Wang (Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China), Zhiqiang Shi (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences), Shichao Lv (Institute of Information Engineering, Chinese Academy of Sciences), Limin Sun (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences)
16:20 | 10m Talk | Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer | ICPC Full paper | Research Track | Mouna Dhaouadi (University of Montreal), Bentley Oakes (Polytechnique Montréal), Michalis Famelis (Université de Montréal)
16:30 | 10m Talk | Lightweight Syntactic API Usage Analysis with UCov | ICPC Full paper | Research Track | Gustave Monce (Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI), Thomas Couturou (Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI), Yasmine Hamdaoui (Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI), Thomas Degueule (CNRS), Jean-Rémy Falleri (Bordeaux INP)
16:40 | 10m Talk | CAREER: Context-Aware API Recognition with Data Augmentation for API Knowledge Extraction | ICPC Full paper, Virtual-Talk | Research Track | Zhang Zhang, Xinjun Mao (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Kang Yang (National University of Defense Technology), Yao Lu (National University of Defense Technology)
16:50 | 8m Talk | TerraMetrics: An Open Source Tool for Infrastructure-as-Code (IaC) Quality Metrics in Terraform | ICPC Tools | Tool Demonstration
16:58 | 8m Talk | OpenGalaxy: An interactive exploration platform for a visualized GitHub Full Domain collaboration network | ICPC Tools | Tool Demonstration | Xinran Zhang, Shengyu Zhao (Tongji University), Yenan Tang (East China Normal University), Xiaoya Xia (East China Normal University), will wang
17:06 | 8m Talk | Hypercrx: A browser extension for insights into GitHub projects and developers | ICPC Tools | Tool Demonstration | Yenan Tang (East China Normal University), Shengyu Zhao (Tongji University), Xiaoya Xia (East China Normal University), Fenglin Bi (East China Normal University), will wang
17:14 | 16m Talk | Code Analysis and Mining Studies: Panel with Speakers | ICPC Discussion