This program is tentative and subject to change.

Data clones are defined as multiple copies of the same data among datasets. The presence of data clones between datasets can cause issues such as difficulties in managing data assets and data license violations when datasets with clones are used to build AI software. However, detecting data clones is not trivial. The majority of prior studies in this area rely on structural information (e.g., font size, column headers) to detect data clones. However, tabular datasets used to build AI software are typically stored without any structural information. In this paper, we propose a novel method called SimClone for data clone detection in tabular datasets without relying on structural information. SimClone uses value similarities for data clone detection. We also propose a visualization approach as part of SimClone to help locate the exact position of the cloned data between a dataset pair. Our results show that SimClone outperforms the current state-of-the-art method by at least 20% in terms of both F1-score and AUC. In addition, SimClone’s visualization component helps identify the exact location of the data clone in a dataset with a Precision@10 value of 0.80 in the top 20 true positive predictions.

Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada)

13:30 - 14:00
13:30
30m
Poster
Pattern-based Generation and Adaptation of Quantum Workflows [Quantum]
Research Track
Martin Beisel Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Johanna Barzen University of Stuttgart, Frank Leymann University of Stuttgart, Lavinia Stiliadou Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Daniel Vietz University of Stuttgart, Benjamin Weder Institute of Architecture of Application Systems (IAAS), University of Stuttgart
13:30
30m
Talk
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Maryam Masoudian Sharif University of Technology, Hong Kong University of Science and Technology (HKUST), Heqing Huang City University of Hong Kong, Morteza Amini Sharif University of Technology, Charles Zhang Hong Kong University of Science and Technology
13:30
30m
Talk
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Jaeseong Lee The University of Texas at Dallas, Simin Chen University of Texas at Dallas, Austin Mordahl The University of Texas at Dallas, Cong Liu University of California, Riverside, Wei Yang UT Dallas, Shiyi Wei University of Texas at Dallas
13:30
30m
Poster
BSan: A Powerful Identifier-Based Hardware-Independent Memory Error Detector for COTS Binaries [Artifact-Functional, Artifact-Available]
Research Track
Wen Zhang University of Georgia, Botang Xiao University of Georgia, Qingchen Kong University of Georgia, Le Guan University of Georgia, Wenwen Wang University of Georgia
13:30
30m
Talk
A Unit Proofing Framework for Code-level Verification: A Research Agenda [Formal Methods]
New Ideas and Emerging Results (NIER)
Paschal Amusuo Purdue University, Parth Vinod Patil Purdue University, Owen Cochell Michigan State University, Taylor Le Lievre Purdue University, James C. Davis Purdue University
13:30
30m
Talk
Listening to the Firehose: Sonifying Z3’s Behavior [Artifact-Functional, Artifact-Reusable, Artifact-Available, Formal Methods]
New Ideas and Emerging Results (NIER)
Finn Hackett University of British Columbia, Ivan Beschastnikh University of British Columbia
13:30
30m
Talk
Towards Early Warning and Migration of High-Risk Dormant Open-Source Software Dependencies [Security]
New Ideas and Emerging Results (NIER)
Zijie Huang Shanghai Key Laboratory of Computer Software Testing and Evaluation, Lizhi Cai Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center, Xuan Mao Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China, Kang Yang Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology
13:30
30m
Poster
SimClone: Detecting Tabular Data Clones using Value Similarity
Journal-first Papers
Xu Yang University of Manitoba, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Shaowei Wang University of Manitoba, Zhen Ming (Jack) Jiang York University
13:30
30m
Talk
SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation [Formal Methods]
New Ideas and Emerging Results (NIER)
Junjie Sheng East China Normal University, Yanqiu Lin East China Normal University, Jiehao Wu East China Normal University, Yanhong Huang East China Normal University, Jianqi Shi East China Normal University, Min Zhang East China Normal University, Xiangfeng Wang East China Normal University

Thu 1 May

Displayed time zone: Eastern Time (US & Canada)

15:30 - 16:00
15:30
30m
Talk
Mole: Efficient Crash Reproduction in Android Applications With Enforcing Necessary UI Events
Journal-first Papers
Maryam Masoudian Sharif University of Technology, Hong Kong University of Science and Technology (HKUST), Heqing Huang City University of Hong Kong, Morteza Amini Sharif University of Technology, Charles Zhang Hong Kong University of Science and Technology
15:30
30m
Talk
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Neelam Tjikhoeri Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
15:30
30m
Talk
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
Carolin Brandt Delft University of Technology, Ali Khatami Delft University of Technology, Mairieli Wessel Radboud University, Andy Zaidman Delft University of Technology
15:30
30m
Poster
BSan: A Powerful Identifier-Based Hardware-Independent Memory Error Detector for COTS Binaries [Artifact-Functional, Artifact-Available]
Research Track
Wen Zhang University of Georgia, Botang Xiao University of Georgia, Qingchen Kong University of Georgia, Le Guan University of Georgia, Wenwen Wang University of Georgia
15:30
30m
Talk
Towards Early Warning and Migration of High-Risk Dormant Open-Source Software Dependencies [Security]
New Ideas and Emerging Results (NIER)
Zijie Huang Shanghai Key Laboratory of Computer Software Testing and Evaluation, Lizhi Cai Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center, Xuan Mao Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China, Kang Yang Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology
15:30
30m
Talk
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
Costanza Alfieri Università degli Studi dell'Aquila, Juri Di Rocco University of L'Aquila, Paola Inverardi Gran Sasso Science Institute, Phuong T. Nguyen University of L’Aquila
15:30
30m
Poster
SimClone: Detecting Tabular Data Clones using Value Similarity
Journal-first Papers
Xu Yang University of Manitoba, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Dayi Lin Centre for Software Excellence, Huawei Canada, Shaowei Wang University of Manitoba, Zhen Ming (Jack) Jiang York University
15:30
30m
Talk
Strategies to Embed Human Values in Mobile Apps: What do End-Users and Practitioners Think?
SE in Society (SEIS)
Rifat Ara Shams CSIRO's Data61, Mojtaba Shahin RMIT University, Gillian Oliver Monash University, Jon Whittle CSIRO's Data61 and Monash University, Waqar Hussain Data61, CSIRO, Harsha Perera CSIRO's Data61, Arif Nurwidyantoro Universitas Gadjah Mada