ICST 2020
Sat 24 - Wed 28 October 2020 Porto, Portugal
Mon 26 Oct 2020 15:30 - 16:00 at Infante - RT8 - Misc 2 Chair(s): Alin Stefanescu
Tue 27 Oct 2020 02:30 - 03:00 at Infante - RT8 - Misc 2

A deterministic clustering algorithm is designed to always produce the same clustering solution on a given input. Therefore, users of clustering implementations (toolkits) naturally assume that implementations of a deterministic clustering algorithm A have deterministic behavior, that is: (1) two different implementations I1 and I2 of A are interchangeable, producing the same clustering on a given input D, and (2) an implementation produces the same clustering solution when run repeatedly on D. We challenge these assumptions. Specifically, we analyze clustering behavior on 528 datasets, three deterministic algorithms (Affinity Propagation, DBSCAN, Hierarchical Agglomerative Clustering) and the deterministic portion of a fourth (K-means), as implemented in various toolkits; in total, we examined 13 algorithm-toolkit combinations. We found that different implementations of deterministic clustering algorithms make different choices, e.g., default parameter settings, noise insertion, input dataset characteristics. As a result, clustering solutions for a fixed algorithm-dataset combination can differ across runs (nondeterminism) and across toolkits (inconsistency). We expose several root causes of such behavior. We show that remedying these root causes improves determinism, increases consistency, and can even improve efficiency. Our approach and findings can benefit developers, testers, and users of clustering algorithms.
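The two assumptions questioned in the abstract can be phrased as simple checks. The following is only a minimal sketch and not the authors' tooling: the helper names (`same_partition`, `rand_index`, `is_deterministic`, `is_consistent`) and the two toy "implementations" are hypothetical, and the Rand index is used here as one common agreement measure, not necessarily the metric used in the paper.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Sequence

Labels = Sequence[Hashable]
Clusterer = Callable[[Sequence[float]], List[int]]


def same_partition(a: Labels, b: Labels) -> bool:
    """True if both label vectors describe the same clustering, ignoring label names."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        # Each label in `a` must map to exactly one label in `b`, and vice versa.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def rand_index(a: Labels, b: Labels) -> float:
    """Fraction of point pairs on which the two clusterings agree (1.0 = identical)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def is_deterministic(cluster: Clusterer, data: Sequence[float], runs: int = 5) -> bool:
    """Assumption (2): repeated runs on the same input give the same clustering."""
    first = cluster(data)
    return all(same_partition(first, cluster(data)) for _ in range(runs - 1))


def is_consistent(impl_1: Clusterer, impl_2: Clusterer, data: Sequence[float]) -> bool:
    """Assumption (1): two implementations give the same clustering on the same input."""
    return same_partition(impl_1(data), impl_2(data))


# Hypothetical toy implementations of the "same" algorithm that differ only
# in how they treat a point lying exactly on the boundary.
def toy_impl_1(data: Sequence[float]) -> List[int]:
    return [0 if x < 0.5 else 1 for x in data]


def toy_impl_2(data: Sequence[float]) -> List[int]:
    return [0 if x <= 0.5 else 1 for x in data]


data = [0.1, 0.5, 0.9]
deterministic = is_deterministic(toy_impl_1, data)
consistent = is_consistent(toy_impl_1, toy_impl_2, data)
agreement = rand_index(toy_impl_1(data), toy_impl_2(data))
```

In this toy setting each implementation is deterministic on its own, yet the two disagree on the boundary point, so the consistency check fails. This mirrors, in a very simplified form, the kind of implementation-level choice the abstract refers to.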

Mon 26 Oct

Displayed time zone: Lisbon

15:30 - 17:00
RT8 - Misc 2: Journal-First Papers / Research Papers at Infante (+11h)
Chair(s): Alin Stefanescu (University of Bucharest)
Implementation-induced Inconsistency and Nondeterminism in Deterministic Clustering Algorithms
Research Papers
Xin Yin (New Jersey Institute of Technology), Iulian Neamtiu (New Jersey Institute of Technology, USA), Saketan Patil (New Jersey Institute of Technology), Sean Andrews (New Jersey Institute of Technology)
CBR: Controlled Burst Recording
Research Papers
Oscar Cornejo (University of Milano Bicocca, Italy), Daniela Briola (University of Milano Bicocca), Daniela Micucci (University of Milano Bicocca), Leonardo Mariani (University of Milano Bicocca)
Mahtab: Phase-wise acceleration of regression testing for C
Journal-First Papers
Shouvick Mondal (IIT Madras, India), Rupesh Nasre (IIT Madras, India)

Tue 27 Oct

Displayed time zone: Lisbon

02:30 - 04:00
Implementation-induced Inconsistency and Nondeterminism in Deterministic Clustering Algorithms
Research Papers
Xin Yin (New Jersey Institute of Technology), Iulian Neamtiu (New Jersey Institute of Technology, USA), Saketan Patil (New Jersey Institute of Technology), Sean Andrews (New Jersey Institute of Technology)
CBR: Controlled Burst Recording
Research Papers
Oscar Cornejo (University of Milano Bicocca, Italy), Daniela Briola (University of Milano Bicocca), Daniela Micucci (University of Milano Bicocca), Leonardo Mariani (University of Milano Bicocca)
Mahtab: Phase-wise acceleration of regression testing for C
Journal-First Papers
Shouvick Mondal (IIT Madras, India), Rupesh Nasre (IIT Madras, India)