ML-powered features in software are notoriously challenging to test. As the behaviors of an ML feature emerge, so do user-level bugs. Neither traditional testing methods applied during code development nor metrics collected during model development provide an accurate view of such bugs, and online metrics collected during A/B experiments do not directly expose them either. I will talk about how we are filling this gap by repurposing automated testing for ML software development and deployment. We leverage large volumes of system test cases, automatically obtained together with their test oracles, to derive defect classes, a partition of the input space on which the ML-based software does measurably worse. These defect classes are reported early during development, enabling rapid improvement.
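The abstract does not spell out how defect classes are derived. As a minimal sketch of the idea, assuming each automatically obtained test case carries an oracle verdict and a few categorical input features, one could surface candidate defect classes by comparing per-slice failure rates against the overall rate. All names below (e.g. `defect_classes`, `min_support`, `min_gap`) are hypothetical and not taken from the talk:

```python
# Illustrative sketch only: assumes each system test case records the input's
# features plus a pass/fail verdict from an automatically obtained oracle, and
# that candidate slices of the input space are defined by single-feature values.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TestCase:
    features: dict   # e.g. {"locale": "de", "query_length": "long"}
    passed: bool     # oracle verdict for the ML-based feature on this input

def defect_classes(cases, min_support=50, min_gap=0.05):
    """Group test cases into single-feature slices and flag slices whose
    failure rate is measurably worse than the overall failure rate."""
    overall_fail = sum(not c.passed for c in cases) / len(cases)
    slices = defaultdict(list)
    for c in cases:
        for key, value in c.features.items():
            slices[(key, value)].append(c.passed)
    flagged = []
    for slice_id, verdicts in slices.items():
        if len(verdicts) < min_support:
            continue  # too few test cases for a reliable per-slice estimate
        fail_rate = sum(not v for v in verdicts) / len(verdicts)
        if fail_rate - overall_fail >= min_gap:
            flagged.append((slice_id, fail_rate, len(verdicts)))
    # worst slices first: these are candidate defect classes to report early
    return sorted(flagged, key=lambda t: -t[1])
```

In practice the thresholds and the slicing scheme would be domain-specific; the point of the sketch is only that abundant test cases with oracles make such per-region failure estimates feasible during development rather than after deployment.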