CAIN 2024
Sun 14 - Mon 15 April 2024 Lisbon, Portugal
co-located with ICSE 2024
Pedro Bizarro

Title: To have great machine learning models in production in harsh environments, first focus on the harsh environments

Abstract

A very large payment processor client once told us: 'if we are down for 5 minutes, we open CNN - so don't screw up'. Processing billions of dollars per day, many of our clients must continuously fight organized crime in the form of transaction fraud, stolen cards, money laundering, account opening fraud, impersonations, scams, and many other exotic and ever-changing attacks. However, in addition to delivering very good detection rates and very low false positive rates, we must also maintain very high availability, very low latency, very high throughput, automatic fault tolerance, automatic scaling up and down, and more. In this talk we argue that the trick to having good models in harsh environments is to focus on the harsh environments first and on the machine learning models only second.

Bio

Pedro Bizarro is co-founder and Chief Science Officer of Feedzai, where he leads the Research department. Drawing on a history in academia and research, Pedro helped to develop Feedzai’s industry-leading RiskOps platform to fight financial fraud using innovations from Research. Pedro is also an invited Visiting Professor at Universidade de Lisboa – IST and a Member of the Global Innovator Programme at the World Economic Forum. He has been an Assistant Professor at the University of Coimbra, a visiting professor at Carnegie Mellon University, and a Fulbright Fellow, and he holds a Computer Science PhD from the University of Wisconsin-Madison. Pedro’s main interests are high performance systems for data processing, machine learning, responsible AI, and data visualization. Pedro is also an avid runner and an Ironman.

Christian Kästner

Title: From Models to Systems: On the Role of Software Engineering for Machine Learning

Abstract

Building production systems with machine learning components is challenging, and many projects fail when moving into production even after showing initial success with training machine-learned models. Unfortunately, data science education focuses narrowly on data analysis, machine-learning algorithms, and model building but rarely engages with how the model may be used as part of a system. Engineering aspects beyond deploying models are often ignored or underappreciated, including requirements engineering, user experience design, planning and testing integration with non-ML components, and planning for evolution, leading to poor outcomes in many real-world projects. Software engineers and data scientists often clash in teams due to different goals, processes, and expectations, finding it hard to effectively coordinate and integrate work. In this talk, I argue for the important roles that software engineers have in machine learning projects that want to move beyond a prototype model. I argue that a truly system-wide perspective is needed if we want to have any hope of making meaningful progress on safety, usability, fairness, or security. I explore the common collaboration problems and discuss strategies to overcome them. This talk is a call for more and better education in this space at the intersection of software engineering and machine learning, as well as for more system-wide research on building software systems with machine-learning components.

Bio

Christian Kästner is an associate professor and the director of the Software Engineering PhD program at the School of Computer Science at Carnegie Mellon University. His research originally focused on software analysis and the boundaries of modularity, especially in the context of highly configurable systems. He also conducts research on sustainability of open source software and communities. His research often used artificial intelligence and machine-learning tools, such as when predicting how configuration options change the performance of a software system or when modeling the benefit of donations in open source projects, though he personally never cared much about machine learning as a topic in itself (and even though unlikely, he wouldn’t mind another AI winter). Since 2019, he has regularly co-taught a new course, “Machine Learning in Production”, at the intersection of software engineering and machine learning to better prepare the large number of students who, after graduation, start to work on software systems that integrate more and more machine learning (e.g., mobile apps, web applications, IoT devices), and he has written a textbook on the topic. Since then, he has also conducted research on collaboration, documentation, and quality assurance in teams where software engineers and data scientists interact.