Tutorials
Here are the ASE2020 tutorials and a brief description of each:
Tutorial 1
Machine Learning meets Software Performance: Optimization, Transfer Learning, and Counterfactual Causal Inference
Abstract
A wide range of modern software-intensive systems (e.g., autonomous systems, big data analytics, robotics, deep neural architectures) are built to be configurable. These highly-configurable systems offer a rich space for adaptation to different domains and tasks. Developers and users often need to reason about the performance of such systems, making tradeoffs to change specific quality attributes or detecting performance anomalies. For instance, the developers of image recognition mobile apps are interested not only in learning which deep neural architectures are accurate enough to classify their images correctly, but also in which architectures consume the least power on the mobile devices on which they are deployed. Recent research has focused on models built from performance measurements obtained by instrumenting the system. However, the fundamental problem is that the learning techniques for building a reliable performance model do not scale well, simply because the configuration space of such systems is so exponentially large that it is impossible to explore exhaustively. For example, it will take over 60 years to explore the whole configuration space of a system with 25 binary options.
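The "over 60 years" figure can be checked with a back-of-the-envelope calculation. The abstract does not state how long one measurement takes; the sketch below assumes roughly one minute per configuration, which is an assumption made here only for illustration. Under that assumption, the 2^25 = 33,554,432 configurations of a system with 25 binary options take roughly 63.8 years to measure exhaustively.

```python
# Back-of-the-envelope check of the exhaustive-exploration cost.
# ASSUMPTION (not stated in the abstract): measuring one configuration
# takes about one minute.
MINUTES_PER_YEAR = 60 * 24 * 365.25


def exhaustive_years(num_binary_options: int,
                     minutes_per_measurement: float = 1.0) -> float:
    """Years needed to measure every configuration of a system
    whose options are all binary (on/off)."""
    configurations = 2 ** num_binary_options  # each option doubles the space
    return configurations * minutes_per_measurement / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"{2 ** 25:,} configurations")
    print(f"{exhaustive_years(25):.1f} years at one minute per measurement")
```

Because the space doubles with every added binary option, a system with 26 options would need twice as long under the same assumption.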
In this tutorial, I will start by motivating the configuration space explosion problem based on my previous experience with large-scale big data systems in industry. I will then present transfer learning, as well as other machine learning techniques including multi-objective Bayesian optimization, to tackle the sample efficiency challenge: instead of taking measurements from the real system, we learn the performance model using samples from cheap sources, such as simulators that approximate the performance of the real system with fair fidelity and at low cost. Results show that despite the high cost of measurement on the real system, learning performance models can become surprisingly cheap as long as certain properties are reused across environments. In the second half of the talk, I will present empirical evidence that lays a foundation for a theory explaining why and when transfer learning works, by showing the similarities of performance behavior across environments. I will present observations of the impact of environmental changes (such as changes to hardware, workload, and software versions) for a selected set of configurable systems from different domains, to identify the key elements that can be exploited for transfer learning. These observations demonstrate a promising path for building efficient, reliable, and dependable software systems, as well as theoretically sound approaches for tackling performance optimization, testing, and debugging. Finally, I will share some promising research directions, including our recent progress on a performance debugging approach based on counterfactual causal inference.
Outline
- Background on computer system performance
- Case study: A composable highly-configurable system
- Performance analysis and optimization
- Transfer learning for performance analysis and optimization
- Research directions 1: Cost-aware multi-objective Bayesian optimization for MLSys
- Research directions 2: Counterfactual causal inference for performance debugging
Target audience
This tutorial is targeted at practitioners and researchers who would like to gain a deeper understanding of new and potentially powerful approaches for modern highly-configurable systems. It will also be suitable for students (both undergraduate and graduate) who want to learn about potential research directions and how they can find a fruitful niche in research at the intersection of machine learning, systems, and software engineering.
Bio
Pooyan Jamshidi is an Assistant Professor at the University of South Carolina. He directs the AISys Lab, where he investigates the development of novel algorithmic and theoretically principled methods for machine learning systems. Prior to his current position, he was a research associate at Carnegie Mellon University and Imperial College London, where he primarily worked on transfer learning for performance understanding of highly-configurable systems including robotics and big data systems. Pooyan's general research interests are at the intersection of systems/software and machine learning (MLSys).
Tutorial 2
How to Conduct a Quick yet Thorough Literature Review: Saving Weeks of Effort With FASTREAD
Abstract
Literature reviews are essential before conducting research on any topic. A thorough literature review provides an overview of current knowledge, allowing researchers to identify relevant theories, methods, and gaps in the existing research. Yet a thorough literature review is always tedious and time-consuming, especially when a large portion of the time and effort is spent on citation screening: identifying the dozens of relevant papers among hundreds to thousands of non-relevant search results. In this tutorial, we will demonstrate how FASTREAD, a machine learning tool (https://github.com/fastread/src), helps reduce the time and effort of citation screening by incrementally learning what the user is looking for and thus predicting which papers are more likely to be relevant. In the instructors' own experience of using this tool, it saved 50 man-hours (94%) of the citation screening effort while still including 90% of the relevant papers.
This tutorial features three sections, which present in turn:
- the theory behind the FASTREAD tool
- a demonstration of how to set up and use the tool for citation screening
- an interactive section for the audience to set up and use the tool themselves
Bio
Zhe Yu (Ph.D. NC State University, 2020) is an assistant professor in the Department of Software Engineering at Rochester Institute of Technology, where he teaches data mining and software engineering. His research explores collaboration between humans and machine learning algorithms that leads to better performance and higher efficiency. For more information, please visit http://azhe825.github.io/ .
Tim Menzies (IEEE Fellow) is a Professor in the Department of Computer Science at North Carolina State University. His research interests include software engineering (SE), data mining, artificial intelligence, search-based SE, and open access science. For more information, please visit http://menzies.us/ .
Tutorial 3
Genetic improvement of software, search-based software engineering, automated program repair, non-functional properties
Abstract
Having roots in Search-Based Software Engineering (SBSE), the relatively young area of Genetic Improvement of Software (GI) aims to improve the properties of existing software. It can operate directly on source code, e.g., Java or C, and it typically starts with real-world software. This makes GI attractive for industrial applications, in contrast to, e.g., Genetic Programming, which aims to evolve applications from scratch. In this tutorial, we demonstrate how GI can be used to optimise the physical properties of code such as power consumption, size of code, bandwidth, and other non-functional properties, including execution time.
Bio
Saemundur Haraldsson is a Lecturer at the University of Stirling. He has multiple publications on Genetic Improvement, including two that received best paper awards, at the 2017 GI and ICTS4eHealth workshops. Additionally, he co-authored the first comprehensive survey on GI, which was published in 2017. He has been invited to give multiple talks on the subject, including at three Crest Open Workshops and for an industrial audience in Iceland. His PhD thesis (submitted in May 2017) details his work on the world's first live GI integration in an industrial application. Saemundur has previously given a tutorial on GI at PPSN 2018 and GECCO 2020.
John R. Woodward is a lecturer at QMUL. He has organized workshops at GECCO, including Metaheuristic Design Patterns and ECADA (Evolutionary Computation for the Automated Design of Algorithms), which has run for 7 years. He has also given tutorials on the same topic at PPSN, CEC, and GECCO. He currently holds a grant examining how Genetic Improvement techniques can be used to adapt scheduling software for airport runways. With his PhD student, Saemundur Haraldsson (with whom this proposal is in collaboration), he won a best paper award at the 2017 GI workshop. He has also organized a GI workshop at UCL as part of their very successful Crest Open Workshops.
Markus Wagner is a Senior Lecturer at the School of Computer Science, University of Adelaide, Australia. His areas of interest are heuristic optimisation and applications thereof, more specifically theory-motivated algorithm design and applications to wave energy production as well as to non-functional code optimisation. He held an Australian Research Council grant on dynamic adaptive software systems with a focus on mobile devices, and he has co-organised the GI@GECCO Workshop since 2018. He has worked on theoretical aspects of genetic programming, in particular on bloat-control mechanisms, and he is currently involved in the development of two open-source platforms that have genetic programming at their core, among them GIN, which will be demonstrated in this tutorial.
Tutorial 4
Creating Accessible Software Using Experiential Learning Labs
Abstract
Mature and robust software applications should possess several traits. Among others, they should be secure, provide the functionality desired by the stakeholder(s), be efficient, and be accessible. Unfortunately, despite government legislation and demonstrated need, much of the software being developed today is not being created in an accessible manner. The objective of our Accessibility Learning Labs (ALL) is both to inform participants about how to properly create accessible software and, importantly, to demonstrate the need to create accessible software. These experiential browser-based activities enable students, instructors, and practitioners to utilize the material using only their browser. This tutorial will benefit a wide range of participants in the software engineering community, from students to experienced practitioners who want to ensure that they are properly creating inclusive, accessible software. Complete project material is publicly available on the project website: http://all.rit.edu
Bio
Daniel Krutz is the PI of the NSF-funded project (#1825023) that is devoted to creating the presented labs. Krutz has taught approximately ten different graduate and undergraduate software engineering courses and is the author of over fourteen pedagogical research papers.
About
Mon 21 Sep (all times in UTC)
- 08:00 - 09:50 (1h50m) Tutorial: Genetic improvement of software, search-based software engineering, automated program repair, non-functional properties. Saemundur Haraldsson (University of Stirling), John R. Woodward, Markus Wagner (University of Adelaide, Australia)
- 16:00 - 17:50 (1h50m) Tutorial: Creating Accessible Software Using Experiential Learning Labs. Daniel Krutz (Rochester Institute of Technology)
Fri 25 Sep (all times in UTC)
- 00:00 - 00:50 (50m) Tutorial: How to Conduct a Quick yet Thorough Literature Review: Saving Weeks of Effort With FASTREAD
- 16:00 - 17:50 (1h50m) Tutorial: Machine Learning meets Software Performance: Optimization, Transfer Learning, and Counterfactual Causal Inference. Pooyan Jamshidi (University of South Carolina)
Call for proposals
Call For Papers
Tutorials address a wide range of mature topics from theoretical foundations to practical techniques and tools for automated software engineering. The general chair and organizers will decide the exact dates after all proposals have been reviewed and accepted. Tutorials are intended to provide scientific background on themes relevant to ASE’s research audience.
Instructors are invited to submit proposals for 1.5h, half-day (3h), and full-day (6h) tutorials and, upon selection, are required to provide tutorial notes on the topic of presentation in PDF format. Tutorial proposals are limited to 5 pages.
SUBMISSION
Proposal submissions should include the following information:
- Name and affiliation of the proposer/organizer (including e-mail address)
- Name and affiliation of each additional instructor
- Instructors’ experience in the area, including other tutorials, courses, etc.
- Title, objective, abstract, and duration
- Outline with approximate timings
- Target audience, including the indication of level (novice, intermediate, expert)
- Assumed background of attendees
- Brief biography of each instructor (for later inclusion in publicity materials)
- History of the tutorial (if it has been already presented; provide location, approximate attendance, etc.)
- Audio-visual and technical requirements
- Preferences for tutorial date, duration (1.5h, half-day or full-day), and any other scheduling constraints, with justification for full day (if a full day is proposed)
Proposals are due by May 8, 2020 and should be submitted over HotCRP (https://ase2020-tutorials.hotcrp.com/). All submissions must be in PDF format and conform, at time of submission, to the ACM Proceedings Template (LaTeX users must use \documentclass[sigconf,review]{acmart}).
Tutorial proposals will be reviewed by the ASE 2020 tutorial co-chairs Aldeida Aleti and Justyna Petke. Acceptance will be based on the timeliness and expected interest in the topic and the potential for attracting a sufficient number of participants. Note that tutorials with too few registered attendees may be cancelled.