Are Your Requests Your True Needs? Checking Excessive Data Collection in VPA App
Virtual personal assistants (VPAs), such as Amazon Alexa, have become increasingly popular in recent years. This can be attributed largely to the proliferation of feature-rich and user-friendly applications (or VPA apps) from third-party developers. VPA apps may request access to the user’s personal data to realize their functionality, raising concerns about user privacy. While considerable effort has gone into scrutinizing VPA apps’ data collection behaviors against their declared privacy policies or requested permissions, it is often overlooked that most users tend to ignore these elements at installation time. Dishonest developers can thus exploit this situation by embedding excessive declarations and requests to cover their data collection behaviors during compliance auditing.
In this work, we advocate examining an app’s data collection against its functionality, to complement existing research on VPA apps’ privacy compliance. We conduct a systematic analysis of the (in)consistency between the data needed by the app’s functionality and the data it actually requests. To understand the app’s functionality topics, we analyze its available textual data (i.e., title, description and utterances), which are the key resources that users typically refer to before deciding to use an app. We leverage advanced GPT-based language models to address a challenge specific to the VPA context: the documents of most apps are short and written in an unformatted manner. By comparing an app with counterparts of similar functionality, suspicious data collection can be detected through the lens of anomaly detection.
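The peer-comparison idea can be illustrated with a minimal sketch. This is not Pico's implementation: it assumes each app already carries a single functionality topic (hand-assigned here, whereas the paper derives topics with GPT-based models), and it uses a simple frequency rule as a stand-in for anomaly detection. The function name, field names, thresholds and the toy skills are all hypothetical and are not data from the study.

```python
from collections import Counter, defaultdict


def flag_suspicious_requests(skills, min_peer_share=0.2, min_group_size=3):
    """Map each skill name to the data types it requests that few same-topic peers request.

    Assumption: a data type is treated as suspicious when the share of skills
    in the same topic group requesting it is below `min_peer_share`.
    """
    # Group skills by their (assumed, pre-computed) functionality topic.
    groups = defaultdict(list)
    for skill in skills:
        groups[skill["topic"]].append(skill)

    flagged = {}
    for members in groups.values():
        if len(members) < min_group_size:
            continue  # too few peers to judge what is typical for this topic
        # Count how many skills in the group request each data type.
        counts = Counter(d for s in members for d in set(s["requested"]))
        for s in members:
            rare = sorted(
                d for d in set(s["requested"])
                if counts[d] / len(members) < min_peer_share
            )
            if rare:
                flagged[s["name"]] = rare
    return flagged


# Made-up toy data for illustration only.
skills = [
    {"name": "Daily Forecast", "topic": "weather", "requested": ["postal_code"]},
    {"name": "Rain Alert", "topic": "weather", "requested": ["postal_code"]},
    {"name": "Storm Watch", "topic": "weather", "requested": ["postal_code"]},
    {"name": "Sky Check", "topic": "weather", "requested": []},
    {"name": "Sunny Days", "topic": "weather", "requested": ["postal_code"]},
    {"name": "Weather Pal", "topic": "weather", "requested": ["postal_code", "phone_number"]},
]

print(flag_suspicious_requests(skills))
# → {'Weather Pal': ['phone_number']}
```

In this toy group, five of six weather skills request a postal code, so that request looks typical, while only one requests a phone number, so it is flagged for review. A flag of this kind marks a request as unusual relative to peers; it does not by itself establish a violation.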
We have developed our approach into Pico, a privacy inconsistency detector for Alexa skills, the VPA apps of the most popular VPA platform. We apply it to understand the status quo of data-functionality compliance among all 65,195 skills in the Alexa app store. Surprisingly, our study reveals that 21.7% of the analyzed skills exhibit suspicious data collection that is inconsistent with their functionality topics, including Top 10 popular Alexa skills that pose threats to 54,116 users. Our findings should raise an alert to both developers and users regarding compliance with the purpose limitation principle in data regulations. We specifically encourage the store operators to incorporate functionality-consistent data collection into their vetting process.
Thu 18 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Analytics 3 (Research Track / Journal-first Papers / Demonstrations) at Maria Helena Vieira da Silva | Chair(s): Sridhar Chimalakonda (Indian Institute of Technology, Tirupati)
14:00 | 15m | Talk | Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem | Research Track | Yun Peng (The Chinese University of Hong Kong), Ruida Hu (Harbin Institute of Technology, Shenzhen), Ruoke Wang (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Shuqing Li (The Chinese University of Hong Kong), Michael Lyu (The Chinese University of Hong Kong)
14:15 | 15m | Talk | Data-Driven Evidence-Based Syntactic Sugar Design | Research Track | David OBrien (Iowa State University), Robert Dyer (University of Nebraska-Lincoln), Tien N. Nguyen (University of Texas at Dallas), Hridesh Rajan (Iowa State University)
14:30 | 15m | Talk | Revisiting Android App Categorization | Research Track | Marco Alecci (University of Luxembourg), Jordan Samhi (CISPA Helmholtz Center for Information Security), Tegawendé F. Bissyandé (University of Luxembourg), Jacques Klein (University of Luxembourg)
14:45 | 15m | Talk | Are Your Requests Your True Needs? Checking Excessive Data Collection in VPA App | Research Track | Fuman Xie (University of Queensland), Chuan Yan (University of Queensland), Mark Huasong Meng (National University of Singapore), Shaoming Teng (The University of Queensland), Yanjun Zhang (Deakin University), Guangdong Bai (University of Queensland)
15:00 | 7m | Talk | Acrobats and Safety-Nets: Problematizing Large-Scale Agile Software Development | Journal-first Papers | Knut Rolland (University of Oslo), Brian Fitzgerald (Lero - The Irish Software Research Centre and University of Limerick), Torgeir Dingsøyr (Norwegian University of Science and Technology and SimulaMet), Klaas-Jan Stol (Lero; University College Cork; SINTEF Digital)
15:07 | 7m | Talk | Program Transformation Landscapes for Automated Program Modification Using Gin: Extended Abstract | Journal-first Papers | Justyna Petke (University College London), Brad Alexander (University of Adelaide), Earl T. Barr (University College London), Alexander E.I. Brownlee (University of Stirling), Markus Wagner (Monash University, Australia), David R. White (University of Sheffield)
15:14 | 7m | Talk | Boidae: Your Personal Mining Platform | Demonstrations | Brian Sigurdson (Bowling Green State University), Samuel W. Flint (University of Nebraska-Lincoln), Robert Dyer (University of Nebraska-Lincoln)
15:21 | 7m | Talk | Code Mapper: Mapping the Global Contributions of OSS | Demonstrations | Thomas Le Tourneau (CY Tech), Jasmine Latendresse (Concordia University), Ahmad Abdellatif (University of Calgary), Emad Shihab (Concordia University)