MOBILESoft 2025
Sun 27 Apr 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

Security and privacy researchers uncover privacy compliance issues in Android apps using taint analysis, which requires lists of sensitive source and sink APIs (i.e., a taint specification). Since manually crafting a comprehensive and up-to-date taint specification across the tens of thousands of APIs in the Android framework is impractical, automatic taint specification generators have been developed. Recently, two novel approaches, CoDoC and DocFlow, have emerged. Separately, Google introduced the Google Play Data Safety Section (DSS), where app developers disclose their apps’ privacy practices. Since taint sources vary by data category, DSS-related taint sources (DSSTSs) are essential for researchers analyzing DSS compliance. Currently, neither studies on automatic DSSTS identification nor relevant evaluation datasets exist. In this paper, as a step toward automatic DSSTS identification, we evaluate how well CoDoC and DocFlow identify DSSTSs. We collect taint sources referenced in prior privacy-related studies and official documentation and map them to 11 DSS data categories to create a dataset of 505 APIs. Using this dataset, we evaluate CoDoC and DocFlow and find that they perform well in certain data categories, which suggests they are suitable for category-specific investigation. Additionally, we apply CoDoC and DocFlow to classify the entire Android framework and find that both flag a substantial share of APIs as taint sources, ranging from 21% to 57%. We also show that running a taint analyzer with the generated taint source lists produces a large number of leak detections, which imposes high verification costs. Our findings highlight the limitations of existing generators in identifying DSSTSs. We release the dataset to the community.
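
To make the category-level evaluation concrete, the following is a minimal sketch of how a generator's predicted taint sources could be scored against a labeled dataset, one DSS data category at a time. It is illustrative only: the abstract does not state which metrics the paper uses, so precision and recall are an assumption here, and the tiny `GROUND_TRUTH` and `PREDICTED` mappings are hypothetical stand-ins, not entries from the released 505-API dataset or real CoDoC/DocFlow output.

```python
# Illustrative sketch: per-category precision/recall for predicted taint sources.
# All data below is hypothetical; it is not taken from the paper's dataset.

# Assumed ground truth: API name -> DSS data category.
GROUND_TRUTH = {
    "android.location.Location.getLatitude": "Location",
    "android.location.Location.getLongitude": "Location",
    "android.telephony.TelephonyManager.getDeviceId": "Device or other IDs",
    "android.provider.Settings.Secure.getString": "Device or other IDs",
}

# Assumed generator output: API name -> predicted DSS data category.
PREDICTED = {
    "android.location.Location.getLatitude": "Location",
    "android.telephony.TelephonyManager.getDeviceId": "Device or other IDs",
    "android.view.View.getWidth": "Location",  # a false positive
}


def per_category_scores(truth, predicted):
    """Return {category: (precision, recall)} over API-name sets."""
    categories = set(truth.values()) | set(predicted.values())
    scores = {}
    for category in sorted(categories):
        true_apis = {api for api, cat in truth.items() if cat == category}
        pred_apis = {api for api, cat in predicted.items() if cat == category}
        true_positives = len(true_apis & pred_apis)
        # Guard against empty sets so a category with no predictions scores 0.
        precision = true_positives / len(pred_apis) if pred_apis else 0.0
        recall = true_positives / len(true_apis) if true_apis else 0.0
        scores[category] = (precision, recall)
    return scores


if __name__ == "__main__":
    for category, (precision, recall) in per_category_scores(
        GROUND_TRUTH, PREDICTED
    ).items():
        print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

Scoring each category separately, rather than pooling all APIs, is what lets a result such as "performs well in certain data categories" surface: a generator can have high precision for one category while over-predicting another.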