Dissecting APKs from Google Play: Trends, Insights and Security Implications
Researchers generally look for specific files within Android application packages (APKs) during their analysis, focusing on common files such as Dalvik bytecode or the Android manifest. However, Android apps are complex archive files containing various types of files. Failing to account for all files during analyses can compromise end-user security, and despite the wealth of existing techniques to analyze Android apps, only a few studies explore the diversity of files within apps. To bridge this gap, we propose the first large-scale empirical study that dissects the content of Android apps from Google Play. In our study, we explore the different file types and their usage trends. We enhance our analysis by exploring compressed files and the files they contain. We finally investigate to which extent developers use disguised files, i.e., files whose extension is conventionally associated with a file type different than its own (e.g., a Dalvik dex file with the extension “.png”), and study if they are a hint of maliciousness. Our results show that: ❶ Android apps comprise diverse file types, with over 15 000 distinct file extensions and more than 1000 unique file types found in our dataset containing over 400 000 APKs; and ❷ we found many cases where developers use a wrong relation between the file type and its extension to load malicious code at runtime.